I remember my senior year of high school, in a Statistics class, when my teacher posed the question, "With how many coin flips are you more likely to end with half heads and half tails: 100 or 100,000?"
By the simple principle of the Law of Large Numbers, the more times the coin is flipped, the closer the observed proportion of heads should come to its expected value of one half.
Most students would see this as an easy question, but I took aim at it, because I felt the answer the question wanted was different from the correct answer.
At the time, I did not have much experience with proofs, and maybe even less with computer programming, but I decided to write a little piece of software to run on my TI-83 to do the following:
1) Flip a coin N times (N being an even number) and tell me whether it ends with half heads and half tails.
2) Do this many times and count how many times the flips land exactly half on each side.
With this application, I could tell the calculator to flip the coin 100 times, and out of 100 samples of this test, it reported ending at exactly 50/50 10% of the time. Then I told the calculator to flip the coin 10,000 times, and out of 100 samples of this test, it reported ending at exactly 50/50 5% of the time.
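For anyone who wants to try the experiment without a TI-83, here is a minimal sketch in Python. It is not the original calculator program; the function names `ends_even` and `fraction_even` and the `seed` parameter are my own assumptions for illustration. To keep it fast, each trial draws N random bits at once and treats every 1 bit as a head.

```python
import random


def ends_even(n_flips, rng):
    """Flip a fair coin n_flips times; return True if heads == tails.

    Each bit of the random integer stands for one flip (1 = heads).
    """
    heads = bin(rng.getrandbits(n_flips)).count("1")
    return 2 * heads == n_flips


def fraction_even(n_flips, n_trials, seed=None):
    """Repeat the experiment n_trials times; return the share of exact 50/50 results."""
    rng = random.Random(seed)  # seed is optional, only for repeatable runs
    hits = sum(ends_even(n_flips, rng) for _ in range(n_trials))
    return hits / n_trials


if __name__ == "__main__":
    for n in (100, 10_000):
        share = fraction_even(n, n_trials=2000)
        print(f"{n} flips: exactly 50/50 in {share:.1%} of trials")
```

With only 100 trials per setting, as on the calculator, the percentages will jump around from run to run, so a larger `n_trials` gives a steadier picture.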
So let's step back for a moment. The question asks with which number of coin flips I am more likely to end with half heads and half tails. My teacher suggested that the more flips, the more likely we are to end at 50/50. I argue the exact opposite: although we will hover around 50/50, the more flips we make, the less likely we are to stop on exactly 50/50.
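The argument can also be checked on paper. As a sketch, assuming a fair coin and an even number of flips N, the chance of landing on exactly half heads is the number of ways to choose which N/2 flips come up heads, divided by the number of all possible sequences. Stirling's approximation then shows how that chance shrinks as N grows:

```latex
\[
P(N) \;=\; \binom{N}{N/2}\,\frac{1}{2^{N}} \;\approx\; \sqrt{\frac{2}{\pi N}}
\qquad (N \text{ even})
\]
\[
P(100) \approx 0.080, \qquad P(10{,}000) \approx 0.0080
\]
```

So the exact 50/50 outcome goes from roughly 8% at 100 flips to under 1% at 10,000 flips, even though the proportion of heads is getting closer to one half the whole time.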
Why do I find this relevant now, 10 years later? It seems that Big Data is all the rage. Everyone is talking about how to understand, and better yet predict, what people are going to buy, what we want to watch, and what we want to eat, and about guiding large directional decisions based on this data. Kevin Slavin talks about this information shaping our lives, when it used to just be shaped by our lives. It is deeply concerning to him, and it is deeply concerning to me as well.
Designers, developers, and problem solvers often try to help the most people possible with each solution. Programs that try to solve problems in this way might get close more often, but I would argue they are less likely to get it exactly right.
This past year I learned of Hester Street Collaborative. This small group of problem solvers has focused its energy on bringing public policy and urban planning knowledge to Hester Street in the Lower East Side of NYC. Their localized approach has been very successful, and the founder suggests that they don't want to zoom out: it is being local that allows them to do such great work.
In Mountains Beyond Mountains, Paul Farmer shows the personal touch necessary for a successful health program, and that working with the local system, even if it involves Juju, is necessary for success.
There are hundreds of examples that show this in practice, and I would like to help bring the focus back to localization with an understanding of the global view.
I understand how the law of large numbers can be profitable for a company, but I want to make sure our decisions are based on what is profitable for individuals.
[Linked here] is an updated version of the application I wrote, which demonstrates how the likelihood of a perfect 50/50 falls as the pool gets larger.
In high school I was not concerned with the bigger picture of where this would lead our lives culturally, day to day. Looking at the bigger picture now, I would like to suggest that statisticians, actuaries, and data visualists think about both the big picture of their data and the single pieces within it.
Lev Manovich, who coined the term Cultural Analytics, creates visually explorative pieces that look like graphs, but instead of plotting points, he plots the source itself. Seeing the painting on a graph instead of turning it into a number gives the visualization context and changes how it communicates.
The lie of the Law of Large Numbers is that there are no large numbers, only a whole bunch of small ones added together.
**A final note: today, as I was reading through one of Richard Feynman's books on physics, I noticed a term describing this exact issue: probability density.