Saturday, November 08, 2014

Understanding "one in a million" in the age of big data

This is a great article for several reasons. My short version of what it says is: if you evaluate one million million-to-one propositions, the odds are better than even that at least one of them will turn out to be "true" purely by chance.
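
To put a number on that short version, here is a minimal sketch of the arithmetic. It assumes the million propositions are independent and that each has exactly a one-in-a-million chance of coming out "true"; both are my simplifications, not claims from the article. Under those assumptions the chance of at least one hit is 1 - (1 - p)^n, which comes out to about 63% (close to 1 - 1/e). The function name is just for illustration.

```python
import math


def prob_at_least_one(n_trials: int, p: float) -> float:
    """Chance that at least one of n independent trials succeeds,
    when each trial succeeds with probability p."""
    # 1 - (1 - p)**n, written with log1p/expm1 so that a tiny p
    # does not lose precision in floating point.
    return -math.expm1(n_trials * math.log1p(-p))


n = 1_000_000  # number of propositions examined (assumed independent)
p = 1e-6       # chance that any single one is "true" by luck

exact = prob_at_least_one(n, p)
# When p is tiny and n * p is moderate, 1 - exp(-n * p) is a close shortcut.
approx = 1 - math.exp(-n * p)

print(f"exact:  {exact:.6f}")
print(f"approx: {approx:.6f}")
```

The same function shows how quickly this scales: look at three million such propositions instead of one million and a chance "discovery" becomes close to certain.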

First, it reinforces the idea, well-known in many quarters, that it is not possible to be a reliably superior stock-market trader. It particularly undercuts the idea of "technical analysis", something I have always doubted and dismissed.

Second, I've always thought this is one of the things that causes people to assign meaning to improbable but coincidental events in daily life. The hours and years of daily life offer so many opportunities for patterns to emerge that everyone is bound to experience a few that seem remarkable but are nevertheless entirely coincidental. (You may at this point call me unromantic or bloodless--I prefer the former, but I'll answer to either :) )

Third, although I'm not sure the article explicitly makes this point, it is yet another cautionary tale about the danger of confusing correlation with causation. The most aesthetically pleasing path to discovery is to first formulate a theory, and then to test it against data. Next best is to proceed from observational data to a well-constructed, internally consistent theory that relies on well-known first principles. Less appealing is to find a correlation in data and construct a theory from it, using new principles that may amount to a post-hoc explanation rather than time-tested ones. Worst of all is to take a statistical observation as law, without any underlying theory at all.
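
As a rough illustration of that worst case, the sketch below generates one short series of pure noise, then searches a couple of thousand other pure-noise series for the one that correlates with it best. Everything here is my own toy setup rather than anything from the article: the series lengths, the number of candidates, the seed, and the function names are all arbitrary assumptions. By construction there is no relationship at all between any of the series, yet the "best" candidate typically shows a correlation that would look impressive if it were reported on its own.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def strongest_spurious_correlation(n_series=2000, n_points=20, seed=0):
    """Search many unrelated random series for the one that happens
    to correlate most strongly with a random target series."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_points)]
    best = 0.0
    for _ in range(n_series):
        candidate = [rng.gauss(0, 1) for _ in range(n_points)]
        r = pearson(target, candidate)
        if abs(r) > abs(best):
            best = r
    return best


best_r = strongest_spurious_correlation()
print(f"strongest correlation found among pure noise: {best_r:+.2f}")
```

Raising n_series makes the winning correlation stronger, which is the whole problem: the more places you look, the better the best coincidence gets.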

