Over the last few weeks I have enjoyed reading Nassim Taleb’s Fooled by Randomness.  The core premise of the book is that randomness plays a larger role in our lives than we think – and if we’re not careful we can easily be fooled into mistaking purely random events for something other than random.

In one example, Taleb explains that basing mutual fund investment decisions on a fund manager’s historical performance can be very dangerous.  Suppose 8,000 fund managers each have a 10% chance, by pure luck, of making $10MM in any given year.  On luck alone, we should then expect about eight of them (8,000 × 0.1 × 0.1 × 0.1 = 8) to make $10MM three years in a row.

In this scenario, if you invested with one of the eight fund managers who had made $10MM three years straight, the chance of that manager repeating the performance the following year would be the same as for every other fund manager: 1/10.
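The arithmetic behind the fund-manager example can be checked with a short Monte Carlo sketch in Python. It assumes, as the example implies, that each manager’s result in each year is an independent draw with a 10% chance of making $10MM. The function names, trial count, and seed are illustrative choices, not anything taken from Taleb’s book.

```python
import random


def expected_streaks(n_managers: int, p_win: float, years: int) -> float:
    """Expected number of managers who win every year by luck alone.

    Assumes each year is an independent draw, so P(streak) = p_win ** years.
    """
    return n_managers * p_win ** years


def simulate(n_managers: int = 8_000, p_win: float = 0.1, streak_years: int = 3,
             trials: int = 200, seed: int = 1) -> tuple[float, float]:
    """Monte Carlo check of the fund-manager example (illustrative sketch).

    Returns (average number of managers with a winning streak per trial,
             fraction of those streak managers who also win the next year).
    """
    rng = random.Random(seed)
    streakers = repeats = 0
    for _ in range(trials):
        for _ in range(n_managers):
            # A manager "wins" a year with probability p_win, independent of history.
            if all(rng.random() < p_win for _ in range(streak_years)):
                streakers += 1
                # The following year is just one more independent draw.
                if rng.random() < p_win:
                    repeats += 1
    return streakers / trials, repeats / streakers


print(expected_streaks(8_000, 0.1, 3))  # about 8
print(simulate())
```

Under the independence assumption the expected count is 8,000 × 0.1³ = 8, the simulated average should land close to 8, and the streak managers’ next-year success rate should come out near 0.1: no better than anyone else’s.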

Of all the concepts that Taleb covers, my favorite is a short section about the “problem” of induction.

Fundamentally, induction is the process of taking lots of little bits of data and gluing them together into a cohesive story.  Induction exists mainly because, as humans, we are limited in the number of disparate data points we can remember.  The average person’s short-term memory tops out at about seven pieces of information and, without context, there is a very small chance that any of those seven bits will be remembered for longer than a few minutes.  Induction is the magical process that allows us to remember bits of information for longer, but it requires us to weave that information into a story.  The problem comes, of course, after we form the story and the underlying bits of information are forgotten.

As a demonstration of this problem, imagine you are conducting a lab study on the benefits of a new vitamin.  You work with 10 human subjects, all of whom report feeling better and livelier after taking the vitamin.  From your data (the accounts of your 10 participants) you induce that the new vitamin is a dietary supplement that will make everyone feel better and livelier.  What is lost here, after you form your story, are the details about your 10 participants.  Maybe one of your participants reported feeling great, but his hair started to fall out – or maybe your group of participants was not an accurate representation of the general public (maybe they were all men over the age of 60).

In a simpler and more pertinent example, say you observe that over the past 25 years the housing market has gone up year over year.  Based on your data, you induce that the housing market always goes up.  What’s lost here is the fact that this story is based on a limited and specific pool of data.

All too often some important details get left out between the “data collection” phase and the “story” phase. That is why I have a strict policy about trusting research studies:  I trust every study that I read, except for the ones that I did not conduct myself.

The Problem of Induction
  • I found Fooled by Randomness cute, and The Black Swan somewhat cuter, but these issues have been raised and dealt with (better, imo) in the philosophy of science literature.

    You may enjoy Kuhn’s Structure of Scientific Revolutions:
    http://www.amazon.com/Structure-Scientific-Revolutions-Thomas-Kuhn/dp/0226458083/ref=sr_1_1?ie=UTF8&qid=1336352047&sr=8-1

    Popper’s Logic of Scientific Discovery (a bit better, but a rather harder read):
    http://www.amazon.com/Logic-Scientific-Discovery-Routledge-Classics/dp/0415278449/ref=sr_1_2?ie=UTF8&qid=1336352067&sr=8-2

    Perhaps also Lakatos’ Methodology of Scientific Research Programs:
    http://www.amazon.com/Methodology-Scientific-Research-Programmes-Philosophical/dp/0521280311/ref=sr_1_2?s=books&ie=UTF8&qid=1336352097&sr=1-2

    And Laudan’s Progress and its Problems is quite good too:
    http://www.amazon.com/Progress-Its-Problems-Towards-Scientific/dp/0520037219/ref=sr_1_1?s=books&ie=UTF8&qid=1336352194&sr=1-1

    For a more “popular science” (and I think, better) look at randomness, I highly suggest The Drunkard’s Walk: How Randomness Rules Our Lives:

    http://www.amazon.com/The-Drunkards-Walk-Randomness-Vintage/dp/0307275175/ref=sr_1_1?s=books&ie=UTF8&qid=1336352241&sr=1-1

    I admit to a bias against Taleb. I didn’t find his books fantastic, and he has a very poor opinion of academic research, which I personally think is based more on willful ignorance about the questions being asked than on anything else. He doesn’t ask “does this make sense?”; he asks “is this useful to me?” In doing so he ignores large swathes of history, and then derides thinkers as “Mickey Mouse philosophers” because they are not directly useful to him, or dismisses philosophers like Hume as derivative (when they are anything but) because he can’t see a large difference between their answers and the answers of the people before them.

    He’s gotten somewhat better recently, as he’s taken on an academic bent, but still.

    By the way, your example of a study on a new vitamin has more problems than you realize. The data would be analyzed with an ANOVA (and with more than 10 participants, thank you). An ANOVA is strictly a test about averages and variances: it tests whether the averages of two (or more) groups of people differ significantly from each other. The best illustration is a graph of different bell curves:

    http://www.uncp.edu/home/frederick/DSC510/anova01.gif

    In other words, all any kind of ANOVA (drug test, whatever) will tell you is the AVERAGE difference between two populations. In many, MANY cases, the bell curves overlap – to a large extent. So an individual from the no-vitamin group may have EXACTLY the same performance as someone from the with-vitamin group – just one is in the upper end of the bell curve and the other in the lower.

    How do you know? Well, you don’t: it’s random.

    Here’s a question: how do doctors know what effect a certain dose of a drug will have? Answer: it’s the average from tests with ANOVAs.

    Doctors have no idea what the exact impact of a drug will be on any individual patient (usually by a pretty big margin), and have NO WAY of determining what that impact will be before prescribing the drug.

    Fun stuff.

    The “not being representative” bit you have in there is a bit weaker, since studies can – and do – deal with representativeness for participant selection.

    The sad fact is that statistics is the worst way to make decisions, except for all the others.

  • Andrew Eifler

    Awesome – thanks, as always, for the recommendations! I’ve definitely got some reading to do here.

    Andrew
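The first comment’s point about ANOVA (a clear difference in averages, yet heavily overlapping bell curves) can be sketched in a few lines of Python. The scores are hypothetical: a “no vitamin” group with mean 50 and a “with vitamin” group with mean 55, both normal with standard deviation 10, and 500 simulated participants per group. `one_way_f` is a hand-rolled one-way ANOVA F statistic written for this illustration, not a library function; `statistics.NormalDist` comes from Python’s standard library.

```python
from statistics import NormalDist, fmean


def one_way_f(*groups: list[float]) -> float:
    """One-way ANOVA F statistic: between-group mean square / within-group mean square."""
    values = [x for g in groups for x in g]
    grand_mean = fmean(values)
    k, n = len(groups), len(values)
    means = [fmean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical population models (assumed, not from any real study).
control = NormalDist(mu=50, sigma=10)  # no vitamin
treated = NormalDist(mu=55, sigma=10)  # with vitamin

# Simulate 500 participants per group, seeded for repeatability.
control_sample = control.samples(500, seed=42)
treated_sample = treated.samples(500, seed=43)

# A large F says the group AVERAGES differ; it says nothing about individuals.
f_stat = one_way_f(control_sample, treated_sample)

# Shared area under the two bell curves (overlapping coefficient).
overlap = control.overlap(treated)

# Chance a random untreated person scores higher than a random treated one:
# the difference of two independent normals is itself normal.
p_control_higher = (treated - control).cdf(0)

print(f"F = {f_stat:.1f}, overlap = {overlap:.2f}, "
      f"P(control > treated) = {p_control_higher:.2f}")
```

With these assumed numbers the F statistic comes out large (the averages clearly differ), yet the two curves share about 80% of their area, and a randomly chosen untreated person outscores a randomly chosen treated one roughly 36% of the time.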