The Problem of Induction – Andrew Eifler

Over the last few weeks I have enjoyed reading Nassim Taleb’s Fooled by Randomness. The core premise of the book is that randomness plays a larger role in our lives than we think – and if we’re not careful we can easily be fooled into mistaking purely random events for something other than random.

In one example Taleb explains that basing mutual fund investing decisions on the historical performance of the fund manager can be very dangerous. For example, in a scenario where 8,000 fund managers each have a 10% chance (by pure luck) of making $10MM in any given year, we would expect about eight of those fund managers (8,000 × 0.1 × 0.1 × 0.1 = 8) to make $10MM for three years in a row by luck alone.

In this scenario, if you invested with one of the eight fund managers who had made $10MM for three years straight, the chance of that manager repeating the performance the following year would be the same as for every other fund manager: 1/10.
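To make the arithmetic in the fund manager example concrete, here is a minimal sketch in Python. It assumes, as the scenario does, that every manager has the same independent 10% chance of a winning year. The function names, the number of simulation runs, and the random seed are my own choices for illustration and are not from Taleb's book.

```python
import random


def expected_streaks(n_managers, p_win, years):
    """Expected number of managers who win every year by luck alone."""
    return n_managers * p_win ** years


def simulate_streaks(n_managers, p_win, years, rng):
    """Count managers who happen to win in every one of `years` years."""
    count = 0
    for _ in range(n_managers):
        # A manager keeps the streak only if every yearly draw is a win.
        if all(rng.random() < p_win for _ in range(years)):
            count += 1
    return count


# Numbers from the scenario: 8,000 managers, 10% chance, three years.
expected = expected_streaks(8000, 0.10, 3)

# Repeat the three-year experiment many times with a fixed seed
# (an arbitrary choice) so the result is reproducible.
rng = random.Random(0)
trials = [simulate_streaks(8000, 0.10, 3, rng) for _ in range(200)]
average = sum(trials) / len(trials)

print(f"Expected lucky streaks: {expected:.1f}")
print(f"Average over {len(trials)} simulated runs: {average:.2f}")
```

Individual runs will bounce around, but the average should land close to eight. Nothing in the code gives any manager skill, so a three-year streak says nothing about year four.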

Of all the concepts that Taleb covers, my favorite is a short section about the “problem” of induction.

Fundamentally, induction is the process of taking lots of little bits of data and gluing them together into a cohesive story. The main reason induction exists is that, as humans, we are limited in the number of disparate data points we can remember. The average person’s short-term memory tops out at about seven pieces of information and, without context, there is a very small chance that any of those seven bits will be remembered for longer than a few minutes. Induction is the magical process that allows us to remember bits of information for longer – but it requires us to weave our information into a story. The problem comes, of course, after we form the story and the underlying bits of information are forgotten.

As a demonstration of this problem, imagine you are conducting a lab study on the benefits of a new vitamin. You work with 10 human subjects, all of whom report feeling better and livelier after taking the vitamin. From your data (the accounts of your 10 participants) you can induce that the new vitamin is a dietary supplement that will make everyone feel better and livelier. What is lost here, after you form your story, are the details about your 10 participants. Maybe one of your participants reported feeling great, but his hair started to fall out – or maybe your group of participants was not an accurate representation of the general public (maybe they were all men over the age of 60).

In a simpler and more timely example, say you observe that over the course of the past 25 years the housing market has gone up every year. Based on your data, you induce that the housing market always goes up. What’s lost here is the fact that this story is based on a limited and specific pool of data.

All too often some important details get left out between the “data collection” phase and the “story” phase. That is why I have a strict policy about trusting research studies: I trust every study that I read, except for the ones that I did not conduct myself.
