When less is more (memory limits and correlations)

Here’s a fairly geeky post connecting a fact about statistical distributions to a possible benefit of having limited working memory abilities.

I recently ran across a series of articles by Yaakov Kareev from the mid-to-late 1990s showing something remarkable: People with less working memory capacity are better able to detect moderately strong correlations (Kareev, 1995; Kareev et al., 1997; Kareev, 2000). Understanding why requires a bit of a digression into statistics. If you are already familiar with statistical distributions, skew, and correlations, just skip to the end.

A brief discussion of correlations: Correlations express the relationship between two variables and can range from -1.0 to 1.0. Imagine that height and weight were perfectly correlated. A perfect positive correlation means that you could line up all the people in the world from tallest to shortest, and the tallest person would also be the heaviest, the next tallest would be the next heaviest, and so on down to the shortest person, who would be the lightest of all. Of course, in the real world, correlations are rarely perfect. There are plenty of heavy short people and light tall people. The correlation coefficient (r) gives a way to express the degree to which two variables tend to vary together. The higher the absolute value of the correlation, the better you can predict one variable from the other. If the two variables are completely independent, they will have a correlation of 0, and knowing the value of one variable tells you nothing about the value of the other.
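For the hands-on crowd, here is a minimal sketch of how r can be computed, using plain Python. The function name `pearson_r` and the five heights and weights are made up for illustration; they are not data from any of the cited papers.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Cross-products and squared deviations from each variable's mean.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical heights (cm) and weights (kg) for five imaginary people.
heights = [150, 160, 170, 180, 190]
weights_perfect = [50, 60, 70, 80, 90]   # weight rises in lockstep with height
weights_noisy = [62, 55, 80, 70, 85]     # taller is heavier only on average

print(pearson_r(heights, weights_perfect))  # perfect positive correlation
print(pearson_r(heights, weights_noisy))    # positive, but short of perfect
```

The first call describes the imaginary world where the tallest person is always the heaviest. The second is closer to the real one, with a positive correlation that falls short of 1.0.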

Technical bits about sampling distributions: Because correlations can’t be any bigger than 1.0 (or smaller than -1.0), the distribution of possible correlations is truncated at both ends. You can’t do any better than a perfect ability to predict one variable from the values of another. That’s different from a normal (bell-shaped) distribution, which continues to infinity in either direction.

Figure: the normal distribution (image from Nusha at sl.wikipedia)

If a variable is normally distributed, then any person you test at random is equally likely to have a value that is above or below the average value for the population. Most values fall close to the average, with extreme values being rarer (hence the peak at the center of the distribution). If you take a sample of 10 random people and average their scores, that average is also equally likely to fall above or below the population average. If you did that repeatedly and then averaged all of the averages, the grand average would be close to the population average. And if you took a large enough sample, the average of that single sample would be close to the population average as well.
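If you would rather see the symmetry than take my word for it, here is a small simulation sketch. The population mean of 100, the standard deviation of 15, and the helper name `sample_means` are my own assumptions, chosen only to make the example concrete.

```python
import random
import statistics

def sample_means(pop_mean, pop_sd, n, trials, rng):
    """Average n draws from a normal population, repeated `trials` times."""
    return [
        statistics.fmean([rng.gauss(pop_mean, pop_sd) for _ in range(n)])
        for _ in range(trials)
    ]

rng = random.Random(1)  # fixed seed so the run is repeatable
means = sample_means(100, 15, n=10, trials=20_000, rng=rng)

# Fraction of 10-person averages that land above the population average.
share_above = sum(m > 100 for m in means) / len(means)
print(f"share of sample means above the population mean: {share_above:.3f}")
print(f"average of all the sample means: {statistics.fmean(means):.2f}")
```

The share should come out close to one half and the grand average close to 100, which is the symmetric behavior described above.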

Correlations don’t work that way, though, because their distribution is truncated. If the correlation in the population is r=.60, the distribution of correlations you would get from small samples has its peak above .60 and a long tail below .60. In other words, the sampling distribution is negatively skewed.

Figure: skewed sampling distributions of the correlation (from Kareev et al., 1997)

This property of the sampling distribution for correlations means that if you sample 10 people at random from the population and compute the correlation for those 10 people, you are more likely to find a correlation that is bigger than the true correlation in the population than one that is smaller. The smaller the sample, the more likely it is that the correlation in your sample will be bigger than the true population value. In the figure from Kareev et al. (1997) above, you can see that the distribution for small samples has most of its mass to the right of the vertical line that shows the true population correlation. Only with large samples does the peak of the distribution shift closer to the true population value.
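Here is a rough simulation sketch of that skew. It assumes both variables are normally distributed with a true correlation of .60, and it compares samples of 7 with samples of 100. The sample sizes, the number of trials, and the function names are my choices for illustration, not the procedure used in the papers.

```python
import math
import random
import statistics

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def sample_correlations(rho, n, trials, rng):
    """Sample r for `trials` samples of size n from a bivariate normal."""
    scale = math.sqrt(1 - rho ** 2)
    rs = []
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        # y mixes x with independent noise so that corr(x, y) equals rho.
        ys = [rho * x + scale * rng.gauss(0, 1) for x in xs]
        rs.append(pearson_r(xs, ys))
    return rs

rng = random.Random(1)  # fixed seed so the run is repeatable
rho = 0.6
for n in (7, 100):
    rs = sample_correlations(rho, n, trials=5_000, rng=rng)
    share_above = sum(r > rho for r in rs) / len(rs)
    print(f"n={n:3d}  median r={statistics.median(rs):.3f}  "
          f"mean r={statistics.fmean(rs):.3f}  share above rho={share_above:.3f}")
```

With the small samples, more than half of the sample correlations should land above .60 and the median should sit above the true value, even though the long left tail pulls the mean slightly below it. With samples of 100, the distribution should be much tighter around .60.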

Okay. Enough of the stats. What does this all mean? It means that if you are only able to take small samples, you are likely to perceive real correlations to be stronger than they actually are. And, it turns out that this distortion might be a good thing. People are built as pattern detectors, and it’s important that we successfully detect relationships that actually exist in the world. It’s important to know whether a particular type of cloud is associated with rain, for example. We can’t readily measure every possible example of an association in the world (we don’t have access to all cases in which that cloud type appeared and whether or not it rained). Instead, pattern perception is driven by anecdotes or collections of anecdotes—we have access to only a small number of examples. And, if there actually is a relationship between two variables in the world, we’ll be more likely to detect it with small samples than with large samples because small samples tend to suggest a stronger association than is actually present in the world.

Here’s where Kareev’s findings are particularly noteworthy: People who have low working memory capacity tend to perceive real correlations to be stronger than do people with high working memory capacity. In essence, people with less memory available can keep fewer examples in mind when checking whether an association exists, and as a result, they are more likely to have an inflated estimate of the actual association. That is, they’re more likely to see the correlation as really strong and are less likely to miss a moderate correlation in the world. Having less working memory available makes you better able to detect the presence of an association when you’re looking for one.
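To make the detection idea concrete, here is a toy sketch. It assumes an observer who only notices a relationship when the correlation in the examples they hold in mind exceeds some threshold. The threshold of .75, the sample sizes of 7 and 30, and the function names are all hypothetical choices of mine, not values taken from Kareev’s experiments.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def detection_rate(rho, n, threshold, trials, rng):
    """Share of size-n samples whose sample r exceeds the threshold."""
    scale = math.sqrt(1 - rho ** 2)
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        # y mixes x with independent noise so that corr(x, y) equals rho.
        ys = [rho * x + scale * rng.gauss(0, 1) for x in xs]
        if pearson_r(xs, ys) > threshold:
            hits += 1
    return hits / trials

rng = random.Random(1)  # fixed seed so the run is repeatable
# Hypothetical observer: "notices" a relationship only when sample r > 0.75.
for n in (7, 30):
    rate = detection_rate(rho=0.6, n=n, threshold=0.75, trials=5_000, rng=rng)
    print(f"n={n:2d}  share of samples with r > 0.75: {rate:.3f}")
```

Under these assumptions, the observer holding 7 examples should cross the threshold noticeably more often than the one holding 30, even though both are sampling from the same world with a true correlation of .60.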

This sort of “less is more” idea has been used to explain the ease with which children can acquire language (e.g., Newport, 1988). It might also help to explain the ease with which people form stereotypes (but only when those stereotypes are actually true). This is a beautiful example of taking a simple, unremarkable fact about statistical distributions and using it to predict something remarkable about how people perceive the world.

Sources cited:

Kareev, Y. (1995). Through a narrow window: Working memory capacity and the detection of covariation. Cognition, 56(3), 263-269. DOI: 10.1016/0010-0277(95)92814-G

Kareev, Y., Lieberman, I., & Lev, M. (1997). Through a narrow window: Sample size and the perception of correlation. Journal of Experimental Psychology: General, 126(3), 278-287.

Kareev, Y. (2000). Seven (indeed, plus or minus two) and the detection of correlations. Psychological Review, 107(2), 397-402. PMID: 10789204

Newport, E. (1988). Constraints on learning and their role in language acquisition: Studies of the acquisition of American Sign Language. Language Sciences, 10(1), 147-172. DOI: 10.1016/0388-0001(88)90010-1
