Wednesday, December 2, 2009

Statistical Mistakes in the News

This time, courtesy of the Wall Street Journal:

Arguing that this recession is worse than the prior recession because of Simpson's paradox is fallacious because it ignores the fact that a) college education is endogenous to any measure of economic performance and b) there is intergroup mobility.

Basically, it's misleading to use Simpson's paradox to say that everyone's worse off when the act of moving into the college-educated group is itself a choice designed to (successfully) minimize how much worse off you get. Thus, the shift towards more education is endogenous to the system, and you can't treat education like an exogenous divider of the population and call this recession worse when it's not a fair divider. You need the periods in which the groups are measured to be uncorrelated, and you need the actors to be consistent and exogenously separated. The latter is not the case here. We should care far more about overall employment numbers.

The disaggregated data aren't worthless; disaggregated data are still very accurate at determining trends in what is happening intragroup. It's just that examining lots of disaggregated data cannot tell you about the aggregate here because the method of disaggregation removes a very important variable.
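To make the arithmetic concrete, here is a minimal sketch in Python. The numbers are made up purely for illustration (they are not the figures from the article or from any official source), and the names `overall_rate`, `earlier`, and `later` are my own. The point is only that the aggregate rate is a share-weighted average of the group rates, so if the shares shift toward the better-off group, every group can look worse while the total looks better.

```python
# Hypothetical numbers, chosen only to illustrate the effect.
# They are NOT actual unemployment data from either recession.

def overall_rate(shares, rates):
    """Aggregate unemployment rate as a share-weighted average of group rates."""
    assert abs(sum(shares.values()) - 1.0) < 1e-9, "shares must sum to 1"
    return sum(shares[group] * rates[group] for group in shares)

# Earlier recession: most of the labor force has no college education.
earlier = {
    "shares": {"no_college": 0.80, "college": 0.20},
    "rates": {"no_college": 0.11, "college": 0.04},
}

# Later recession: every group's rate is higher, but the mix has shifted
# toward the college group (the endogenous choice discussed above).
later = {
    "shares": {"no_college": 0.50, "college": 0.50},
    "rates": {"no_college": 0.12, "college": 0.05},
}

earlier_overall = overall_rate(earlier["shares"], earlier["rates"])
later_overall = overall_rate(later["shares"], later["rates"])

for group in earlier["rates"]:
    print(f"{group}: {earlier['rates'][group]:.1%} -> {later['rates'][group]:.1%}")
print(f"overall: {earlier_overall:.1%} -> {later_overall:.1%}")
```

The group rates alone can't tell you which way the overall number moved; you also need the shares, which is exactly the variable the disaggregation throws away.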

Edit: I wrote to the author of the article and a number of the people quoted in it. Professor Meng's gracious responses:
Dear Mr. ***,

Thanks for your email. I was asked to provide examples of Simpson's paradox, and the kidney stone example came to me immediately because I used it in one of my classes.
As for the unemployment rate example, I was not consulted for it (indeed I just read the article from a forwarding by a colleague) nor should I have been as I have no expertise in that area other than to confirm that it is a case of Simpson's paradox, statistically. But I certainly agree with you in terms of the general principle, that is, the causal interpretation of any Simpson's paradox requires substantive knowledge, just as a causal interpretation of any association.

With best wishes,
Xiao-Li Meng

and more:
Dear Trevor,

You are welcome. Incidentally, I saw on your blog that
you removed your original question to protect my
identity. Whereas I very much appreciate your being considerate,
I don't mind being identified if you think posting
your original question helps more people to think
deeper, which was the key point I tried to make when
Ms. Tuna called me.

As a statistician, I am also on the lookout for
statistical mistakes in the news (and in other media), so
I appreciate your effort. However, I hope you don't
mind that I point out a "mistake" in your blog title:
the type of potential mistakes you refer to is not
"mathematical mistakes", but rather "statistical mistakes"
or "inference mistakes". If there were any calculation
errors in the article, that would be mathematical mistakes.
What you were arguing is not the errors in the numerical results
(or in any mathematical formula), but rather the potential
errors in interpreting such results, an exercise belongs to
inferential reasoning, not mathematical

An immediate relevant consequence of mixing
"mathematics" with "statistics" is the common mis-perception that
for a given data set, there should be one correct answer,
just as with almost all the mathematical homework we have done.
But as you may understand well, for real-life inference
problems, there can easily be several competing interpretations/answers,
all well articulated but based on different assumptions. Indeed, I am curious how different economists may respond to your question differently, perhaps based on different
assumptions of the degree of endogeneity?

Keep up with your good effort,


Done and done. Thanks so much, Professor Meng!


  1. This comment has been removed by a blog administrator.

  2. I removed the letter as a comment in order to preserve the anonymity of the response.