## 17.5.05

STEVE COMMENTS ON STEVE AND STEVE. Book Review No. 13 is Levitt and Dubner's Freakonomics (details and weblog). The book does not meet my standard for a general interest economics book, which is, "Can I find enough good imponderables in it to use it as a supplemental book for a principles class?" Yet another Steve, Landsburg to be specific, wrote Armchair Economist years ago, and I'm still using it as such a book. (My introductory course is probably a bit thin on graphs and formulas, but people leave grasping opportunity costs and incentives, and the best get comparative advantage and arbitrage.) Freakonomics is a quick read; I was able to finish it during exam-proctoring time last week.

I concur with the objections raised by Paul at Electric Commentary and by Gordon at Conglomerate Blog. Gordon raises a serious set of layman's objections to the chapter on parental performance, which is structured around a series of properties of parents that go with better test scores for their children.

The basis of that chapter is an empirical investigation of test scores based on the contents of the Early Childhood Longitudinal Study. (I will skip the slide with the title and co-authors' names, and the hard-to-read slide giving the means of 100 variables, and the even-harder-to-read slide with six columns of regression coefficients, a few of which have the proper sign and sufficiently many asterisks beside them.) Let me focus on the description of the method, from p. 162.

> Regression analysis is the tool that enables an economist to sort out these huge piles of data. It does so by artificially holding constant every variable except the two he wishes to focus on, and then showing how those two co-vary.

Provided a bunch of other technical stuff, some of which you can check, and some of which you must take on faith, is true, and provided the rounding error in the computer doesn't get the better of you. The real devil, however, is in that "holding constant."

> In the case of the ECLS data, it might help to think of regression analysis as performing the following task: converting each of those twenty thousand schoolchildren into a sort of circuit board with an identical number of switches. Each switch represents a single category of the child's data: his first-grade math score, his third-grade math score, his first grade reading score, his third-grade reading score, his mother's education level, his father's income, the number of books in his home, the relative affluence of his neighborhood, and so on.

Yes, then there's this other unobservable switch called the "error term," which God, or Nature, or the shade of Carl Gauss throws, and that throw affects only the test score without affecting the affluence of the neighborhood or mother's education. Subject to that stipulation the following paragraph is accurate.

> Now a researcher is able to tease some insights from this very complicated set of data. He can line up all the children who share many characteristics -- all the circuit boards that have their switches flipped the same direction -- and then pinpoint the single characteristic they don't share.

Why?

> What we really want to do is measure two children who are alike in every way except one - in this case, the number of books in his home - and see if that one factor makes a difference in his school performance.

That's where the role of that unobservable switch begins to matter. Regression analysis does not require the researcher to find two children in all other respects alike with different family libraries. Rather, it provides estimates of the partial effects of neighborhood, mom's education, size of library, earned-run average of the local baseball team (or is that subsumed under "Nature's switch?") on test score. The usual term of art in applied economics is "controlling for" these phenomena, as Levitt and Dubner explain on p. 164. Popular books don't like to use footnotes, but that's what this parenthetical is.

> (To control for a variable is essentially to eliminate its influence, much as one golfer uses a handicap against another. In the case of an academic study such as the ECLS, a researcher might control for any number of disadvantages that one student might carry when measured against the average student.)

That all sounds very scientific, but in practice a researcher often "controls for" something by what I learned as "dummying it out." In Freakonomics speak, perhaps for one kid the switch called "Black" is flipped on, for another the switch called "Latino" is flipped on, and for some the switch called "Female" is flipped on (it is mislabelled "gender;" technically it is a "sex" proxy for something else, but the audience is the lay reader). Then for each kid there is a slide switch counting the books in the family's library. Under that specification, the partial effect of books on scores is the same irrespective of the kid's ancestry or gonads. Sometimes the researcher will set up a more complicated model, in which there is one slide switch for size of library, another slide switch for the size of library in a Black household, and another slide switch for the size of library in a Latino household. That's called "interaction," and it becomes hazardous for two reasons. First, additional terms in the regression analysis use up degrees of freedom. That can be fatal to the project if there are more effects to estimate than there are observations to infer from. It also increases the standard error of each estimate, which can be fatal to sign and significance, and those are hard to get even in your six columns of specifications that worked best. Second, all statistical inference using a computer involves approximating real numbers (that's true even with exponential and logarithmic specifications; Mr. Spock had the right way to distract a computer years ago) with finite-precision numbers stored in binary or some other power of two, and more complicated switchboards such as my multiple-slide-switches create what we call "sparse" matrices with lots of zero values. The effect on the machinery is a combination of rounding problems and conditioning problems. Specification is thus a tradeoff of sufficient richness against economy of computing resources.
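For readers who would rather see the switchboard than read about it, here is a minimal sketch of "dummying it out" versus adding an interaction. Everything in it is an assumption made for illustration: the data are simulated rather than taken from the ECLS, the numbers (a books effect of 0.05 points per book, a 3-point shift for the dummy) are invented, and the `ols` helper is a hypothetical textbook least-squares routine, not anything from the book.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Hypothetical data: one on/off switch and one slide switch per child.
black = rng.integers(0, 2, size=n).astype(float)    # dummy: 1 if the switch is on
books = rng.integers(0, 200, size=n).astype(float)  # slide switch: books at home
noise = rng.normal(0.0, 5.0, size=n)                # the unobservable switch

# Invented truth: each book is worth 0.05 points, the same for everyone.
score = 50.0 - 3.0 * black + 0.05 * books + noise


def ols(X, y):
    """Return least-squares coefficients and their standard errors."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]  # observations minus effects estimated
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, np.sqrt(np.diag(cov))


ones = np.ones(n)
# "Dummied out": one common books slope; the dummy only shifts the intercept.
X_dummy = np.column_stack([ones, black, books])
# Interaction: a second slide switch that moves only when the dummy is on.
X_inter = np.column_stack([ones, black, books, black * books])

b_dummy, se_dummy = ols(X_dummy, score)
b_inter, se_inter = ols(X_inter, score)

print("books slope, dummy spec:      ", b_dummy[2], "+/-", se_dummy[2])
print("books slope, interaction spec:", b_inter[2], "+/-", se_inter[2])
print("condition numbers:", np.linalg.cond(X_dummy), np.linalg.cond(X_inter))
```

In this toy setup the interaction column is zero for every child whose dummy is off, the books slope in the richer specification is effectively estimated from a subset of the children, and its standard error comes out larger. The condition number of the data matrix cannot fall when a column is added. With two thousand simulated kids none of that is fatal; the hazards bite when the switches multiply faster than the observations.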
It's possible to make inferences all the same, but it is not as easy as Levitt and Dubner make it sound. And to compare it to a golf handicap -- which is not that easy to work out -- is still to oversimplify. The Performance Handicap Racing Formula for bluewater keelboats is more like it. (Nuts to that stuff. In the Laser fleet, second place is first last; there's no time compensation for the prize committee to work out at the yacht club over Cuttys later.)
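To see why the stipulation about the unobservable switch carries so much weight, here is a rough sketch on simulated data. Nothing in it comes from the ECLS: the "unobserved" trait, the effect sizes, and the `books_slope` helper are all hypothetical. It fits the same regression of score on books twice, once when the unobserved switch is unrelated to library size and once when households that are high on the unobserved trait also own more books.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000

# Hypothetical unobserved trait (say, how much the parents read to the child).
unobserved = rng.normal(0.0, 1.0, size=n)

# Case 1: library size is unrelated to the unobserved trait.
books_clean = rng.normal(50.0, 20.0, size=n)
# Case 2: households high on the unobserved trait also own more books.
books_confounded = 50.0 + 15.0 * unobserved + rng.normal(0.0, 13.0, size=n)

TRUE_EFFECT = 0.05  # invented: test-score points per book


def books_slope(books):
    """Regress score on a constant and books; return the books coefficient."""
    # The researcher never sees `unobserved`; it ends up in the error term.
    score = (50.0 + TRUE_EFFECT * books + 4.0 * unobserved
             + rng.normal(0.0, 5.0, size=n))
    X = np.column_stack([np.ones(n), books])
    beta, *_ = np.linalg.lstsq(X, score, rcond=None)
    return beta[1]


slope_clean = books_slope(books_clean)
slope_confounded = books_slope(books_confounded)

print("true effect per book:        ", TRUE_EFFECT)
print("estimate, stipulation holds: ", slope_clean)
print("estimate, stipulation fails: ", slope_confounded)
```

When the stipulation holds, the estimate should land near the invented true effect. When it fails, the books coefficient also soaks up the effect of the trait nobody measured, and no amount of switch-flipping on the observed variables repairs that.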

Thus endeth the technical rant.

There are a few goodies in the book.

A few days ago I alluded to the downward mobility of Ashley. That's from a chapter on parental accomplishments and children's names. California, apparently, tracks residency and parental education on birth certificates in such a way as to allow a researcher to stratify names by parental accomplishment. From 1990, Ashley is the fifth-most popular "middle-income white girl" name and the most common "low-income white girl" name. The new high end girl names are Alexandra, Lauren, Katherine, Madison, and Rachel. "Steve" is 10th best at signifying low-education parents, with mothers of "Steve" averaging 11.84 years of education; but in the endnotes, "Stephen" signifies some college, with mothers averaging 14.01 years of schooling.

Earlier today, I speculated that some participants in tenure tournaments might be opting out. That's informed speculation, based (loosely) on the investigation of sumo wrestling tournaments. In such tournaments, the grandmaster norm is eight wins out of fifteen matches. There are prizes for overall winners. Here is a hypothesis, from p. 41.

> A final-day match between two 7-7 wrestlers isn't likely to be fixed, since both fighters badly need the victory. A wrestler with ten or more victories probably wouldn't throw a match either, since he has his own strong incentive to win.

But wrestlers who go into the final day with 7 wins (haven't made their norm) win about 75% of their matches against wrestlers with 8 or 9 wins (made norm, unlikely to win tournament), and tend to lose more than half the time in later matches with those same opponents, when they are not competing for norms at the end of the tournament.

Might there be a corollary proposition for tenure-track faculty at institutions known for high rates of tenure denials?