That's despite the odds of a particular combination of numbers rolling out of the machinery being orders of magnitude worse than one in a hundred.

Whenever there's a major flood (it doesn't matter whether it follows a hurricane or a rainstorm or it's just the spring thaw somewhere in the north), you'll get people talking about "hundred-year" or "thousand-year" or what-have-you flood events.

And yes, there are people who lose that lottery multiple times, and perhaps there will be a moment when our political masters will sit down and rethink the current system of publicly subsidized flood insurance, under which the taxpayers replace the same damaged property, repeatedly.  Not surprisingly, the flood insurance program replaces more than a little beach-front property for rent-seekers, and once the emergency passes, I intend to turn to that dimension of the situation.

But the algorithm by which homeowners on a given tract of land will be allowed by our political masters to buy flood insurance -- yes, dear reader, a prudent citizen who would like to carry flood coverage cannot visit the friendly local insurance agent and buy it -- relies on statistical inference from small samples.
If a parcel of land fell inside the boundaries of where a 1-percent-annual-risk flood was likely to reach, any new buildings constructed there would have to be elevated and insured — and would therefore be more expensive. And you might not be able to build there at all. Outside the floodplain, there would be no restrictions.

But there’s a gap between the data those maps are built on and the floodplain boundaries themselves. To get from point A to point B, scientists have to make a lot of assumptions and extrapolations, building in layers of uncertainty that mean the final determination of what is and isn’t in the floodplain should never be thought of as exact.

It begins with roughly 8,000 streamgages, sensors that the U.S. Geological Survey has deployed to collect real-time data on the depth and velocity of rivers and streams across the country. Throw that into a mathematical potpourri with other data points — what’s known about the shape of the stream, say — and you can come up with an estimate of flow, water measured in cubic feet per second.

The USGS collects these flow estimates, plotting them over the years to find the normal amount of water that moves down a given stream — and what that flow looks like when it jumps to levels well above average. Finally, a computer model helps turn those high flow rates (a measure that doesn’t tell you much about whether your couch will be underwater) into an estimated flood depth (which does). Plunk the flood depth estimates down on top of maps and you get a floodplain.
There are parcels on which householders are eligible to buy the insurance for houses with basements, suggesting additional subtleties, but the idea of determining eligibility using statistical inference remains the same.

But the use of the term "normal" covers a multitude of sins.  This Pajamas Media essay explains why supposedly rare events can repeat; the error is common enough that it has a name, the gambler's fallacy, and it's likely that sometime this weekend a .250 hitter has come to the plate oh-for-three in a late inning and a radio announcer has said "he's due."
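
To put rough numbers on why a "hundred-year" flood can turn up twice in one mortgage, here is a minimal sketch. It assumes each year is an independent draw with a 1 percent chance of flooding; the independence is my simplifying assumption, and the function name `prob_at_least` is made up for illustration.

```python
from math import comb

def prob_at_least(k, years, annual_p=0.01):
    """Chance of k or more flood years in `years` years.

    Assumes each year is an independent trial with probability annual_p,
    which is a simplification, not a claim about real rivers.
    """
    # Binomial tail: 1 minus the chance of seeing fewer than k flood years.
    fewer = sum(
        comb(years, j) * annual_p**j * (1 - annual_p) ** (years - j)
        for j in range(k)
    )
    return 1 - fewer

if __name__ == "__main__":
    for k in (1, 2):
        print(f"{k}+ floods in 30 years: {prob_at_least(k, 30):.3f}")
```

Under those assumptions the chance of at least one such flood over a 30-year span is about 26 percent, and of two or more about 4 percent: unlucky, but hardly freakish. And having been flooded last year does nothing to lower this year's 1 percent.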

The essay also suggests that a sufficiently long history of stream depths will generate a histogram that approximates a Gaussian distribution, or normal curve.  Thus, a stream with an average depth of six inches will almost always be from three to nine inches deep, and almost never exceed a foot, let alone a yard.

Suppose, though, that the rain events that overfill the stream follow a Pareto distribution, or power law.  Then you might observe a cluster of depths ranging from zero to three inches, and every so often, you reckon your soundings in fathoms.  It might be wise to check whether the data generating the histogram look more like a Gaussian pattern or a Pareto pattern.
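
A quick sketch of how much the choice of curve matters. Every parameter here is hypothetical, picked only so that both distributions have the same six-inch average: a Gaussian with a standard deviation of 1.5 inches, and a Pareto with a minimum of 1 inch and shape 1.2. Nothing in it comes from real streamgage data.

```python
import math

def gaussian_tail(x, mean, sd):
    """P(depth > x) under a normal distribution."""
    return 0.5 * math.erfc((x - mean) / (sd * math.sqrt(2)))

def pareto_tail(x, x_min, alpha):
    """P(depth > x) under a Pareto distribution with minimum x_min."""
    return 1.0 if x <= x_min else (x_min / x) ** alpha

# Hypothetical parameters, in inches; both curves average six inches.
MEAN_IN, SD_IN = 6.0, 1.5    # Gaussian: three to nine inches is +/- 2 sd
X_MIN_IN, ALPHA = 1.0, 1.2   # Pareto mean = ALPHA * X_MIN_IN / (ALPHA - 1)

if __name__ == "__main__":
    for depth in (12.0, 36.0, 72.0):  # a foot, a yard, a fathom
        g = gaussian_tail(depth, MEAN_IN, SD_IN)
        p = pareto_tail(depth, X_MIN_IN, ALPHA)
        print(f"deeper than {depth:>4.0f} in: Gaussian {g:.2e}, Pareto {p:.2e}")
```

With these made-up parameters, the Gaussian puts the chance of more than a foot of water at about 3 in 100,000, while the Pareto puts it at about 5 in 100 and still leaves a bit over 1 percent for more than a yard. Same average, very different flood insurance problem.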

1 comment:

Dave Tufte said...

I was unable to track it down, but I do remember reading something about how the distribution of wave heights in the open ocean has really long tails ... thus, rogue waves.