Andrew Gelman has a blog post, “Probabilistic forecasts cause general misunderstanding. What to do about this?”, dated 9 August 2020. That got me thinking. I didn’t want to hijack his comments section with something as long as this, so I’m posting it separately. He writes now in August (his March post is here):
Here’s what I wrote in March:
Consider a hypothetical forecast of 52% +/- 2%, which is the way they were reporting the polls back when I was young. This would’ve been reported as 52% with a margin of error of 4 percentage points (the margin of error is 2 standard errors), thus a “statistical dead heat” or something like that. But convert this to a normal distribution, you’ll get an 84% probability of a (popular vote) win.
You see the issue? It’s simple mathematics. A forecast that’s 1 standard error away from a tie, thus not “statistically distinguishable” under usual rules, corresponds to a very high 84% probability. I think the problem is not merely one of perception; it’s more fundamental than that. Even someone with a perfect understanding of probability has to wrestle with this uncertainty.
As is often the case, communication problems are real problems; they’re not just cosmetic.
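The 84% in the quoted passage can be reproduced in a few lines. This is only a sketch of the arithmetic as I read it: it assumes the vote share is normally distributed around 52% with a standard error of 2 points, and the function names are mine, not Gelman's.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def win_probability(share: float, std_error: float, threshold: float = 0.5) -> float:
    """P(true share > threshold), assuming share ~ Normal(share, std_error)."""
    return 1.0 - normal_cdf((threshold - share) / std_error)


# 52% with a standard error of 2 points is exactly 1 standard error above a tie.
p_win = win_probability(share=0.52, std_error=0.02)
print(f"{p_win:.3f}")  # prints 0.841
```

The whole calculation is the normal CDF evaluated at 1, so the 84% depends only on how many standard errors the forecast sits from 50%.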
I’m confused about how to do this even before we get to non-sampling error, which is the biggest problem and the hardest to put a number on since it involves the “unknown unknowns”. How do we get to the 84%?
Example (i). 100,000 voters. I sample 10,000. 5,200 are for Clinton. My estimated margin is 52% for Clinton. What is my standard error? (straightforward, with effort) With what probability do I think Clinton will win? (More than 52%)
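For Example (i), one textbook way to get the standard error is the simple-random-sample formula with a finite population correction, since a tenth of the voters were sampled. The sketch below assumes simple random sampling without replacement and then, as in the 84% calculation, treats the true share as normal around the estimate. Both are my assumptions, and the function name is made up for illustration.

```python
import math


def sample_share_std_error(p_hat: float, n: int, population: int) -> float:
    """Standard error of a sample proportion, with finite population correction."""
    srs_error = math.sqrt(p_hat * (1.0 - p_hat) / n)
    correction = math.sqrt((population - n) / (population - 1))
    return srs_error * correction


p_hat = 5_200 / 10_000
se = sample_share_std_error(p_hat, n=10_000, population=100_000)

# How many standard errors the estimate sits above a 50% tie.
z = (p_hat - 0.5) / se

# Normal-approximation probability that the true share exceeds 50%.
p_win = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print(f"standard error = {se:.4f}, z = {z:.2f}, P(win) = {p_win:.5f}")
```

Under these assumptions the standard error is a bit under half a percentage point, so the estimate is several standard errors from a tie. This covers sampling error only.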
Example (ii). 9 voters. I sample 3. 2 are for Clinton, 1 for Trump. My estimated margin is 67% for Clinton. What is my standard error? (straightforward, with effort) With what probability do I think Clinton will win? (Less than 67%)
Example (ii) is simpler. One way to think about it is that each voter has today, and will have on election day, the same unchanged probability X of voting for Clinton, and I’m trying to estimate that number. My best (modal) guess for Clinton’s vote count is then 9X, rounded to the nearest whole vote, so her share must be a multiple of one ninth; 60%, for example, is an impossible share. I can also figure out the probability, if X = 55%, say, that rolling that 55% die 9 times for the 9 voters will end up with Clinton getting 5, 6, 7, 8, or 9 votes (since the three voters I sampled could all, in this model, change their minds on election day and vote differently from today).
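The die-rolling calculation is just a binomial tail. Here is a minimal sketch, taking X = 55% as given and assuming all 9 votes are independent draws, which is exactly the extreme assumption this model makes. The function name is mine.

```python
from math import comb


def prob_at_least(k_min: int, n: int, x: float) -> float:
    """P(at least k_min successes in n independent trials, each with probability x)."""
    return sum(comb(n, k) * x**k * (1.0 - x) ** (n - k) for k in range(k_min, n + 1))


# Clinton wins the 9-voter election if she gets 5, 6, 7, 8, or 9 votes.
p_majority = prob_at_least(k_min=5, n=9, x=0.55)
print(f"{p_majority:.3f}")  # prints 0.621
```

Note that this conditions on a known X. Turning the 2-of-3 sample into a belief about X is a separate step that the sketch does not attempt.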
But that assumes each voter has an independent probability X of voting for Clinton, which is an extreme assumption and unrealistic. It’s closer to reality to think that we’ve got Clinton voters and we’ve got Trump voters and we’re trying to figure out how many of each there are: a voter doesn’t flip a coin to decide which he is.
What is the opposite extreme? Part of it would be a model in which we rule out the margin for Clinton being 0-9, 1-8, or 9-0, since we already know how 3 people are going to vote and we assume they won’t change. Here’s a try: for our opposite extreme, assume that each of our sampled people has two other non-sampled people just like him. Then we can deduce that there are at least 6 people for Clinton and 3 for Trump. Since there are only 9 people total, that means that we can predict with 100% probability that Clinton will win, and that it will be by a 6-3 margin.
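Between the coin-flipping model and the clone model sits the fixed-voters model: each of the 9 people already is a Clinton voter or a Trump voter, and the sample of 3 is drawn without replacement. The sketch below enumerates the possible Clinton totals and weights each by the number of ways it could produce a 2-1 sample. It assumes a uniform prior over totals from 0 to 9, which is my choice for illustration and not something argued for here. A prior concentrated near an even split would give a lower answer.

```python
from math import comb


def posterior_over_totals(population: int, sample_size: int, sample_for: int) -> dict:
    """Posterior over the number of Clinton voters in the whole population.

    Assumes a uniform prior over totals 0..population (an illustrative choice)
    and a simple random sample drawn without replacement.
    """
    weights = {}
    for total_for in range(population + 1):
        # Ways to draw sample_for Clinton voters and the rest Trump voters.
        ways = comb(total_for, sample_for) * comb(
            population - total_for, sample_size - sample_for
        )
        if ways > 0:
            weights[total_for] = ways
    norm = sum(weights.values())
    return {total: ways / norm for total, ways in weights.items()}


posterior = posterior_over_totals(population=9, sample_size=3, sample_for=2)

# Totals of 0, 1, and 9 are ruled out by the sample itself.
print(sorted(posterior))  # prints [2, 3, 4, 5, 6, 7, 8]

# Clinton wins with 5 or more of the 9 votes.
p_win = sum(prob for total, prob in posterior.items() if total >= 5)
print(f"{p_win:.3f}")  # prints 0.738
```

The clone model corresponds to putting all the prior weight on a 6-3 split, which is why it predicts a Clinton win with certainty.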
Of course, this assumes that nothing exogenous happens between now and election day: we won’t have anybody change their minds because of a scandal that appears, for example. And it assumes that nothing endogenous happens: no campaign ads that even out the Democratic free media coverage advantage, no shift towards 50-50 results because of an underdog effect, no shift towards 100-0 results because of conformity. This discussion is not about that problem, any more than it is about the problem of non-sampling non-model error (which I guess, since I’ve conditioned out modelling mistakes, means the error we get from picking a non-random sample).
Is it really useful to try to arrive at a number like 84%? I like the idea of knowing how an expert would bet on whether Clinton gets at least 50%. I think I’d prefer his honest answer without his depending too much on formal analysis, though. It’s like how I liked Brian Leiter’s ranking of philosophy departments before he went formalistic. I think Professor Leiter (a leftwing Nietzsche scholar) is biased, but I know his biases and can correct for them, and I value his gut opinion more than what he does now, which is some sort of expert survey where he gets to pick the experts so it looks fairer to naive people.
This is related to the Bayesian-v-classical conundrum: how fancy do you make your model, and do you put in your subjective priors?