Some things to note about this draft:
Though spoilage had previously been ignored by all but the election officials who had to deal with the practical aspects of vote processing, it was not a new problem, nor one special to Florida. Spoilage rates in previous elections were also large enough to have potentially changed the result, and spoilage rates in states other than Florida were large enough to have changed the results in 2000. Thus, the topic is of more than historical interest.
In this paper, I will analyze spoilage in the 2000 Presidential Election in Florida. I focus on this year, place, and office for three reasons. First, the data and background institutional detail are best, and Florida, with its 67 diverse counties, is large enough to give us insight into spoilage generally. Second, although the topic is not just of historical interest, it is of historical interest nonetheless, and many people are interested in the details of what happened there. Third, though the election dispute has ended, Florida is giving careful thought to whether its election process ought to be changed. This study may be useful for policy purposes in Florida, and for the other states that will be interested in whether any changes Florida makes will really make a difference.
This paper looks at what might explain spoilage. I start by discussing spoilage generally, with particular attention to the pattern of spoilage in counties favoring Bush compared to those favoring Gore. I then use regression analysis on county data for all of Florida and precinct data for Palm Beach County. Separately, I look at the question of what kind of counties chose to use voting machines associated with high spoilage rates. Finally, I discuss a reason why the percentage of blacks in a county or precinct might be associated with high spoilage rates.
A scatterplot of Percentage of Ballots Spoiled vs. Percentage of Vote for Gore by county shows little correlation. Gore counties did not systematically have more spoiled ballots than Bush counties.
Much of this paper is about Florida counties, so I will insert a map here to aid the reader. At some later point I will make it more useful by using shading to convey some information.
In Palm Beach County, there were 29,702 spoiled ballots, 6.43 percent of the total. That compares to 2.93 percent for Florida overall. Palm Beach County had the most spoiled ballots of any county, but not by a lot. Runners up were Duval County (Jacksonville) with 26,909 (9.23%, in a county Bush won handily) and Miami-Dade County with 28,601 (4.37%, in a county Gore won by a small margin).
Fifteen other counties had a greater percentage of spoiled ballots than Palm Beach did. Of these counties, 13 went for Bush and 2 for Gore. The biggest percentage was 12.40 percent spoiled in Gadsden County. Eight other counties had between 5.00 and 6.43 percent spoiled, of which 2 went for Gore and 6 for Bush. The lowest percentage spoiled was Leon County (Tallahassee), with 0.18 percent spoiled.
There were a total of 179,855 spoiled ballots. If we look at the percentage of the vote for Gore in each county, and allocate that percentage of the spoiled ballots to Gore and Bush's percentage to Bush, that would yield 2,703 extra votes for Gore. That is greater than the 930-vote Bush margin as of 4:39 p.m. on November 17, but it is remarkably small compared to the total number of spoiled ballots -- only 1.5 percent.
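The allocation rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two counties and their figures are hypothetical rather than taken from the data, and the function name `net_gain_from_allocation` is my own. The last lines check the 1.5 percent figure from the totals quoted above.

```python
def net_gain_from_allocation(counties):
    """Net extra votes for Gore if each county's spoiled ballots are split
    in proportion to the candidates' shares of the valid vote there."""
    net = 0.0
    for county in counties:
        gore_extra = county["spoiled"] * county["gore_pct"] / 100.0
        bush_extra = county["spoiled"] * county["bush_pct"] / 100.0
        net += gore_extra - bush_extra
    return net


# Hypothetical counties (illustrative numbers only, not from the dataset).
example = [
    {"name": "A", "spoiled": 10_000, "gore_pct": 60.0, "bush_pct": 38.0},
    {"name": "B", "spoiled": 20_000, "gore_pct": 45.0, "bush_pct": 53.0},
]
print(net_gain_from_allocation(example))

# Figures quoted in the text: 2,703 net votes out of 179,855 spoiled ballots.
share_of_spoiled = 100.0 * 2703 / 179855
print(round(share_of_spoiled, 1))
```

A county that splits evenly contributes nothing to the net figure however many ballots it spoils, which is why a large spoilage total can translate into a small net shift.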
Allocating spoiled ballots in proportion to candidates' valid ballots by county is one conceivable voting rule, but it is not used in Florida or anywhere else. In the 2000 election, Democrats did suggest that something like it be done in Palm Beach-- that a judge use statistical analysis to estimate how the voters would have voted had they not spoiled their ballots-- but nobody took this seriously. The conventional remedy for spoilage is rather to recount the ballots to make sure which ballots really are spoiled. The recount reveals some ballots to be valid which were thought to be spoiled and other ballots to be spoiled which were thought to be valid. Usually, more ballots are found to be valid, since voting machines rarely read improperly marked ballots but sometimes misread properly marked ballots.
The first step, which Florida law mandated, was to do a machine recount. This means the election officials check that the machines are in order and feed the ballots back into the machines for recounting. This is useful for several reasons. First, machinery inevitably has a certain amount of slack which can cause the same ballots to be read differently by the same machine. It is, of course, unclear that repeated recounts will ever converge on a uniform result if this is the only source of misreading. Second, punchcard machines, and particularly the Votomatic type which uses pre-perforated ballots that the voter punches with a stylus, can leave "hanging chads" -- holes partly punched. These chads can be brushed away in a recount. Most people think that is a valid occurrence, but the other side of the coin is that ballots degrade with repeated counting, even if no fraud is involved, and ballots that were originally valid may become unreadable because of wear. Third, human error may have distorted the first count. Some ballots may not have been put through the machines, or some might have been put through twice, for example. Such error is more likely in the original count than in a recount because less attention would have been paid when it was not known that the election would be so close. The data used in this paper is all from after the machine recount.
The second step, the source of most of the post-election litigation, is to do manual recounts. This involves people looking at each ballot to see if the machines read it properly. This paper is not about the intricacies of how this might be done, which no doubt will be discussed at length in law review articles. It is worth noting here, however, that one would not expect manual recounts to produce a large change in election results, in Florida 2000 or elsewhere. Palm Beach County had xxx spoiled votes, and a manual recount by partisan Democrats produced only xxx extra votes for Gore, a total which does not even subtract any challenges to the Democratic count the Bush campaign might have successfully made in court had the process not been halted.
If this Palm Beach figure of xxx percent is applied to the 2,703 extra votes for Gore, the number drops to xxx. So Gore would still not have won.
The relevance of that to the current article is that the number of spoiled ballots is truly large. The problem is not that voting machines are unreliable, but that voters either choose to spoil their ballots or make a large number of mistakes.
The data I use is from a number of sources. The percentages of spoiled ballots and total ballots cast are from the Orlando Sentinel, http://orlandosentinel.com/elections/lost.htm. The Gore 2000 and total 2000 votes are from Adams and Fastnow, "A Note on the Voting Irregularities in Palm Beach, Florida." You can also get the Florida state data by county directly from the Florida Division of Elections, which is where I found the data on turnout in the 1998 senatorial election. Most of the variables are from a University of Virginia website that has the U.S. 1994 county data arranged nicely, http://fisher.lib.virginia.edu/ccdb/county94.html.
I have the data available in several formats:
My data sources are in conflict on two counties. The Sentinel says that Baker County used punchcards and Martin County a lever machine, whereas the Florida Division of Elections webpage says they used the ESSIVC optical precinct and Datavote punchcards. An email from Baker County tells me they used optical ballots, and other sources say that Martin used a lever machine. (xxx add other sources)
Tables 3 and 4 show the summary statistics and correlation matrices. All statistics in this paper are rounded downwards.
TABLE 3: SUMMARY STATISTICS FOR THE COUNTY DATA
(n=67, 54 for Undervote and Overvote)

Variable             |    Mean    Median      Min        Max
---------------------+------------------------------------------
Spoilage             |    3.88      3.34     0.17      12.40
Undervote            |    0.92      0.72     0.00       3.49
Overvote             |    2.69      1.14     0.00      11.60
Mean_Income          |  12.453    12.112    8.527     21.385
Population           | 201.234    80.123    5.745  2,007.972
Population_Growth    |   45.04        37       -1        208
Foreign              |    4.96       3.6      0.6       45.1
Non-English-Speakers |    8.20       6.4      1.9       57.4
Poverty              |   11.73        11        5         25
Gore's_Percentage    |   40.74        24       40         66
Black Percentage     |   13.35     11.94     1.92      56.11
Aged_Over_65         |   17.48      15.7      7.5       33.8
Aged_Over_75         |    6.81       6.4      2.7       14.2
High_Turnout         |    1.54      1.52     1.32       1.87

TABLE 4: THE CORRELATION MATRIX FOR THE COUNTY DATA
(n=67, variable names abbreviated)

        | Spoilage Op-Cent Op-Accu Pun-Dat Pun-Vot    Pop
--------+-------------------------------------------------
Spoilage|   1.00
Op-Cent |    .54    1.00
Op-Accu |   -.56    -.31    1.00
Pun-Dat |    .38    -.22            1.00
Pun-Vot |   -.05    -.30    -.30    -.21    1.00
Pop     |   -.16    -.23    -.07    -.21     .55    1.00
Black   |    .58     .27    -.15     .14    -.15    -.01
Popgrow |   -.41    -.20     .27    -.14     .10    -.05
Foreign |   -.10    -.15     .00    -.16     .41     .79
Noneng  |   -.02    -.14    -.04    -.07     .36     .70
MeanInc |   -.46    -.36     .13    -.27     .44     .42
Poverty |    .61     .41    -.21     .24    -.35    -.31
Gore    |   -.12    -.10     .27    -.12     .26     .46
Over65  |   -.23    -.08     .05    -.14     .38     .06
Over75  |   -.17    -.05    -.03    -.15     .44     .18
Turnout |    .00    -.12    -.11    -.08     .21     .28

        |  Black  Popgro Foreign  Noneng MeanInc Poverty    Gore
--------+---------------------------------------------------------
Black   |   1.00
Popgro  |   -.48    1.00
Foreign |   -.11     .12    1.00
Noneng  |   -.10     .08     .95    1.00
MeanInc |   -.39     .37     .38     .30    1.00
Poverty |    .59    -.51    -.24    -.18    -.80    1.00
Gore    |    .23     .15     .35     .29     .27    -.18    1.00
Over65  |   -.40     .39     .12     .03     .36    -.41     .30
Over75  |   -.31     .19     .16     .06     .37    -.37     .34
Turnout |   -.15    -.01     .31     .32     .04    -.02    -.14

        | Over65  Over75 Turnout
--------+------------------------
Over65  |   1.00
Over75  |    .95    1.00
Turnout |   -.32    -.31    1.00

The correlation matrix reveals that there is enough correlation between these variables that regression analysis is necessary to untangle their effects. For example, Non-English-Speakers and Population have a correlation of 0.70 -- big counties have a bigger percentage of people who do not speak English at home -- so care must be taken to separate out the two effects. Non-English-Speakers has a very small correlation of -0.03 with the percentage of spoiled ballots, but could that be because the non-English speakers are in big counties, whose efficiency in conducting elections cancels out the difficulties that non-English speakers have? Regression analysis is a procedure suited to solving exactly this kind of problem.
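The masking effect described above can be illustrated with a small constructed example. This is a sketch under stated assumptions, not the paper's data: the three series are invented so that a variable standing in for Non-English-Speakers raises spoilage one-for-one while a variable standing in for county size lowers it one-for-one, and the helper names (`simple_ols`, `partial_slope`) are my own. The partial coefficient is obtained by the Frisch-Waugh-Lovell residual method, which gives the same coefficient as a two-variable regression.

```python
def mean(values):
    return sum(values) / len(values)


def simple_ols(x, y):
    """Return (slope, intercept) of a one-variable least-squares fit of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def residuals(x, y):
    """Residuals of y after removing its linear fit on x."""
    slope, intercept = simple_ols(x, y)
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


def partial_slope(x, control, y):
    """Coefficient on x in a regression of y on x and control,
    computed by regressing residualized y on residualized x."""
    return simple_ols(residuals(control, x), residuals(control, y))[0]


# Invented data: bigger "counties" have more non-English speakers.
size = [0, 1, 2, 3, 4, 5, 6, 7]
noneng = [1, 1, 3, 3, 5, 5, 7, 7]
# Spoilage rises with noneng and falls with size, by construction.
spoil = [2 + n - s for n, s in zip(noneng, size)]

simple = simple_ols(noneng, spoil)[0]            # ignores county size
partial = partial_slope(noneng, size, spoil)     # holds county size constant
print(simple, partial)
```

In this constructed case the simple slope is zero even though the effect with size held constant is one, which is the pattern the paragraph above worries about.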
Exit polls ask individuals leaving polling places how they voted, thus avoiding the ecological fallacy, though incurring the risk of false answers. At any rate, they provide useful information helpful in interpreting the averages in counties or precincts. Here are results from an exit poll, available from CNN or for the 2000 Presidential election in Florida. Note that since the minor candidates picked up from 0 to 5 percent of the vote, if Gore's percentage is over 48 he may be even with or ahead of Bush.
TABLE 5: EXIT POLLS

Type of Person        This Type/All_Voters   Gore_Votes_of_this_Type/This_Type
Men                          46%                      42%
Women                        54%                      53%

White                        73%                      40%
African-American             15%                      93%
Hispanic                     11%                      48%
Asian                         1%                      --
Other                         1%                      --

18-64                        80%                      49%
65 and Older                 20%                      46%

Income under $15,000          8%                      62%
$15,000-$30,000              17%                      60%
$30,000-$50,000              26%                      48%
$50,000-$75,000              23%                      45%
$75,000-$100,000             12%                      40%
Over $100,000                14%                      33%

Married                      65%                      43%
Not Married                  35%                      58%

Democrat                     40%                      86%
Republican                   38%                       8%
Independent                  22%                      47%

What is noteworthy for the purposes of this paper is that Gore did better with blacks (93% to 7%) and poor people (62% to 32%), the two candidates were even with Hispanics (48% to 49%), and Gore did worse with old people (46% to 52%).
R-Squared = .78. N=67. The coefficient is followed by the t- statistic. Variables significant at the 10 percent level are starred.
SPOILAGE:                   Coefficient  t-statistic
Constant                  |    -3.49        0.74
Lever                     |     -.16        0.09
Optical-Central           |     4.02*       5.63*
Optical-Precinct-Accuvote |     -.31        0.42
Paper                     |     2.74        1.61
Punchcard-Datavote        |     3.95*       4.67*
Punchcard-Votomatic       |     2.23*       2.76*
Log(Pop)                  |     -.25        0.95
Pop Growth                |      .00        0.19
Foreign                   |     -.10        0.84
Noneng                    |      .10        1.16
MeanInc                   |      .10        0.94
Poverty                   |      .08        0.93
Gore                      |     -.04        1.32
Black                     |      .12*       4.05*
Over65                    |      .07        0.56
Over75                    |     -.09        0.30
Turnout                   |     2.21        1.05

What we see from this regression is that most of the variables are statistically insignificant. Only the type of voting machine and the percentage of black population in a county are related to spoilage. The regression's main lesson, however, is that spoilage is unrelated to:
A look at Table 1 shows why the regressions turn out this way. The six counties with the greatest percentage of elderly are Charlotte (34%), Citrus (32%), Hernando (31%), Highlands (34%), Pasco (32%), and Sarasota (32%). Their spoiled ballot percentages were 4.79%, 0.38%, 0.43%, 2.89%, 2.74%, and 1.74%, mostly below average and none of them in the top quartile. Counties with lots of old people just don't have more spoilage.
Broward (21%), Dade (14%), Palm Beach (25%), and Volusia (23%) are not notably high in their percentages of old people. The ``Old Lady in Palm Beach'' jokes that sprang up in November 2000 were based on misimpressions (though see the precinct-level results below).
The regression above combines overvotes and undervotes in its definition of spoilage. CNN and the Associated Press collected data breaking spoilage down for 54 counties. (The total spoilage is not the same as in the Orlando Sentinel data used above, but seems to have no systematic difference, and the discrepancy is perhaps due to the dates on which the data was collected, since vote totals shifted during the month after the election.) David Rusin cleaned this up somewhat, fixing some switched entries, and I use his data here. It turns out that undervotes and overvotes are quite different. As Table 6 shows, there is virtually no correlation between the percentage of each by county.
TABLE 6: CORRELATIONS BETWEEN SPOILAGE TYPES

        | Spoilage   Under    Over
--------+-----------------------------
Spoilage|   1.00
Under   |   0.18     1.00
Over    |   0.97    -0.06     1.00

Repeating the earlier regression for undervotes and overvotes separately on the 54 counties (dropping the Lever and Paper variables, since no county in the reduced dataset uses them) yields the following.
N=54. The coefficient is followed by the t-statistic and the simple correlation. Variables significant at the 10 percent level are starred.
OVERVOTES:                  Coefficient  t-statistic  Correlation
(R-Squared=.83)
Constant                  |    -2.08        0.37        ----
Optical-Central           |     5.08*       6.27*        .76
Optical-Precinct-Accuvote |      .75        0.95        -.47
Punchcard-Datavote        |     5.09*       3.69*        .15
Punchcard-Votomatic       |     2.11*       2.35*       -.17
Log(Pop)                  |     -.10        0.34        -.45
PopGrowth                 |      .01        1.08        -.37
Foreign                   |     -.03       -0.26        -.10
Noneng                    |      .04        0.38        -.04
MeanInc                   |      .12        1.01        -.44
Poverty                   |      .09        0.80         .62
Gore                      |     -.03        1.04        -.13
Black                     |      .12*       3.11*        .61
Over65                    |     -.11        0.78        -.30
Over75                    |      .22        0.70        -.23
Turnout                   |     -.01        0.00        -.04

UNDERVOTES:                 Coefficient  t-statistic  Correlation
(R-Squared=.67)
Constant                  |     -.31        0.16        ---
Optical-Central           |    -1.04*       3.63*       -.28
Optical-Precinct-Accuvote |     -.84*       2.98*       -.42
Punchcard-Datavote        |     -.52        1.06         .04
Punchcard-Votomatic       |      .49        1.54         .19
Log(Pop)                  |     -.22*       1.99*        .05
PopGrowth                 |    -.006*       1.73*       -.15
Foreign                   |     -.01        0.36         .14
Noneng                    |      .01        0.46         .15
MeanInc                   |      .01        0.33         .10
Poverty                   |      .01        0.29         .03
Gore                      |      .00        0.49         .00
Black                     |      .00        0.42         .00
Over65                    |      .07        1.38         .13
Over75                    |     -.11        0.96         .17
Turnout                   |     1.10        1.20         .30

xxxx discussion here.
R-Squared = .81. N=54. Variables significant at the 10 percent level are starred.
OVERVOTES:                  Coefficient  t-statistic
Constant                  |    -1.07*       1.74*
                          |
Optical-Central           |    -4.99*       7.36*
Optical-Precinct-Accuvote |      .28        0.42
Punchcard-Datavote        |     5.28*       4.70*
Punchcard-Votomatic       |     1.65*       2.48*
                          |
Black                     |      .12*       5.50*

This regression says that a county which had 15 percent black inhabitants and used the ESS IV-C optical ballots counted at the precinct level (the missing machine dummy) would have .73 percent (=-1.07 + .12 (15)) overvotes. A county which used Votomatic punchcards instead would have 1.65 percent more overvotes. A county with 20 percent more black inhabitants would have 2.40 percent more overvotes. For any of these, to get the spoilage total, undervotes would have to be added, an average rate of 0.92 percent across counties.
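The arithmetic in the paragraph above can be reproduced with a short sketch. It uses only the three coefficients quoted in the text (the constant, the Votomatic dummy, and Black); the function name `predicted_overvote` and the dictionary layout are my own illustration, and the baseline is the omitted machine type, as in the text.

```python
# Coefficients quoted in the text from the reduced overvote regression.
COEF = {"constant": -1.07, "votomatic": 1.65, "black": 0.12}


def predicted_overvote(black_pct, votomatic=False):
    """Predicted overvote percentage for a county using the omitted
    (baseline) machine type, or Votomatic punchcards if votomatic=True."""
    value = COEF["constant"] + COEF["black"] * black_pct
    if votomatic:
        value += COEF["votomatic"]
    return value


baseline = predicted_overvote(15)                       # 15 percent black, baseline machine
with_votomatic = predicted_overvote(15, votomatic=True)
more_black = predicted_overvote(35)                     # 20 percentage points more

print(round(baseline, 2))
print(round(with_votomatic - baseline, 2))
print(round(more_black - baseline, 2))
```

Because the specification is linear, the Votomatic and Black effects simply add; the differences do not depend on the starting values chosen.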
How about undervotes? Retaining only the significant variables, we get the following. R-squared =0.63. N=54.
UNDERVOTES:                 Coefficient  t-statistic
Constant                  |     2.35*       6.80*
                          |
Optical-Central           |     -.90*       3.67*
Optical-Precinct-Accuvote |     -.65*       2.73*
Punchcard-Datavote        |     -.43        1.06
Punchcard-Votomatic       |      .83*       3.35*
                          |
Log(Population)           |     -.24*       3.99*
Population Growth         |   -.0035*       1.87*

xxx here put explanation of this.

Outliers
It is interesting to look at which counties are outliers, most poorly explained by the regression equation. I'll use the original spoilage regression for this. A bigger residual -- either positive or negative -- indicates a bigger unexplained percentage of spoiled ballots. Tables 1 and 2 both list the residuals for the individual counties. Let us look at the eight counties with a residual greater than 1.70 percent in absolute value. The counties with unexpectedly high spoilage are Duval (4.20%), Palm Beach (3.46%), Okeechobee (2.35%), Hendry (1.95%), and DeSoto (1.83%). The counties with unexpectedly low spoilage are Polk (-4.27%), Madison (-2.32%), and Sumter (-1.72%).
Duval County, for example, has a residual of 4.20%, indicating that its spoiled ballot percentage of 9.22% is 4.20% more than the regression equation would predict based on its use of punchcards and its 23 percent black population.
Note that regression outliers are not necessarily the same as outliers in terms of the dependent variable, percentage of spoiled ballots. Gadsden County has the most spoiled ballots, 12.40 percent, but its residual of 1.56 percent is only moderately high; the regression equation does explain its high rate of ballot spoilage. Okeechobee County has typical values for all of the variables except its spoilage rate of 8.00 percent, but it is a regression outlier, with a residual of 2.35 percent, so the variables in the regression equation do not explain its high spoilage rate very well.
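The distinction between a high spoilage rate and a high residual can be made concrete with a small sketch. The observed rates and residuals are the ones quoted in the text; the "predicted" values are not reported anywhere in the paper and are simply backed out here as observed minus residual, and the function names and the 1.70 cutoff as a parameter are my own illustration.

```python
def residual_table(observed, predicted):
    """Residual = observed spoilage minus the regression's prediction."""
    return {name: observed[name] - predicted[name] for name in observed}


def flag_outliers(resids, threshold):
    """Names of counties whose residual exceeds the threshold in absolute value."""
    return [name for name, r in resids.items() if abs(r) > threshold]


# Observed spoilage percentages quoted in the text.
observed = {"Duval": 9.22, "Gadsden": 12.40, "Okeechobee": 8.00}
# ASSUMPTION: predictions backed out as observed minus the quoted residuals.
predicted = {"Duval": 5.02, "Gadsden": 10.84, "Okeechobee": 5.65}

resids = residual_table(observed, predicted)
outliers = flag_outliers(resids, threshold=1.70)
print(outliers)
```

Gadsden has the highest observed rate of the three but is not flagged, because most of its rate is accounted for by the prediction.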
Only one of the four counties that did manual recounts is an outlier under the 1.70 percent criterion. Broward, Miami-Dade, and Volusia Counties have residuals of -0.14, -1.18, and 0.43 percent, all reasonably close to the regression's prediction. Palm Beach County has a residual of 3.46 percent, so its spoilage rate is surprisingly high given its low black population (12 percent), even though its spoilage rate of 6.42 percent was not extreme.
This analysis has been an exercise in data description rather than a test of a formal model. One must be careful in interpretation. It is easy to fall into the "ecological fallacy" (see the King and Robinson references at the end of this paper). Suppose I were trying to explain Ku Klux Klan membership by county, and I ran a regression. I might well find that Klan membership was strongly associated with percentage of the population that was black in the county. The ecological fallacy would be to jump to the conclusion that blacks have higher-than-average tendencies to join the Klan. In the present context, I have found that counties with more black and poor people tend to have more spoiled ballots. This might be because blacks and poor people spoil more ballots when they vote, but not necessarily. It might also be, for example, that counties with more blacks and poor people choose to have more confusing ballots for some devious political reason and that everybody -- rich, poor, black, and white -- spoils more ballots as a result. But these regression results do tell us what the patterns are across counties, if not why those patterns exist.
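A constructed example may make the ecological fallacy concrete. The three counties below are entirely hypothetical: within each county the group of interest and everyone else spoil ballots at exactly the same rate, yet counties where the group is a larger share have higher spoilage. The function and variable names are my own.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical counties:
# (group voters, group spoiled, other voters, other spoiled)
counties = [
    (100, 2, 900, 18),
    (300, 12, 700, 28),
    (500, 30, 500, 30),
]

group_share = [100.0 * g / (g + o) for g, gs, o, os_ in counties]
county_rate = [100.0 * (gs + os_) / (g + o) for g, gs, o, os_ in counties]
# Difference in spoilage rate between the group and everyone else, by county.
within_gap = [gs / g - os_ / o for g, gs, o, os_ in counties]

county_corr = pearson(group_share, county_rate)
print(county_corr, within_gap)
```

Across counties the group's share and the spoilage rate are perfectly correlated, while inside every county the two groups behave identically, so the county-level pattern alone cannot say who is spoiling the ballots.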
The pitfall arises in the difference between optical ballots counted centrally and those counted in the precincts. It could be that a county would reduce its spoiled ballot percentage by 4.90 if it stopped counting centrally and counted at the precincts instead, if the precinct system is one where the vote is registered at the booth itself, and notifies the voter of a misvote. But something else may be going on in the kinds of counties that use each system.
The Observed Choice Problem: Choice of Voting Machines by Counties
It would be interesting to know what variables explain which counties use which voting system. Do counties with more old people generally use optical-precinct machines, for example, so that the number of elderly does have an indirect effect on the number of spoiled ballots? The correlation matrix above gives some insight into this; there are no strong correlations except perhaps a tendency for centralized optical ballots to be used in counties with more poor people.
Choice of voting system is important to interpreting even the results here, however. As I note in my 1998 Public Choice paper on the ``Observed Choice'' problem, particular policies are likely to be chosen in the districts where they work the best, and this biases regression trying to estimate the average effect of policies. In the present context, optical ballots scanned at the precinct level seem to reduce spoilage, and one might suggest that other counties wanting to reduce spoilage adopt them. It might be, however, that the counties know the effects of voting machines better than I as analyst know them, and they know that in some counties the special machines reduce spoilage but in others they do not, so the counties where they would work have adopted them, explaining why we see heterogeneity in machines across counties.
Another peril of observed choice is that districts may adopt policies which solve problems, which therefore do not show up in regressions. Suppose, for example, that old people have special trouble with punchcards. Counties with many old people would then tend not to use punchcards, so in my regressions, an otherwise existing effect of old people having more spoilage would not show up. If all counties were forced to use punchcards, then it would be clear that old people have trouble with them. Potentially, that could explain why my precinct results below for Palm Beach County are different. As we will see in a later section, however, this turns out not to be the case.
Heteroskedasticity does not seem to be a problem, since eyeballing a scatterplot of Residuals vs. Population by county shows no clear correlation. I had feared that small counties might have disturbances with greater variance, since a given change in the number of spoiled ballots would have a bigger impact than in the big counties.
Dependent Variables Bounded by 0 and 100 Percent
I was concerned that since the percentage of spoiled ballots is bounded below by 0 and above by 100, tobit estimation might be appropriate, except that it is reliable only for large sample sizes (and N=67 is not very large). I tried running tobit, however, and it indicated that few of the predicted values were censored at 0 or 100. Thus, it seems OLS works well.
I have just reported what is almost the simplest regression I ran: linear, except for using the logarithm of county population. After I ran the purely linear specification for the first posted draft, I started trying other specifications in response to comments I received. Instead of simply Percentage of Spoiled Ballots (call it "P"), one could use log(P) or the log odds ratio, log(P/(100-P)), on the left-hand side. (The log odds ratio has the nice property that it can range from negative to positive infinity as P goes from 0 to 100, so the censoring problem in the previous paragraph disappears.) Instead of, for example, Pop92 on the right-hand side, one could use log(Pop92), which assumes a decreasing effect of increases in county size rather than a constant effect of such increases. Instead of including all the counties, one could (a) drop or (b) add dummy variables for the counties with unusual voting systems and for counties that are outliers in the right-hand-side variables. Dade County worries me, for example, because it is so extreme in population (2,008, compared to the next-highest county's 1,301 and the mean of 201), percentage foreign-born (compared to the next-highest county's 16 and the mean of 5), and percentage not speaking English at home (compared to the next-highest county's 24 and the mean of 8).
After trying some of this, I retreated to a simple specification. This paper's aim is data analysis rather than testing a formal theory. I do not have a particular model of voter behavior I am trying to test; rather, I am looking for patterns in the data that would suggest what might be happening. This has a number of implications:
First, I have no model to tell me which specification is best-- whether the variables are linked linearly or logarithmically.
Second, I want to keep my regressions simple enough that I can get a feel for what is going on. In my situation, I cannot set up a complicated regression to fit a formal model, run it through the computer, and accept the results of a test that tells me to accept or reject my model. Often that is appropriate, but not here. Rather, I am essentially looking for the correlation between Spoiled Ballots and variable X conditional on variables W, Y, and Z being held constant. This is just one step up from looking at the correlation matrix, and one step is enough. If I start doing log transformations, I can't understand the regression any more. In particular, it becomes difficult to interpret coefficient size when the left-hand-side is, for example, an odds ratio.
Third, since I am not testing a formal model, I am not concerned about the validity of hypothesis tests. Statistical correctness requires that assumptions such as an unlimited range for the left-hand-side variables be satisfied, that an F-test rather than multiple t-tests be used when the significance of several variables is tested, and that there be sufficient data to justify large-sample techniques such as tobit. If I were doing all this to test the single hypothesis that Gore's percentage of the vote explained the amount of Spoiled Ballots conditional upon several other variables being held constant, then I might want to be more formal. But I am interested in looking for patterns more generally. That means when I say a t-statistic indicates lack of significance I am not really using formal statistical theory; I am saying the coefficient size is small relative to the standard error and so there is no clear indication that the true coefficient is different from zero. For this modest purpose, a linear specification and multiple t-tests are appropriate. What I need to be most careful about, given my lack of a theory, is getting the list of explanatory variables correct. As an illustration, in my first run I had not yet typed in the voting machine variables, which turned out later to be the most important ones. My initial results were misleading about the importance of other variables that happened to be correlated with machine type and picked up its effect.
I did, however, decide to use a regression with Log(Population) rather than Population as an explanatory variable. Dade County has 335 times the population of Liberty County, and I just couldn't believe that, if bigger counties had less spoilage, the size effect would be 335 times bigger in Dade County than in Liberty County. Using Log(Population) reduced the ratio between Dade and Liberty to a more reasonable 4.24. And in the regression, county size now became statistically significant. All the other variables except Mean Income are in percentages, which are more likely to have a linear effect, and mean income only varies from 9 to 20.
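The two ratios in the paragraph above can be checked with a few lines of arithmetic. This is a sketch under assumptions: Dade's population of 2,008 (in thousands) is the figure given earlier in the paper, but Liberty's population is not stated there, so the value of 6 thousand below is my assumption, chosen because it roughly reproduces the quoted 335 and 4.24. Natural logarithms and population measured in thousands are also assumptions.

```python
import math

dade_pop = 2008.0    # thousands, as given in the text
liberty_pop = 6.0    # thousands; ASSUMED, not stated in the paper

raw_ratio = dade_pop / liberty_pop
log_ratio = math.log(dade_pop) / math.log(liberty_pop)

print(round(raw_ratio), round(log_ratio, 2))
```

The ratio of logarithms depends on the units in which population is measured, so the 4.24 figure is specific to population in thousands.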
Unfortunately, this data, and especially the demographic registration data and data separated into overvote and undervote, as opposed to simple voting results, is available only for a few counties, and those not a representative sample. Nonetheless, it will be useful in this section to look at two important counties for which data is available by precinct on undervote, overvote, and demographics, Miami-Dade and Palm Beach. Data is available exactly because these counties are so special: they are large, they are urban, their spoilage was heavily disputed, and they are Democratic. But let us see what the data says.
A map of ballot spoilage rates by precinct in Palm Beach, Dade, and Broward Counties is up on the web at Sun-sentinel.com/shopping/map/sfprecincts.htm.
"Gore" is the percentage of the vote for Gore. "Vote" is the total number of votes in the precinct. "Dem", "Rep," and "Other" are the percentages of registered voters of each party. Note the high percentage of "Other." Note also that in Palm Beach County, looking at the simple correlations, Gore did not do better in more Hispanic or non-Republican-or-Democratic precincts, though he did well in precincts with more old people, blacks, women, and Democrats. Care must be taken with the simple correlations, though -- I would not be surprised if Gore's apparent strength with women was really strength in precincts with old people, who are disproportionately female.
I have excluded absentee precincts, since I do not have demographic data for them, and the xxx precincts in Miami-Dade and 38 in Palm Beach with fewer than 25 ballots cast, since those are peculiar precincts and the percentage differences there are more variable. I also dropped any precinct with a measured turnout of over 150%, as likely to be in error (xx in Miami-Dade and 1 in Palm Beach).
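The exclusion rules just listed can be written as a simple filter. This is a sketch only: the record layout and field names (`absentee`, `ballots`, `registered`) are hypothetical, the sample precincts are invented, and turnout is assumed here to be ballots cast as a percentage of registered voters.

```python
def keep_precinct(precinct, min_ballots=25, max_turnout_pct=150.0):
    """Apply the three exclusion rules: absentee precincts, precincts with
    fewer than min_ballots ballots, and turnout above max_turnout_pct."""
    if precinct["absentee"]:
        return False
    if precinct["ballots"] < min_ballots:
        return False
    turnout_pct = 100.0 * precinct["ballots"] / precinct["registered"]
    return turnout_pct <= max_turnout_pct


# Invented precincts (hypothetical field names and values).
precincts = [
    {"id": "P1", "absentee": False, "ballots": 900, "registered": 1400},
    {"id": "P2", "absentee": True, "ballots": 500, "registered": 500},
    {"id": "P3", "absentee": False, "ballots": 12, "registered": 40},
    {"id": "P4", "absentee": False, "ballots": 320, "registered": 200},
]

kept = [p["id"] for p in precincts if keep_precinct(p)]
print(kept)
```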
TABLE 7: SUMMARY STATISTICS FOR THE PRECINCT DATA

         |       MIAMI-DADE (N=564)        |       PALM BEACH (N=492)
Variable |    Mean  Std.Dev.  Min-Max      |    Mean  Std.Dev.  Min-Max
---------+---------------------------------+--------------------------------
Gore     |   55.70   24.41   13.51-100     |   58.20   13.71   7.46-90.37
         |                                 |
Overvote |    2.93    2.26   0-10.97       |    4.98    3.51   0-34.48
Undervot |    1.56     .95   0-7.27        |    1.82    1.82   0-13.86
         |                                 |
Turnout  |   65.87   20.67   1.50-145.43   |   62.00    9.70   14.14-84
Ballots  |  993.27  539.23   27-3007       |  843.60  507.78   29-2434
         |                                 |
TotalReg |    1553     767   42-4451       |    1337     752   41-2939
Democrat |   45.52   21.15   13.52-92.15   |   45.54   16.53   4.12-89.16934
Repub.   |   36.53   19.09   1.32-69.52    |   34.97   14.03   3.8-84.02
Black    |   19.99   30.18   0-97.20       |   10.01   20.98   0-95.51
Hispanic |   41.48   27.06   .08-87.60     |    3.47    4.00   0-27.22
Female   |   55.24    4.01   39.90-69.06   |   54.31    4.22   39.09-73.76
Age65up  |   23.57   12.42   4.46-76.84    |   33.94   25.47   1.48-96.73
TABLE 8: A CORRELATION MATRIX FOR THE PRECINCT DATA: MIAMI_DADE (N=564)
(PALM_BEACH underneath, N=492)

        |   Gore  Overvote Undervot Turnout   Votes  Democrat  Black  Hispanic
--------+----------------------------------------------------------------------
Gore    |   1.00
        |  (1.00)
        |
Overvote|    .56    1.00
        |   (.42)  (1.00)
        |
Undervot|    .32     .53    1.00
        |   (.04)   (.11)  (1.00)
        |
Turnout |   -.14    -.12    -.12    1.00
        |   (.07)  (-.31)   (.00)  (1.00)
        |
Ballots |   -.22    -.11    -.12     .41    1.00
        |   (.09)  (-.21)   (.33)   (.38)  (1.00)
        |
Democrat|    .88     .60     .37    -.11    -.24    1.00
        |   (.83)   (.72)   (.13)  (-.07)  (-.05)  (1.00)
        |
Black   |    .71     .71     .43    -.11    -.11     .88    1.00
        |   (.38)   (.65)   (.03)  (-.38)  (-.19)   (.60)  (1.00)
        |
Hispanic|   -.80    -.29    -.18     .08     .31    -.86    -.64    1.00
        |  (-.09)   (.14)  (-.04)  (-.34)  (-.19)  (-.01)   (.06)  (1.00)
        |
Over65  |   -.35    -.10     .04     .06     .02    -.34    -.38     .42
        |   (.29)   (.14)   (.21)   (.30)   (.11)   (.29)  (-.30)  (-.43)

Here are the regression results:
PRECINCT REGRESSIONS FOR MIAMI-DADE (N=564) AND PALM BEACH (N=492)
(t-statistics in parentheses)

           |              OVERVOTE                |              UNDERVOTE
           |  MIAMI-DADE         PALM BEACH       |  MIAMI-DADE         PALM BEACH
           |  (R-sq=.66)         (R-sq=.75)       |  (R-sq=.26)         (R-sq=.24)
-----------+--------------------------------------+-------------------------------------
Constant   |  -5.278   (2.97)*   23.24    (7.65)* |   .430    (0.39)    7.072   (2.55)*
Gore       |   .046    (8.72)*   -.202   (11.69)* |   .007    (2.25)*   -.070   (4.46)*
Turnout    |  -.001    (0.33)    -.008    (0.81)  |  -.001    (0.77)    -.020   (2.16)*
Ballots    |  -.00041  (3.31)*   -.00030  (1.74)* |  -.00016  (2.06)*    .00151 (9.48)*
Democrat   |   .037    (1.58)    -.025    (0.97)  |  -.002    (0.15)     .020   (0.85)
Republican |   .020    (1.18)    -.285    (8.01)* |  -.002    (0.22)    -.077   (2.38)*
Black      |   .048    (7.34)*    .078   (10.51)* |   .016    (4.07)*    .012   (1.84)*
Hispanic   |   .055    (9.54)*    .075    (2.86)* |   .008    (2.40)*    .013   (0.57)
Female     |  -.002    (0.12)     .053    (2.15)* |   .001    (0.13)     .029   (1.32)
Over65     |   .016    (2.48)*    .045    (7.83)* |   .016    (3.95)*    .021   (4.09)*

First, consider overvotes. What the two counties have in common is that small precincts (low "Ballots") and those with large proportions of Blacks, Hispanics, and old people have more overvotes, whereas turnout and the percentage of Democrats make no difference. At the same time, the two counties clearly do not follow the same patterns. Some variables are significant in one regression but not the other (Female), some are significant in both but with very different coefficient sizes (Over65), and some are significant in both but with opposite signs (Gore).

Undervotes show a similar mix of results. What the two counties have in common is that small precincts and those with large proportions of Blacks and old people have more undervotes, whereas turnout and the percentages of Democrats or women make no difference.

Recall that at the county level, xxx

It could be that there is a critical threshold effect for spoilage. I can test for that.

The size of the coefficients has meaning. Increasing the percentages of Democrats, Others (not Republican or Democrat), and Hispanics had the strongest effects, with coefficients of .19, .15, and .12. Increasing the percentages of blacks (.08) or old people (.04) had distinct but smaller effects.
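The precinct regression table reports t-statistics and stars but does not state the significance level behind the stars. As a rough check, the sketch below converts a reported t-statistic into an approximate two-sided p-value. It assumes a two-sided test and a normal approximation to the t distribution (plausible with roughly 500 precincts per county). The function name `two_sided_p` is illustrative and not part of the original analysis.

```python
import math


def two_sided_p(t_stat: float) -> float:
    """Approximate two-sided p-value for a t-statistic.

    Assumption: with several hundred observations the t distribution is
    close to the standard normal, so we use 2 * (1 - Phi(|t|)), which
    equals erfc(|t| / sqrt(2)).
    """
    return math.erfc(abs(t_stat) / math.sqrt(2.0))


# t-statistics copied from the precinct regression table.
examples = {
    "Ballots, Palm Beach overvote (starred)": 1.74,
    "Black, Palm Beach undervote (starred)": 1.84,
    "Democrat, Miami-Dade overvote (unstarred)": 1.58,
}

for label, t in examples.items():
    print(f"{label}: t = {t:.2f}, approx. p = {two_sided_p(t):.3f}")
```

Under these assumptions the smallest starred t-statistics (1.74 and 1.84) have approximate p-values between .05 and .10, while the largest unstarred one (1.58) is above .10, which is consistent with stars marking significance at the 10 percent level.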
How can this be reconciled with the county-level regressions, where the percentages of non-English-speakers and old people had no effect on spoilage? There are a number of possibilities.
Getting out the vote is a longstanding and honorable campaign tactic unless bribery is used, which has not been alleged in Florida. Paying people to telephone voters on Election Day and to go door to door is fully legitimate. The excerpts below from a New Republic article on the 2000 New Jersey campaign give a feel for how it is done and how important it was to the Gore campaign.
"The following year, applying the turnout techniques of the Torricelli campaign, Democrat Jim McGreevey came from nowhere to within 26,000 votes of unseating popular Governor Christie Todd Whitman, with Whitman's share of the black vote dropping eight points from her 1993 race. A study comparing the tight 1997 race to Whitman's 1993 victory over Democrat Jim Florio-- who had no black turnout program--is treated like a state secret within the party. "It's remarkable," says Corzine campaign manager Stephan DeMicco, who declined to share a copy of the study with me. "It's got too much strategic power for us.... The study of '93 to '97 has resulted in whole new approaches to electoral targeting for us. The lessons that we learned from that study ... are being applied in many other states now."
"We actually call it ... the New Jersey Plan," says Thomas,...
...
The sheet shows each precinct's registered voters, turnout history, Democratic performance, and, most important, vote goal. Precincts where turnout is low but Democratic performance is high are marked in red, since they constitute prime knock-and-drag territory on Election Day.
...
In addition to these mailings, Thomas hit black voters with live phone calls urging them to vote. On the Monday before the election, voters were given a reminder call; on Election Day itself, a massive phone bank operated from 8:00 a.m. to 7:00 p.m. "Those phones are on a continual cycle," Thomas says. "The only way [a voter] comes out of the cycle is if [he] answers the phone." When a district is performing below Thomas's expectations, she can immediately retarget the phones, increasing calls to that area.
...
Where exactly all these workers came from became a campaign issue in the final days of the race, when The New York Times discovered that many were being shipped in from homeless shelters and drug-rehab centers in Pennsylvania.
...
"Understand the mission," he instructs his flushers. "The mission is to get a registered voter out of their home and to the polls. Ladies and gentlemen, we are in very bad shape. I want you to load up on everything that moves." He then takes aside a sound-truck driver and traces a route for him to follow. Minutes later, the teams are blanketing the streets, knocking on doors and dragging out voters." ("NEWARK DISPATCH, Knock and Drag" Ryan Lizza, The New Republic , Post date 11.09.00, Issue date 11.20.00)
The effort is admirable, but consider the implication for spoiled ballots. I will dramatize a bit. An elderly lady, perhaps not even registered as a Democrat, lives in a mostly-black precinct and so is targeted. Her phone rings constantly, and whenever she gets up from watching her soap opera to answer it, she hears some stranger tell her to vote. Every two hours, a nice young college student or a scruffy man who looks like he needs a drink knocks on her door and asks her to step out for a few minutes and vote, please, because if his team can get enough turnout in the precinct they'll get a bonus.
So what does she do? She finally turns off her TV, walks quickly to the school, has her name checked off, goes into the voting booth, punches a few names at random, and hurries home to her soap opera. Or, she doesn't punch any names at all, having put a stop to the phone calls by having her name checked off.
The problem is akin to that of vote buying in a system using the secret ballot. How do you make sure the voter upholds his end of the deal? The vote buyer can't tell if the voter voted for the candidate who bought his vote. The knock-and-drag man can't tell if the voter voted for any candidate at all. What makes the problem even worse for the knock-and-drag man is that though the bought voter has some gratitude towards the candidate who paid him, the harassed voter is full of resentment. Thus, though knock-and-drag efforts will help a candidate on net, they will also result in more spoiled votes. And this will be particularly true when the effort targets a population based not on past voting habits or registration, which at least indicate an interest in politics, but on demographic characteristics.
Perhaps related is the well-established regularity that black voters, even more than non-black voters, tell pollsters they voted when they actually did not (see Deufel and Kedar, 2000). Black voters may be more used to being pestered if they do not vote, and reflexively try to put off pollsters as well as campaigners. This also, of course, suggests that the polling result of overwhelming black support for Democrats may be exaggerated, though the Democratic campaigners would not pursue black turnout so heavily if the result were not mostly true.
The Harassed Voter Hypothesis may be testable. If there were an election in which blacks were expected to vote evenly for the two candidates, then the candidates would have no incentive to increase the black vote generally. Instead, they would target different groups for getting out the vote-- one candidate might go after old people and another after young people, for example. In that case, if the Harassed Voter Hypothesis is true, black districts should have no more spoilage than others. Blacks are heavily Democratic, so this suggests the place to look would be a Democratic primary election without racial undertones.
Another simple test is already failed by the Harassed Voter Hypothesis. It implies high rates of both undervoting and overvoting. Indeed, the quickest way to vote is not to cast votes for any candidate. Yet black percentage in a county is unrelated to undervoting. This lends support to the alternative that with high black turnout, more voters were inexperienced-- though the lack of significance in the Turnout variable in the county-level regressions runs against that hypothesis.
There are 67 observations, and the pseudo-R-squared is .25. ("Pseudo" because logit does not generate the standard R-squared of ordinary least squares.) The table below shows the coefficient (which is not straightforward to interpret in a logit regression), the z-statistic, and the simple correlation between that variable and whether a county used a high-spoilage ballot. Regression coefficients significant at the 10 percent level are starred.
HIGH SPOILAGE MACHINES:

                     | Coefficient  z-statistic  Correlation
---------------------+--------------------------------------
Constant             |     .70         0.14          --
Log(population)      |    -.42         1.13         -.16
Population growth    |    -.01         1.09         -.220
Foreign              |    -.08         0.23          .08
Non-English-speaking |     .35         1.57          .13
MeanIncome           |    -.14         0.68         -.16
Poverty              |    -.02        -0.14          .24
Gore                 |    -.08         1.35          .02
Black                |     .15*        2.07*         .24
Over65               |     .18*        2.85*         .11

First, note the simple correlations. The positive correlations indicate that counties with more immigrants, non-English speakers, Gore voters (barely), blacks, and old people tend to choose high-spoilage machines. The negative correlations indicate that counties which are larger, growing, richer, and with fewer poor people tend not to choose high-spoilage machines. These simple correlations, however, confound the effects of the different variables that are correlated with each other. The regression coefficients adjust for the independent effect of each variable. The two significant variables are the percentages of people who are black and who are over age 65, both of which increase the probability the county chooses a high-spoilage ballot. Such things as county size, population growth, and the wealth of the county do not seem to matter.
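One common way to read a logit coefficient is to exponentiate it. This gives the multiplicative change in the odds of the outcome (here, choosing a high-spoilage machine) for a one-unit increase in the variable, holding the others fixed. The sketch below applies this to the two starred coefficients from the table above. It assumes the variables are measured in percentage points, which is not stated here, and the function name `odds_ratio` is illustrative.

```python
import math


def odds_ratio(logit_coef: float) -> float:
    """Multiplicative change in the odds for a one-unit increase
    in a regressor with the given logit coefficient."""
    return math.exp(logit_coef)


# Starred coefficients copied from the HIGH SPOILAGE MACHINES table.
starred = {"Black": 0.15, "Over65": 0.18}

for name, coef in starred.items():
    print(f"{name}: coefficient {coef:.2f} -> odds ratio {odds_ratio(coef):.3f}")
```

Under the percentage-point assumption, a one-point increase in the black share of a county multiplies the odds of a high-spoilage machine by about 1.16, and a one-point increase in the over-65 share by about 1.20. These are changes in odds, not in probabilities. The change in probability depends on where the county starts.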
The sample size is small for logit, however, and there is considerable collinearity, as can be seen from the correlation matrix in Table 4. An example of the instability of the results is that when I used the state website information on machine type, which differs only for Baker and Martin counties, the percentage of black population was marginally insignificant instead of significant, as it is above. Using the same data as above, I ran the regression again after dropping any variable with a z-statistic less than 1.30. The regression now has a pseudo-R-squared of .19 and the following coefficients:
HIGH SPOILAGE MACHINES

           | Coefficient  z-statistic
-----------+-------------------------
Constant   |   -1.69         1.10
Noneng     |     .18*        2.00*
Gore       |    -.10*        2.21*
Black      |     .19*        2.89*
Over65     |     .15*        2.76*

This second regression contains indications that counties with more non-English speakers and fewer Gore voters, other things equal, pick high-spoilage ballots.
A regression, however, is probably not the right tool to use when asking the question: "Who lost more votes due to high-spoilage machines-- Bush or Gore?" For that question, the best county-level approach would be to go back to my earlier regressions to find the effect of high-spoilage machines conditioning on other county features, and then multiply that by the number of Bush and Gore votes in each county.
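The calculation just described can be sketched as follows. This is a minimal illustration, not the paper's computation. The county records and the 3-percentage-point machine effect are made-up placeholders. The names `County` and `lost_votes` are hypothetical. The sketch also assumes that the extra spoiled ballots would have split between the candidates in the same proportion as each county's valid votes.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class County:
    name: str
    high_spoilage: bool  # True if the county used a high-spoilage machine
    bush_votes: int      # valid votes counted for Bush
    gore_votes: int      # valid votes counted for Gore


def lost_votes(counties: Iterable[County],
               machine_effect_pct: float) -> Tuple[float, float]:
    """Estimate votes lost by each candidate to high-spoilage machines.

    machine_effect_pct is the regression estimate of the extra spoilage
    (in percentage points) caused by a high-spoilage machine, holding
    other county features fixed. Assumption: extra spoiled ballots split
    like the county's valid votes, so each candidate's loss is roughly
    effect * his vote count in each high-spoilage county.
    """
    bush_lost = 0.0
    gore_lost = 0.0
    for county in counties:
        if county.high_spoilage:
            bush_lost += county.bush_votes * machine_effect_pct / 100.0
            gore_lost += county.gore_votes * machine_effect_pct / 100.0
    return bush_lost, gore_lost


# Hypothetical data for illustration only; not real Florida counties.
example = [
    County("A", True, 1000, 2000),
    County("B", False, 5000, 5000),
]
bush, gore = lost_votes(example, machine_effect_pct=3.0)
print(f"Estimated votes lost: Bush {bush:.0f}, Gore {gore:.0f}")
```

A fuller version would use a separate estimated effect for each machine type rather than a single number, and would scale by ballots cast rather than valid votes.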
David J. Rusin, "Likelihood of Altering the Outcome of the Florida 2000 Presidential Election by Recounting," Northern Illinois University, Jan. 5, 2001. http://www.math.niu.edu/~rusin/uses-math/recount/index.html
*REGRESSION 1 ;

. regress spoilper Mmod1 Mmod2 Mmod3 Mmod6 Mmod7 Mmod8 Sublit
> LPop92
> popgro
> Foreign noneng MeanInc poverty Goreper blacper oldper age75pl
> highturn ;

  Source |       SS       df       MS              Number of obs =      65
---------+------------------------------           F( 18,    46) =   14.80
   Model |  518.525195    18  28.8069553           Prob > F      =  0.0000
Residual |  89.5137475    46  1.94595103           R-squared     =  0.8528
---------+------------------------------           Adj R-squared =  0.7952
   Total |  608.038942    64  9.50060847           Root MSE      =   1.395

------------------------------------------------------------------------------
spoilper |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
   Mmod1 |  -.9701006   1.725915     -0.562   0.577      -4.444187    2.503985
   Mmod2 |   3.918342   .7024633      5.578   0.000       2.504356    5.332327
   Mmod3 |  -.5017297   .7107465     -0.706   0.484      -1.932388    .9289288
   Mmod6 |    2.01425   1.669907      1.206   0.234      -1.347099    5.375599
   Mmod7 |   4.009866   .8257787      4.856   0.000       2.347659    5.672072
   Mmod8 |    2.48855   .8077395      3.081   0.003       .8626553    4.114446
  Sublit |   .1897027   .0745313      2.545   0.014       .0396791    .3397263
  LPop92 |  -.1402608   .2872131     -0.488   0.628      -.7183907    .4378691
  popgro |  -.0015095   .0084955     -0.178   0.860      -.0186101    .0155911
 Foreign |   .0063984   .1367082      0.047   0.963      -.2687809    .2815776
  noneng |   .0073044   .1007893      0.072   0.943      -.1955738    .2101826
 MeanInc |   .1840353   .1144325      1.608   0.115      -.0463054     .414376
 poverty |    .042256   .0864096      0.489   0.627      -.1316775    .2161896
 Goreper |  -.0328812   .0322865     -1.018   0.314      -.0978705    .0321081
 blacper |    .067874   .0384695      1.764   0.084      -.0095611    .1453091
  oldper |   .0194023   .1257668      0.154   0.878      -.2337532    .2725579
 age75pl |  -.2153074   .2991274     -0.720   0.475      -.8174195    .3868048
highturn |   1.631412   2.173899      0.750   0.457       -2.74442    6.007243
   _cons |  -5.749018   4.953201     -1.161   0.252       -15.7193    4.221259

. correlate spoilper Sublit Underper Overper ;
(obs=52)

        | spoilper   Sublit Underper  Overper
--------+------------------------------------
spoilper|   1.0000
  Sublit|   0.5660   1.0000
Underper|   0.2017   0.0356   1.0000
 Overper|   0.9698   0.5693  -0.0422   1.0000

. correlate Overper Mmod1 Mmod2 Mmod3 Mmod6 Mmod7 Mmod8 Sublit
> Pop92
> blacper popgro Foreign noneng MeanInc poverty Goreper oldper age75pl
> highturn ;
(obs=52)

        |  Overper    Mmod1    Mmod2    Mmod3    Mmod6    Mmod7    Mmod8
--------+---------------------------------------------------------------
 Overper|   1.0000
   Mmod1|        .        .
   Mmod2|   0.7486   0.0000   1.0000
   Mmod3|  -0.4620   0.0000  -0.3865   1.0000
   Mmod6|        .        .        .        .        .
   Mmod7|   0.1726   0.0000  -0.1214  -0.1273   0.0000   1.0000
   Mmod8|  -0.1480   0.0000  -0.3865  -0.4054   0.0000  -0.1273   1.0000
  Sublit|   0.5693   0.0000   0.5029  -0.1023   0.0000  -0.0220  -0.2154
   Pop92|  -0.1473   0.0000  -0.2854  -0.1530   0.0000  -0.1133   0.5231
 blacper|   0.6211   0.0000   0.4158  -0.1840   0.0000  -0.1124  -0.1350
  popgro|  -0.3741   0.0000  -0.2682   0.2643   0.0000  -0.1435   0.0669
 Foreign|  -0.0894   0.0000  -0.2073  -0.0672   0.0000  -0.0524   0.3782
  noneng|  -0.0285   0.0000  -0.1928  -0.1011   0.0000   0.0942   0.3367
 MeanInc|  -0.4225   0.0000  -0.4684   0.0607   0.0000  -0.0949   0.4503
 poverty|   0.6112   0.0000   0.5217  -0.1503   0.0000   0.0607  -0.3330
 Goreper|  -0.1069   0.0000  -0.1256   0.1927   0.0000  -0.2354   0.2675
  oldper|  -0.2699   0.0000  -0.1095   0.0353   0.0000  -0.1609   0.3610
 age75pl|  -0.2048   0.0000  -0.0748  -0.0762   0.0000  -0.1653   0.4303
highturn|  -0.0383   0.0000  -0.1810  -0.1335   0.0000  -0.0393   0.2247

        |   Sublit    Pop92  blacper   popgro  Foreign   noneng  MeanInc
--------+---------------------------------------------------------------
  Sublit|   1.0000
   Pop92|  -0.2985   1.0000
 blacper|   0.6113   0.0287   1.0000
  popgro|  -0.3210  -0.0988  -0.4611   1.0000
 Foreign|  -0.2043   0.7900  -0.0695   0.0954   1.0000
  noneng|  -0.1657   0.7075  -0.0685   0.0559   0.9593   1.0000
 MeanInc|  -0.6135   0.4130  -0.4359   0.3548   0.3583   0.2750   1.0000
 poverty|   0.6919  -0.2741   0.6647  -0.5189  -0.1790  -0.1048  -0.7899
 Goreper|   0.0880   0.4860   0.1593   0.1757   0.3822   0.3173   0.2379
  oldper|   0.0949   0.0088  -0.4342   0.3883   0.0460  -0.0525   0.3078
 age75pl|   0.1355   0.1422  -0.3496   0.1728   0.0941  -0.0123   0.3127
highturn|  -0.2978   0.3135  -0.0152  -0.0541   0.3578   0.3507   0.0727

        |  poverty  Goreper   oldper  age75pl highturn
--------+---------------------------------------------
 poverty|   1.0000
 Goreper|  -0.1262   1.0000
  oldper|  -0.3879   0.3230   1.0000
 age75pl|  -0.3528   0.3558   0.9455   1.0000
highturn|   0.0725  -0.0510  -0.4375  -0.3820   1.0000

*REGRESSION 1 BASIC. ;

. regress Overper Mmod1 Mmod2 Mmod3 Mmod6 Mmod7 Mmod8 Sublit
> LPop92 popgro Foreign noneng MeanInc poverty Goreper blacper oldper
> age75pl highturn ;

  Source |       SS       df       MS              Number of obs =      52
---------+------------------------------           F( 16,    35) =   10.90
   Model |   377.77566    16  23.6109788           Prob > F      =  0.0000
Residual |  75.8034623    35  2.16581321           R-squared     =  0.8329
---------+------------------------------           Adj R-squared =  0.7565
   Total |  453.579123    51  8.89370829           Root MSE      =  1.4717

------------------------------------------------------------------------------
 Overper |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
   Mmod1 |  (dropped)
   Mmod2 |   5.107194   .8394107      6.084   0.000         3.4031    6.811289
   Mmod3 |   .6264834    .822626      0.762   0.451      -1.043536    2.296503
   Mmod6 |  (dropped)
   Mmod7 |   5.187284   1.418107      3.658   0.001       2.308372    8.066195
   Mmod8 |   2.282271   .9714315      2.349   0.025       .3101603    4.254382
  Sublit |   .1184544   .1016946      1.165   0.252      -.0879967    .3249054
  LPop92 |   .0034228   .3762638      0.009   0.993      -.7604335     .767279
  popgro |   .0104209   .0102347      1.018   0.316      -.0103566    .0311983
 Foreign |   .0320293   .1760174      0.182   0.857       -.325305    .3893636
  noneng |  -.0216732   .1344062     -0.161   0.873      -.2945322    .2511858
 MeanInc |   .1660901   .1269066      1.309   0.199       -.091544    .4237242
 poverty |   .0812081   .1204599      0.674   0.505      -.1633384    .3257547
 Goreper |  -.0326626   .0385273     -0.848   0.402      -.1108772     .045552
 blacper |   .0804912   .0614977      1.309   0.199      -.0443558    .2053381
  oldper |  -.1252164   .1539209     -0.814   0.421      -.4376925    .1872596
 age75pl |   .1072159   .3529485      0.304   0.763      -.6093076    .8237393
highturn |  -.0590026    2.98322     -0.020   0.984      -6.115261    5.997255
   _cons |  -4.132968   6.579606     -0.628   0.534      -17.49028    9.224342
------------------------------------------------------------------------------

. regress Underper Mmod1 Mmod2 Mmod3 Mmod6 Mmod7 Mmod8 Sublit
> LPop92 popgro Foreign noneng MeanInc poverty Goreper blacper oldper
> age75pl highturn ;

  Source |       SS       df       MS              Number of obs =      52
---------+------------------------------           F( 16,    35) =    4.78
   Model |  20.3511387    16  1.27194617           Prob > F      =  0.0001
Residual |   9.3128446    35  .266081274           R-squared     =  0.6861
---------+------------------------------           Adj R-squared =  0.5425
   Total |  29.6639833    51  .581646732           Root MSE      =  .51583

------------------------------------------------------------------------------
Underper |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
   Mmod1 |  (dropped)
   Mmod2 |   -1.02658   .2942193     -3.489   0.001      -1.623877    -.429283
   Mmod3 |  -.8624329   .2883361     -2.991   0.005      -1.447786   -.2770794
   Mmod6 |  (dropped)
   Mmod7 |  -.4469883   .4970566     -0.899   0.375      -1.456067    .5620902
   Mmod8 |   .6372364   .3404935      1.872   0.070      -.0540022    1.328475
  Sublit |   .0240708   .0356447      0.675   0.504      -.0482918    .0964333
  LPop92 |  -.2101169   .1318831     -1.593   0.120      -.4778538      .05762
  popgro |  -.0051483   .0035873     -1.435   0.160      -.0124309    .0021344
 Foreign |   .0276325   .0616953      0.448   0.657      -.0976157    .1528807
  noneng |  -.0178882   .0471103     -0.380   0.706      -.1135271    .0777508
 MeanInc |   .0251369   .0444816      0.565   0.576      -.0651656    .1154394
 poverty |   .0250832    .042222      0.594   0.556       -.060632    .1107985
 Goreper |   .0077062   .0135041      0.571   0.572      -.0197086    .0351209
 blacper |  -.0115184   .0215554     -0.534   0.596      -.0552781    .0322413
  oldper |   .0496365   .0539504      0.920   0.364      -.0598885    .1591616
 age75pl |  -.1203142   .1237109     -0.973   0.337      -.3714607    .1308322
highturn |    .431748   1.045639      0.413   0.682      -1.691013    2.554509
   _cons |   .3968967   2.306198      0.172   0.864      -4.284934    5.078727
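As a consistency check on the regression output above, the adjusted R-squared in each header can be recomputed from the residual and total sums of squares using the standard formula 1 - [RSS/(n-k-1)] / [TSS/(n-1)]. The sketch below does this for the three regressions. The function name `adjusted_r2` is illustrative. The regressor counts (18, then 16 after Mmod1 and Mmod6 are dropped) are read off the F-statistic degrees of freedom.

```python
def adjusted_r2(rss: float, tss: float, n_obs: int, n_regressors: int) -> float:
    """Adjusted R-squared from residual and total sums of squares.

    n_regressors excludes the constant, so the residual degrees of
    freedom are n_obs - n_regressors - 1.
    """
    resid_ms = rss / (n_obs - n_regressors - 1)
    total_ms = tss / (n_obs - 1)
    return 1.0 - resid_ms / total_ms


# (dependent variable, residual SS, total SS, n, k, reported Adj R-squared),
# all copied from the regression headers above.
reported = [
    ("spoilper", 89.5137475, 608.038942, 65, 18, 0.7952),
    ("Overper", 75.8034623, 453.579123, 52, 16, 0.7565),
    ("Underper", 9.3128446, 29.6639833, 52, 16, 0.5425),
]

for name, rss, tss, n, k, rep in reported:
    calc = adjusted_r2(rss, tss, n, k)
    print(f"{name}: computed {calc:.4f}, reported {rep:.4f}")
```

The computed values agree with the reported ones to four decimal places, which suggests the sums of squares and degrees of freedom in the tables above were recovered correctly.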