The Pattern of Spoiled Ballots by County in Florida

"The Pattern of Spoiled Ballots by County in Florida," November 27, 2000. Revised January 27, 2001. (Eric Rasmusen, [email protected])

Preface (January 27, 2001): I've decided to halt work on this paper and not to try to publish it. Richard Posner's forthcoming Supreme Court Review article has a good simple version of the main regressions on spoilage by county in Florida. I hear other scholars are at work on more statistically sophisticated regressions on spoilage by county over the entire United States. Other people may be at work doing more extensive analysis of precinct data in Florida. The present paper falls in between, and therefore it does not seem socially useful for me to continue work on it. Nonetheless, I will leave it up on the web, because it may be that I will have explained the methodology and described the data better than anyone else.

Some things to note about this draft:

Judge Posner discovered that Literacy is an important variable in county-level regressions, and one which may knock out the significance of Black population. I've redone my regressions with his literacy data, and appended the raw Stata output from that at the end of this paper.
A summary of the results when Literacy is included: Literacy and machine type are all that affect spoilage overall, in the regression that includes the most counties. Black percentage is not quite significant. Undervote is affected only by machine type. Overvote is also affected only by machine type, not by black percentage or literate percentage. My feeling is that all these results are delicate, because of the correlation between Black and Illiteracy, and perhaps the true pattern also differs by county (e.g., in some counties, Black percentage is what is significant; in others, Literacy).
My advice for anyone doing a study like this: Get county literacy data. Also, be very cautious about using precinct-level analysis: since literacy data is unavailable at the precinct level, an important variable, highly correlated with black percentage, would be omitted.
Another reason for my ceasing work on this paper is that I realize that we should be much more concerned about fraud than about spoilage, especially when it comes to choice of voting machines. The Miami Herald found 144 unregistered people had voted, in a count of less than 1/4 of Miami-Dade's precincts ("Unregistered voters cast ballots in Dade," December 24, 2000, MANNY GARCIA AND TOM DUBOCQ, [email protected]). The Herald also found that in 12 counties, 445 felons had voted, 80 percent of them as registered Democrats ("Hundreds of felons cast votes illegally," December 1, 2000, BY DAVID KIDWELL, PHIL LONG AND GEOFF DOUGHERTY, [email protected]; "Voter fraud a decidedly nonpartisan offense," Chicago Sun Times, December 07, 2000, p. 45, David Frum). The St. Petersburg Times reports that in Duval County, officials found 2,659 improper votes cast-- 1 percent, which would yield about 30,000 votes statewide ("Count shows tainted votes," ADAM SMITH; SYDNEY P. FREEDBERG, Dec. 14, 2000). Other election laws were violated: "At one polling site, as candidate Thompson's wife Karen watched, Gore supporters guarded the entrance of the polling site, telling voters for whom they should vote. Florida election law requires that people campaigning for candidates stay at least 50 feet outside the entrance of the polling place. Mrs. Thompson tried to have the law enforced: "I said, 'you better get them out of there.' But the poll worker inside wouldn't do it. He let them stay there." In late afternoon, after the Gore campaigners pressured hundreds of voters, Miami-Dade police officers arrived and told the campaigners to leave." ("Miami vice," Margaret Menge, WORLD, November 25, 2000, http://www.worldmag.com/world/issue/11-25-00/cover_3.asp.)
A statement important in thinking about undervotes: "The lines at voting sites were long, and some citizens left before reaching the Votomatics. Under the rules, their pristine cards were collected and placed with ballots that had been punched." (David Corn, "In the Field of Chads," The Nation, January 29, 2001.)

1. Introduction

The 2000 Presidential Election brought to public attention a previously ignored feature of voting: spoiled ballots. A large number of voters either voted for no candidate for President (undervoting) or for more than one candidate (overvoting), both of which result in no vote being counted. The number was large enough that if no ballots had been spoiled, Gore might well have won the state of Florida, and with it, the Presidency.

Though spoilage had previously been ignored by all but the election officials who had to deal with the practical aspects of vote processing, it was not a new problem, nor one special to Florida. Spoilage rates in previous elections were also large enough to have potentially changed the result, and spoilage rates in states other than Florida were large enough to have changed the results in 2000. Thus, the topic is of more than historical interest.

In this paper, I will analyze spoilage in the 2000 Presidential Election in Florida. I focus on this year, place, and office for three reasons. First, the data and background institutional detail are best, and Florida, with its 67 diverse counties, is large enough to give us insight into spoilage generally. Second, although the topic is not just of historical interest, it is of historical interest nonetheless, and many people are interested in the details of what happened there. Third, though the election dispute has ended, Florida is giving careful thought to whether its election process ought to be changed. This study may be useful for policy purposes in Florida, and for the other states that will be interested in whether any changes Florida makes will really make a difference.

1. Introduction
2. The Pattern of Spoiled Ballots and Support for Bush Versus Gore
3. Other Variables Across Counties
- Exit Polls
4. A Regression to See Which Variables Matter
5. A Regression To Fit the Data Economically
- Outliers
6. Cautions for Interpretation
7. A Precinct-Level Analysis of Overvoting in Palm Beach County
8. Knock-And-Drag and the Black Vote
9. Choice of Voting Machine by County
10. Conclusions
11. References and Data Sources

2. The Pattern of Spoiled Ballots and Support for Bush Versus Gore

In the Florida 2000 Presidential election, about 180,000 ballots were cast but not counted ("about" because recounts made this a shifting total) because of "overvotes" (voting for more than one candidate for president) or "undervotes" (voting for no candidate for president). The Orlando Sentinel survey found that 103,402 of these were overvotes and 59,845 were undervotes ("179,295 ballots that didn't count," Orlando Sentinel). The Sentinel could not get details on another 16,048 of the spoiled ballots.

This paper looks at what might explain spoilage. I start by discussing spoilage generally, with particular attention to the pattern of spoilage in counties favoring Bush compared to those favoring Gore. I then use regression analysis on county data for all of Florida and precinct data for Palm Beach County. Separately, I look at the question of what kind of counties chose to use voting machines associated with high spoilage rates. Finally, I discuss a reason why the percentage of blacks in a county or precinct might be associated with high spoilage rates.

A scatterplot of Percentage of Ballots Spoiled vs. Percentage of Vote for Gore by county shows little correlation. Gore counties did not systematically have more spoiled ballots than Bush counties.

Much of this paper is about Florida counties, so I will insert a map here to aid the reader. At some later point I will make it more useful by using shading to convey some information.

In Palm Beach County, there were 29,702 spoiled ballots, 6.43 percent of the total. That compares to 2.93 percent for Florida overall. Palm Beach County had the most spoiled ballots of any county, but not by a lot. Runners-up were Miami-Dade County with 28,601 (4.37%, in a county Gore won by a small margin) and Duval County (Jacksonville) with 26,909 (9.23%, in a county Bush won handily).

Fifteen other counties had a greater percentage of spoiled ballots than Palm Beach did. Of these counties, 13 went for Bush and 2 for Gore. The biggest percentage was 12.40 percent spoiled in Gadsden County. Eight other counties had between 5.00 and 6.43 percent spoiled, of which 2 went for Gore and 6 for Bush. The lowest percentage spoiled was Leon County (Tallahassee), with 0.18 percent spoiled.

There were a total of 179,855 spoiled ballots. If we look at the percentage of the vote for Gore in each county, and allocate that percentage of the spoiled ballots to Gore and Bush's percentage to Bush, that would yield 2,703 extra votes for Gore. That is greater than the 930-vote Bush margin as of 4:39 p.m. on November 17, but it is remarkably small compared to the total number of spoiled ballots-- only 1.5 percent.
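The allocation rule just described can be sketched in a few lines. This is only an illustration of the arithmetic, not the calculation actually run for this paper: the function name and the three county rows are hypothetical placeholders, and only the statewide figures (179,855 spoiled ballots, 2,703 net votes) come from the text.

```python
def net_gore_gain(counties):
    """Allocate each county's spoiled ballots in proportion to its valid vote.

    `counties` is a list of (spoiled, gore_share, bush_share) tuples, with
    shares expressed as fractions of the county's presidential vote.
    Returns Gore's net gain over Bush, in votes.
    """
    return sum(spoiled * (gore - bush) for spoiled, gore, bush in counties)


# Hypothetical counties, made up for illustration only (not the Florida data).
example = [
    (10_000, 0.60, 0.38),  # a Gore county
    (8_000, 0.35, 0.63),   # a Bush county
    (5_000, 0.49, 0.49),   # an evenly split county
]
print(round(net_gore_gain(example)))

# The statewide figures quoted in the text: net gain as a percent of spoilage.
share = 100 * 2703 / 179855
print(round(share, 1))
```

Because gains in Gore counties are offset by losses in Bush counties, the net figure can be a small fraction of total spoilage even when the number of spoiled ballots is large.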

Allocating spoiled ballots in proportion to candidates' valid ballots by county is one conceivable voting rule, but it is not used in Florida or anywhere else. In the 2000 election, Democrats did suggest that something like it be done in Palm Beach-- that a judge use statistical analysis to estimate how the voters would have voted had they not spoiled their ballots-- but nobody took this seriously. The conventional remedy for spoilage is rather to recount the ballots to make sure which ballots really are spoiled. The recount reveals some ballots to be valid which were thought to be spoiled and other ballots to be spoiled which were thought to be valid. Usually, more ballots are found to be valid, since voting machines rarely read improperly marked ballots but sometimes misread properly marked ballots.

The first step, which Florida law mandated, was to do a machine recount. This means the election officials check that the machines are in order and feed the ballots back into the machines for recounting. This is useful for several reasons. First, machinery inevitably has a certain amount of slack which can cause the same ballots to be read differently by the same machine. It is, of course, unclear that repeated recounts will ever converge on a uniform result if this is the only source of misreading. Second, punchcard machines, and particularly the Votomatic type which uses pre-perforated ballots that the voter punches with a stylus, can leave "hanging chads"--- holes partly punched. These chads can be brushed away in a recount. Most people think that is a valid occurrence, but the other side of the coin is that ballots degrade with repeated counting, even if no fraud is involved, and ballots that were originally valid may become unreadable because of wear. Third, human error may have distorted the first count. Some ballots may not have been put through the machines, or some might have been put through twice, for example. Such error is more likely in the original count than in a recount because less attention would have been paid when it was not known that the election would be so close. The data used in this paper is all from after the machine recount.

The second step, the source of most of the post-election litigation, is to do manual recounts. This involves people looking at each ballot to see if the machines read it properly. This paper is not about the intricacies of how this might be done, which no doubt will be discussed at length in law review articles. It is worth noting here, however, that one would not expect manual recounts to produce a large change in election results, in Florida 2000 or elsewhere. Palm Beach County had xxx spoiled votes, and a manual recount by partisan Democrats produced only xxx extra votes for Gore, a total which does not even subtract any challenges to the Democratic count the Bush campaign might have successfully made in court had the process not been halted.

If this Palm Beach figure of xxx percent is applied to the 2,703 extra votes for Gore, the number drops to xxx. So Gore would still not have won.

The relevance of that to the current article is that the number of spoiled ballots is truly large. The problem is not that voting machines are unreliable, but that voters either choose to spoil their ballots or make a large number of mistakes.

3. Other Variables Across Counties

If spoiled ballots are concentrated neither in Gore counties nor Bush counties, what pattern might there be? This section looks at a number of variables and the next section will use them in linear regressions.

The data I use is from a number of sources. The percentages of spoiled ballots and total ballots cast are from the Orlando Sentinel, http://orlandosentinel.com/elections/lost.htm. The Gore 2000 and total 2000 votes are from Adams and Fastnow, "A Note on the Voting Irregularities in Palm Beach, Florida." You can also get the Florida state data by county directly from the Florida Division of Elections, which is where I found the data on turnout in the 1998 senatorial election. Most of the variables are from a University of Virginia website that has the U.S. 1994 county data arranged nicely, http://fisher.lib.virginia.edu/ccdb/county94.html.

I have the data available in several formats:

In PDF from Latex, I have Table 1, which has the most important variables, and Table 2, which has two important variables and others that turn out to be less important. Both of these tables set out the data nicely.
In XLS Excel spreadsheet format, I have one spreadsheet with all the data.
In PRN ascii format, I have one file with all the data.

Here are the variables I used. Each is measured by county.

Spoilage. This is the number of spoiled ballots divided by the total valid and invalid presidential vote in the 2000 election.
Overvotes. This is the number of ballots spoiled by overvoting divided by the total valid and invalid presidential vote in the 2000 election.
Undervotes. This is the number of ballots spoiled by undervoting divided by the total valid and invalid presidential vote in the 2000 election.
Mean_Income. Mean income in the county, in thousands of dollars. I include it not because I thought it mattered as a voter characteristic (see "poverty" below) but as a county government characteristic-- perhaps poor counties explained ballots more poorly, etc.
Population. County population in 1992, in thousands. If there are economies of scale in county government, maybe smaller counties had worse balloting procedures and more spoiled ballots. The logarithm of this variable is used in the regressions, as explained below.
Population_Growth. County population growth from 1980 to 1992. A growing county might not have adjusted well, with consequent confusion at polling places and more spoiled ballots.
Foreign. Foreign-born population as a percentage of total population in 1990. First-time voters might spoil more ballots.
Non-English-Speakers. Percent of population aged 5 or more who spoke a language other than English at home in 1990. Such people might spoil more ballots because their English is poorer.
Poverty. Percent of families with incomes below the poverty line in 1989. Poor people might be worse at following directions.
Gore's_Percentage. Percentage of the 2000 vote that went to Al Gore. If either Democrats or Republicans have a special tendency to spoil ballots, this would show up significant.
Black. Black population as a percentage of total population in 1990. Black turnout was said to be a special Gore effort, and may have included many confused and first-time voters.
Over65. Percentage of the population aged 65 or over in 1990. The leading newspaper theory of the Palm Beach County spoiled ballots was that they were cast by retired people whose acuity had diminished.
Over75. Percentage of the population aged 75 or over in 1990. This would include a higher percentage of senile people, and might well be more important than the percentage of people aged 65-74.
Turnout. Total presidential vote in a county in 2000 divided by the total vote in the 1998 race for U.S. Senator. It is designed to measure the strength of turnout in 2000 relative to previous years. If turnout was high in a county, then more inexperienced people were voting, who would spoil more ballots.
Optical_Central_ESS. This takes a value of 1 if the county used ESS 115 or 315 optical ballots counted centrally and 0 otherwise. These optical ballots were marked by hand and then scanned through computers centrally. 16 counties used these.
Optical_Precinct_Accuvote. This takes a value of 1 if the county used Accuvote optical ballots counted at the precinct level and 0 otherwise. 16 counties used these.
Optical_Precinct_ESS. This takes a value of 1 if the county used ESS III-P or IV-C optical ballots counted at the precinct level and 0 otherwise. 8 counties used these.
Punchcard_Votomatic. This takes a value of 1 if the county used Votomatic punchcards and 0 otherwise. 15 counties used these.
Punchcard_Datavote. This takes a value of 1 if the county used Datavote punchcards and 0 otherwise. 10 counties used these.

In addition to these methods, one county (Union) used paper ballots and one county (Martin) used lever machines. They are dummied by the variables Paper and Lever.
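As a rough sketch of how the derived variables above could be built from raw county counts, the snippet below computes the spoilage percentage, the turnout ratio, the log of population, and the machine-type dummies. The function name and the sample county are hypothetical. Two choices here are assumptions: the natural logarithm is used for population, and "total presidential vote" in Turnout is read as valid plus spoiled ballots.

```python
import math

MACHINE_TYPES = [
    "Optical_Central_ESS",
    "Optical_Precinct_Accuvote",
    "Optical_Precinct_ESS",
    "Punchcard_Votomatic",
    "Punchcard_Datavote",
    "Paper",
    "Lever",
]


def build_row(spoiled, valid, votes_1998, population_thousands, machine):
    """Return the derived variables for one county as a dict."""
    if machine not in MACHINE_TYPES:
        raise ValueError(f"unknown machine type: {machine}")
    total_2000 = valid + spoiled  # valid plus invalid presidential vote
    row = {
        "Spoilage": 100.0 * spoiled / total_2000,   # percent of ballots spoiled
        "Turnout": total_2000 / votes_1998,         # 2000 vote relative to 1998
        "Log_Pop": math.log(population_thousands),  # natural log (assumed)
    }
    # One 0/1 dummy per machine type; exactly one of them equals 1.
    for name in MACHINE_TYPES:
        row[name] = 1 if name == machine else 0
    return row


# A made-up county, for illustration only.
row = build_row(spoiled=500, valid=9_500, votes_1998=6_500,
                population_thousands=25.0, machine="Punchcard_Datavote")
print(row["Spoilage"], round(row["Turnout"], 2))
```

In a regression that includes a constant, one of the machine dummies has to be left out to serve as the baseline.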

My data sources are in conflict on two counties. The Sentinel says that Baker County used punchcards and Martin County a lever machine, whereas the Florida Division of Elections webpage says they used the ESSIVC optical precinct and Datavote punchcards. An email from Baker County tells me they used optical ballots, and other sources say that Martin used a lever machine. (xxx add other sources)

Tables 3 and 4 show the summary statistics and correlation matrices. All statistics in this paper are rounded downwards.

 
TABLE 3: SUMMARY STATISTICS FOR THE COUNTY DATA (n=67; n=54 for Undervote and Overvote)


Variable             |           Mean   Median    Min         Max
---------+---------------------------------------------------------
Spoilage             |           3.88    3.34    0.17        12.40   
Undervote            |           0.92    0.72    0.00         3.49   
Overvote             |           2.69    1.14    0.00        11.60   
Mean_Income          |          12.453  12.112   8.527       21.385    
Population           |         201.234  80.123   5.745    2,007.972  
Population_Growth    |          45.04   37      -1          208   
Foreign              |           4.96    3.6     0.6         45.1  
Non-English-Speakers |           8.20    6.4     1.9         57.4    
Poverty              |          11.73     11     5           25   
Gore's_Percentage    |          40.74     40    24           66  
Black Percentage     |          13.35     11.94  1.92        56.11   
Aged_Over_65         |          17.48     15.7   7.5         33.8   
Aged_Over_75         |           6.81      6.4   2.7         14.2   
High_Turnout         |           1.54      1.52  1.32         1.87 

 


  
TABLE 4: THE CORRELATION MATRIX FOR THE COUNTY DATA (n=67, variable names abbreviated)


        | Spoilage  Op-Cent  Op-Accu   Pun-Dat  Pun-Vot   Pop 
+---------------------------------------------------------------
Spoilage|   1.00
Op-Cent |   .54     1.00
Op-Accu |  -.56     -.31      1.00
Pun-Dat |   .38     -.22      1.00
Pun-Vot |  -.05     -.30      -.30      -.21     1.00      
Pop     |  -.16     -.23      -.07      -.21      .55     1.00
Black   |   .58      .27      -.15       .14     -.15     -.01
Popgrow |  -.41     -.20       .27      -.14      .10     -.05
Foreign |  -.10     -.15       .00      -.16      .41      .79
Noneng  |  -.02     -.14      -.04      -.07      .36      .70 
MeanInc |  -.46     -.36       .13      -.27      .44      .42
Poverty |   .61      .41      -.21       .24     -.35     -.31
Gore    |  -.12     -.10       .27      -.12      .26      .46
Over65  |  -.23     -.08       .05      -.14      .38      .06
Over75  |  -.17     -.05      -.03      -.15      .44      .18
Turnout |   .00     -.12      -.11      -.08      .21      .28                         


        |  Black    Popgro   Foreign   Noneng  MeanInc  Poverty  Gore
+--------------------------------------------------------------------
Black   |  1.00
Popgro  |  -.48     1.00
Foreign |  -.11      .12      1.00
Noneng  |  -.10      .08       .95     1.00
MeanInc |  -.39      .37       .38      .30    1.00
Poverty |   .59     -.51      -.24     -.18    -.80     1.00
Gore    |   .23      .15       .35      .29     .27     -.18    1.00
Over65  |  -.40      .39       .12      .03     .36     -.41     .30
Over75  |  -.31      .19       .16      .06     .37     -.37     .34
Turnout |  -.15     -.01       .31      .32     .04     -.02    -.14                                                 


        | Over65   Over75    Turnout
--------+---------------------------
Over65  |   1.00
Over75  |    .95     1.00
Turnout |   -.32     -.31      1.00

The correlation matrix reveals that there is enough correlation between these variables that regression analysis is necessary to untangle their effects. For example, Non-English-Speakers and Population have a correlation of 0.70-- big counties have a bigger percentage of people who do not speak English at home-- so care must be taken to separate out the two effects. Non-English-Speakers has a very small correlation of -0.02 with the percentage of spoiled ballots, but could that be because the non-English speakers are in big counties, whose efficiency in conducting elections cancels out the difficulties that non-English speakers have? Regression analysis is a procedure suited to solving exactly this kind of problem.
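A toy example may make the masking problem concrete. The four "counties" below are invented numbers, not Florida data: spoilage is constructed to rise one-for-one with the non-English share and to fall with county size, while the non-English share and size have a correlation of about 0.7. The helper `pearson` is a plain textbook correlation coefficient written for this sketch.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Invented data: four counties.
noneng = [2, 0, 4, 2]   # non-English-speaking share
size = [1, 1, 3, 3]     # county size (think log population)
# Spoilage constructed as 6 + 1*noneng - 2*size, so non-English speakers
# raise spoilage by construction and size lowers it.
spoilage = [6 + a - 2 * b for a, b in zip(noneng, size)]

print(round(pearson(noneng, size), 2))      # the two regressors are correlated
print(round(pearson(noneng, spoilage), 2))  # yet the simple correlation vanishes
```

In this constructed case the simple correlation between the non-English share and spoilage is zero even though the true effect is positive, which is why the simple correlations in Table 4 cannot settle the question on their own.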

Exit Polls

Exit polls ask individuals leaving polling places how they voted, thus avoiding the ecological fallacy, though incurring the risk of false answers. At any rate, they provide information helpful in interpreting the averages in counties or precincts. Here are results from an exit poll, available from CNN, for the 2000 Presidential election in Florida. Note that since the minor candidates picked up from 0 to 5 percent of the vote, if Gore's percentage is over 48 he may be even with or ahead of Bush.


TABLE 5: EXIT POLLS  

Type of Person          This Type/All_Voters            Gore_Votes_of_this_Type/This_Type

Men                     46%                             42%           
Women                   54%                             53%    

White                   73%                             40%     
African-American        15%                             93%                
Hispanic                11%                             48%       
Asian                    1%                             --   
Other                    1%                             --     

18-64                   80%                             49%     
65 and Older            20%                             46%           
                                                     
Income under $15,000     8%                             62%                  
$15,000-$30,000         17%                             60%              
$30,000-$50,000         26%                             48%              
$50,000-$75,000         23%                             45%              
$75,000-$100,000        12%                             40%                
Over $100,000           14%                             33%             

Married                 65%                             43%       
Not Married             35%                             58%          

Democrat                40%                             86%        
Republican              38%                              8%         
Independent             22%                             47%

What is noteworthy for the purposes of this paper is that Gore did better with blacks (93% to 7%) and poor people (62% to 32%), the two candidates were even with Hispanics (48% to 49%), and Gore did worse with old people (46% to 52%).
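As a consistency check on Table 5, Gore's overall share can be recovered from any breakdown whose groups cover all voters, by weighting each group's Gore percentage by the group's share of voters. The sketch below does this for three of the breakdowns; the function name is hypothetical, and the inputs are copied from the table.

```python
def implied_gore_share(groups):
    """Weighted average of Gore's share across mutually exclusive groups.

    `groups` maps a group name to (percent of all voters, Gore's percent
    within the group), both on a 0-100 scale as in Table 5.
    """
    total_weight = sum(weight for weight, _ in groups.values())
    weighted = sum(weight * gore for weight, gore in groups.values())
    return weighted / total_weight


by_sex = {"Men": (46, 42), "Women": (54, 53)}
by_age = {"18-64": (80, 49), "65 and Older": (20, 46)}
by_party = {"Democrat": (40, 86), "Republican": (38, 8), "Independent": (22, 47)}

for label, table in [("sex", by_sex), ("age", by_age), ("party", by_party)]:
    print(label, round(implied_gore_share(table), 2))
```

The three breakdowns imply an overall Gore share of roughly 48 percent, so the rows of the table are at least internally consistent with one another.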

4. A Regression to See Which Variables Matter

The dependent variable is the percentage of a county's ballots that were spoiled. I used the Stata package for this and the other statistical analysis. The baseline machine type is Optical_Precinct_ESS, so all the other machine-type effects are measured as deviations from what would have happened using that machine.

R-Squared = .78. N=67. The coefficient is followed by the t-statistic (reported in absolute value). Variables significant at the 10 percent level are starred.

 
 
SPOILAGE:                     Coefficient     t-statistic

Constant                  |      -3.49          0.74   
    
Lever                     |       -.16          0.09
Optical-Central           |       4.02*         5.63*         
Optical-Precinct-Accuvote |       -.31          0.42      
Paper                     |       2.74          1.61       
Punchcard-Datavote        |       3.95*         4.67* 
Punchcard-Votomatic       |       2.23*         2.76*    

Log(Pop)                  |       -.25          0.95    
Pop Growth                |        .00          0.19     
Foreign                   |       -.10          0.84     
Noneng                    |        .10          1.16 
MeanInc                   |        .10          0.94   
Poverty                   |        .08          0.93     
Gore                      |       -.04          1.32   
Black                     |        .12*         4.05*     
Over65                    |        .07          0.56     
Over75                    |       -.09          0.30     
Turnout                   |       2.21          1.05

What we see from this regression is that most of the variables are statistically insignificant. Only the type of voting machine and the percentage of black population in a county are related to spoilage. The regression's main lesson, however, is that spoilage is unrelated to:

1. The percentage of elderly people.
2. The percentage of immigrants.
3. Wealth or the percentage of poor people.
4. The rate of population growth.

I also tried running the regression with just one variable for the elderly, Over75, and the results changed very little, so the insignificance of the elderly is not due to their influence being split between two variables. And I tried dropping Dade County, an outlier, and found there was little change.
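The role of the omitted baseline machine type in a regression like the one above can be illustrated with a toy example. This is not the Stata code used for the paper: it is a minimal pure-Python least-squares routine, and the six "counties" and their spoilage rates are invented. With only a constant and machine dummies on the right-hand side, the constant equals the baseline group's mean spoilage and each dummy coefficient equals that group's mean minus the baseline mean.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    X is a list of rows (first entry 1.0 for the constant); y is a list of
    floats. Solved by Gauss-Jordan elimination with partial pivoting, which
    is adequate for a toy problem but not a substitute for a statistics
    package.
    """
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Invented data: columns are constant, Votomatic dummy, Optical-Central dummy.
# The first two rows are the omitted baseline machine type.
X = [[1.0, 0, 0], [1.0, 0, 0],
     [1.0, 1, 0], [1.0, 1, 0],
     [1.0, 0, 1], [1.0, 0, 1]]
y = [1.0, 2.0, 3.0, 5.0, 6.0, 7.0]  # spoilage percentages

constant, votomatic, central = ols(X, y)
print(round(constant, 2), round(votomatic, 2), round(central, 2))
```

Here the group means are 1.5, 4.0, and 6.5, so the fitted coefficients are 1.5 for the constant and 2.5 and 5.0 for the two dummies.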

A look at Table 1 shows why the regressions turn out this way. The six counties with the greatest percentage of elderly are Charlotte (34%), Citrus (32%), Hernando (31%), Highlands (34%), Pasco (32%), and Sarasota (32%). Their spoiled ballot percentages were 4.79%, 0.38%, 0.43%, 2.89%, 2.74%, and 1.74%, mostly below average and none of them in the top quartile. Counties with lots of old people just don't have more spoilage.

Broward (21%), Dade (14%), Palm Beach (25%), and Volusia (23%) are not notably high in their percentages of old people. The "Old Lady in Palm Beach" jokes that sprang up in November 2000 were based on misimpressions (though see the precinct-level results below).

The regression above combines overvotes and undervotes in its definition of spoilage. CNN and the Associated Press collected data breaking spoilage down for 54 counties. (The total spoilage is not the same as in the Orlando Sentinel data used above, but seems to have no systematic difference, and the discrepancy is perhaps due to the dates on which the data was collected, since vote totals shifted during the month after the election.) David Rusin cleaned this up somewhat, fixing some switched entries, and I use his data here. It turns out that undervotes and overvotes are quite different. As Table 6 shows, there is virtually no correlation between the percentage of each by county.

 TABLE 6: CORRELATIONS BETWEEN SPOILAGE TYPES

        |  Spoilage   Under    Over    
--------+----------------------------- 
Spoilage|   1.00 
Under   |   0.18      1.00 
Over    |   0.97     -0.06     1.00

Repeating the earlier regression for undervotes and overvotes separately on the 54 counties (dropping the Lever and Paper variables, since no county in the reduced dataset uses them) yields the following.

N=54. The coefficient is followed by the t-statistic and the simple correlation. Variables significant at the 10 percent level are starred.

 
 
OVERVOTES:                  Coefficient   t-statistic     Correlation  
(R-Squared=.83)

Constant                  | -2.08          0.37             ----   
                               
Optical-Central           |  5.08*         6.27*            .76
Optical-Precinct-Accuvote |   .75          0.95            -.47
Punchcard-Datavote        |  5.09*         3.69*            .15
Punchcard-Votomatic       |  2.11*         2.35*           -.17
                           
Log(Pop)                  |  -.10          0.34            -.45                
PopGrowth                 |   .01          1.08            -.37
Foreign                   |  -.03          0.26            -.10          
Noneng                    |   .04          0.38            -.04        
MeanInc                   |   .12          1.01            -.44  
Poverty                   |   .09          0.80             .62 
Gore                      |  -.03          1.04            -.13       
Black                     |   .12*         3.11*            .61
Over65                    |  -.11          0.78            -.30       
Over75                    |   .22          0.70            -.23       
Turnout                   |  -.01          0.00            -.04

 
UNDERVOTES:                 Coefficient    t-statistic     Correlation  
(R-Squared=.67) 
                          
Constant                  |  -.31             0.16            ---
                                
Optical-Central           | -1.04*            3.63*           -.28
Optical-Precinct-Accuvote |  -.84*            2.98*           -.42
Punchcard-Datavote        |  -.52             1.06             .04
Punchcard-Votomatic       |   .49             1.54             .19
                       
Log(Pop)                  |  -.22*            1.99*            .05                    
PopGrowth                 |  -.006*           1.73*           -.15                   
Foreign                   |  -.01             0.36             .14                   
Noneng                    |   .01             0.46             .15                  
MeanInc                   |   .01             0.33             .10                  
Poverty                   |   .01             0.29             .03                 
Gore                      |   .00             0.49             .00               
Black                     |   .00             0.42             .00
Over65                    |   .07             1.38             .13                       
Over75                    |  -.11             0.96             .17                       
Turnout                   |  1.10             1.20             .30

xxxx discussion here.

5. Regressions To Fit the Data Economically

The variables that were significant in the first overvote regression provide almost all the explanatory power, since a reduced regression including only the voting machine variables, the Black percentage, and a constant has an R-squared of .81, barely down from the .83 of the original regression. I tried, but do not show here, a regression with just the machine types, and not county size or black percentage, and the R-Squared fell to .69, so the machine type is providing the bulk of the explanatory power, though by no means all of it.

R-Squared = .81. N=54. Variables significant at the 10 percent level are starred.

   
OVERVOTES:                 Coefficient   t-statistic

Constant                  |  -1.07*         1.74*    
                          |
Optical-Central           |   4.99*         7.36*         
Optical-Precinct-Accuvote |    .28          0.42         
Punchcard-Datavote        |   5.28*         4.70* 
Punchcard-Votomatic       |   1.65*         2.48*    
                          |
Black                     |    .12*         5.50*

This regression says that a county which had 15 percent black inhabitants and used the ESS IV-C optical ballots counted at the precinct level (the missing machine dummy) would have .73 percent (=-1.07 + .12 (15)) overvotes. A county which used Votomatic punchcards instead would have 1.65 percent more overvotes. A county with 20 percent more black inhabitants would have 2.40 percent more overvotes. For any of these, to get the spoilage total, undervotes would have to be added, an average rate of 0.92 percent across counties.
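The arithmetic in the preceding paragraph can be checked with a minimal Python sketch. The coefficients are copied from the overvote table above; the function name and the dictionary layout are illustrative assumptions, not part of the original analysis.

```python
# Coefficients copied from the reduced overvote regression table above.
# The omitted (baseline) machine dummy is the optical ballot counted at the precinct.
OVERVOTE_COEF = {
    "Constant": -1.07,
    "Optical-Central": -4.99,
    "Optical-Precinct-Accuvote": 0.28,
    "Punchcard-Datavote": 5.28,
    "Punchcard-Votomatic": 1.65,
    "Black": 0.12,
}


def predicted_overvote(black_pct, machine=None):
    """Predicted overvote percentage; machine=None means the baseline machine."""
    prediction = OVERVOTE_COEF["Constant"] + OVERVOTE_COEF["Black"] * black_pct
    if machine is not None:
        prediction += OVERVOTE_COEF[machine]
    return prediction


baseline = predicted_overvote(15)
votomatic_gap = predicted_overvote(15, "Punchcard-Votomatic") - baseline
black_gap = predicted_overvote(35) - baseline  # 20 points more black inhabitants

print(round(baseline, 2))       # 0.73
print(round(votomatic_gap, 2))  # 1.65
print(round(black_gap, 2))      # 2.4
```

The differences are in percentage points of ballots cast, not percent changes in the overvote rate.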

How about undervotes? Retaining only the significant variables, we get the following. R-squared = 0.63. N=54.

   
UNDERVOTES:                 Coefficient   t-statistic

Constant                  |  2.35*         6.80*    
                          |
Optical-Central           |  -.90*         3.67*         
Optical-Precinct-Accuvote |  -.65*         2.73*         
Punchcard-Datavote        |  -.43          1.06 
Punchcard-Votomatic       |   .83*         3.35*    
                          |
Log(Population)           |  -.24*         3.99* 
Population Growth         |  -.0035*       1.87*

xxx here put explanation of this.

Outliers

It is interesting to look at which counties are outliers, most poorly explained by the regression equation. I'll use the original spoilage regression for this. A bigger residual, whether positive or negative, indicates a bigger unexplained percentage of spoiled ballots. Tables 1 and 2 both list the residuals for the individual counties. Let us look at the eight counties with residuals greater than 1.70 percent in absolute value. The counties with unexpectedly high spoilage are Duval (4.20%), Palm Beach (3.46%), Okeechobee (2.35%), Hendry (1.95%), and DeSoto (1.83%). The counties with unexpectedly low spoilage are Polk (-4.27%), Madison (-2.32%), and Sumter (-1.72%).

Duval County, for example, has a residual of 4.20%, indicating that its spoiled ballot percentage of 9.22% is 4.20% more than the regression equation would predict based on its use of punchcards and its 23 percent black population.

Note that regression outliers are not necessarily the same as outliers in terms of the dependent variable, percentage of spoiled ballots. Gadsden County has the most spoiled ballots, 12.40 percent, but its residual of 1.56 percent is only moderately high; the regression equation does explain its high rate of ballot spoilage. Okeechobee County has typical values for all of the variables except its spoilage rate of 8.00 percent, but it is a regression outlier, with a residual of 2.35 percent, so the variables in the regression equation do not explain its high spoilage rate very well.

Only one of the four counties that did manual recounts is an outlier under the 1.70 percent criterion. Broward, Miami-Dade, and Volusia Counties have residuals of -0.14, -1.18, and 0.43 percent, all reasonably close to the regression's prediction. Palm Beach County has a residual of 3.46 percent, so its spoilage rate is surprisingly high given its low black population (12 percent), even though its spoilage rate of 6.42 percent was not extreme.
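The outlier screen used in this subsection can be sketched in a few lines of Python. The residuals are the ones quoted in the text; Sumter is entered as negative because it is listed among the counties with unexpectedly low spoilage. The function and variable names are illustrative assumptions.

```python
# Residuals (percentage points) quoted in the text for the original spoilage regression.
RESIDUALS = {
    "Duval": 4.20, "Palm Beach": 3.46, "Okeechobee": 2.35, "Hendry": 1.95,
    "DeSoto": 1.83, "Polk": -4.27, "Madison": -2.32,
    "Sumter": -1.72,  # listed among the low-spoilage counties, so entered as negative
    "Gadsden": 1.56, "Broward": -0.14, "Miami-Dade": -1.18, "Volusia": 0.43,
}
RECOUNT_COUNTIES = ["Broward", "Miami-Dade", "Volusia", "Palm Beach"]


def flag_outliers(residuals, threshold):
    """Counties whose residual exceeds the threshold in absolute value, largest first."""
    flagged = [county for county, r in residuals.items() if abs(r) > threshold]
    return sorted(flagged, key=lambda county: abs(residuals[county]), reverse=True)


outliers = flag_outliers(RESIDUALS, 1.70)
recount_outliers = [c for c in RECOUNT_COUNTIES if c in outliers]

# A residual is actual minus predicted, so Duval's predicted spoilage is 9.22 - 4.20.
duval_predicted = 9.22 - RESIDUALS["Duval"]

print(outliers)
print(recount_outliers)
print(round(duval_predicted, 2))
```

Only the twelve counties whose residuals are quoted in the text are included here, so the sketch illustrates the screen rather than reproducing the full 67-county list.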

6. Cautions for Interpretation

The Ecological Fallacy

This analysis has been an exercise in data description rather than a test of a formal model. One must be careful in interpretation. It is easy to fall into the "ecological fallacy" (see the King and Robinson references at the end of this paper). Suppose I were trying to explain Ku Klux Klan membership by county, and I ran a regression. I might well find that Klan membership was strongly associated with percentage of the population that was black in the county. The ecological fallacy would be to jump to the conclusion that blacks have higher-than-average tendencies to join the Klan. In the present context, I have found that counties with more black and poor people tend to have more spoiled ballots. This might be because blacks and poor people spoil more ballots when they vote, but not necessarily. It might also be, for example, that counties with more blacks and poor people choose to have more confusing ballots for some devious political reason and that everybody, rich, poor, black, and white, spoils more ballots as a result. But these regression results do tell us what the patterns are across counties, if not why those patterns exist.

The pitfall also arises in the difference between optical ballots counted centrally and those counted in the precincts. It could be that a county would reduce its spoiled ballot percentage by 4.90 points if it stopped counting centrally and counted at the precincts instead, if the precinct system is one where the vote is registered at the booth itself and the machine notifies the voter of a misvote. But something else may be going on in the kinds of counties that use each system.
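A toy calculation may make the ecological fallacy concrete. The sketch below is purely hypothetical: the four counties and their spoilage rates are invented for illustration and are not taken from the Florida data. In each made-up county, black and non-black voters spoil ballots at exactly the same rate, yet the county-level regression slope on percent black is positive.

```python
# Hypothetical counties, invented for illustration only:
# (percent black, spoilage % among black voters, spoilage % among non-black voters)
counties = [
    (10, 3.0, 3.0),
    (20, 4.0, 4.0),
    (30, 5.0, 5.0),
    (40, 6.0, 6.0),
]


def ols_slope(x, y):
    """Slope from a one-variable least-squares regression of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    numerator = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    denominator = sum((a - mean_x) ** 2 for a in x)
    return numerator / denominator


black_pct = [c[0] for c in counties]
# County-wide spoilage rate: the population-weighted average of the two groups.
county_rate = [b * s / 100 + nb * (100 - s) / 100 for s, b, nb in counties]

county_slope = ols_slope(black_pct, county_rate)
within_county_gap = [b - nb for _, b, nb in counties]

print(round(county_slope, 3))  # positive slope across counties
print(within_county_gap)       # no gap between groups within any county
```

Here the county-level pattern comes entirely from something that varies by county, which is the sort of possibility the paragraphs above warn about.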

The Observed Choice Problem: Choice of Voting Machines by Counties

It would be interesting to know what variables explain which counties use which voting system. Do counties with more old people generally use optical-precinct machines, for example, so that the number of elderly does have an indirect effect on the number of spoiled ballots? The correlation matrix above gives some insight into this; there are no strong correlations except perhaps a tendency for centralized optical ballots to be used in counties with more poor people.

Choice of voting system is important to interpreting even the results here, however. As I note in my 1998 Public Choice paper on the ``Observed Choice'' problem, particular policies are likely to be chosen in the districts where they work the best, and this biases regressions that try to estimate the average effect of policies. In the present context, optical ballots scanned at the precinct level seem to reduce spoilage, and one might suggest that other counties wanting to reduce spoilage adopt them. It might be, however, that the counties know the effects of voting machines better than I as analyst know them, and they know that in some counties the special machines reduce spoilage but in others they do not, so the counties where they would work have adopted them, explaining why we see heterogeneity in machines across counties.

Another peril of observed choice is that districts may adopt policies which solve problems, which therefore do not show up in regressions. Suppose, for example, that old people have special trouble with punchcards. Counties with many old people would then tend not to use punchcards, so in my regressions, an otherwise existing effect of old people having more spoilage would not show up. If all counties were forced to use punchcards, then it would be clear that old people have trouble with them. Potentially, that could explain why my precinct results below for Palm Beach County are different. As we will see in a later section, however, this turns out not to be the case.

Econometric Notes:

Heteroskedasticity

Heteroskedasticity does not seem to be a problem, since eyeballing a scatterplot of Residuals vs. Population by county shows no clear correlation. I had feared that small counties might have disturbances with greater variance, since a given change in the number of spoiled ballots would have a bigger impact than in the big counties.

Dependent Variables Bounded by 0 and 100 Percent

I was concerned that since the percentage of spoiled ballots is bounded below by 0 and above by 100, tobit estimation might be appropriate, except that it is reliable only for large sample sizes (and N=67 is not very large). I tried running tobit, however, and it indicated that few of the predicted values were censored at 0 or 100. Thus, it seems OLS works well.

Functional Form

I have just reported what is almost the simplest regression I ran: linear, except for using the logarithm of county population. After I ran the purely linear specification for the first posted draft, I started trying other specifications in response to comments I received. Instead of simply Percentage of Spoiled Ballots (call it "P"), one could use log(P) or the log odds ratio, log(P/(100-P)), on the left-hand side. (The log odds ratio has the nice property that it can range from negative to positive infinity as P goes from 0 to 100, so the censoring problem in the previous paragraph disappears.) Instead of, for example, Pop92 on the right-hand side, one could use log(Pop92), which assumes a decreasing effect of increases in county size rather than a constant effect of such increases. Instead of including all the counties, one could (a) drop or (b) add dummy variables for the counties with unusual voting systems and for counties that are outliers in the right-hand-side variables. Dade County worries me, for example, because it is so extreme in population (2,008, compared to the next-highest county's 1,301 and the mean of 201), percentage foreign-born (compared to the next-highest county's 16 and the mean of 5), and percentage not speaking English at home (compared to the next-highest county's 24 and the mean of 8).
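The log odds transformation mentioned above, read here as log(P/(100-P)) with P in percent, can be sketched as follows. The function names are illustrative assumptions; the sample values 9.22 and 12.40 are the Duval and Gadsden spoilage rates quoted earlier.

```python
import math


def log_odds(p_percent):
    """Log odds ratio log(P/(100-P)) for a percentage strictly between 0 and 100."""
    if not 0 < p_percent < 100:
        raise ValueError("P must lie strictly between 0 and 100")
    return math.log(p_percent / (100 - p_percent))


def inverse_log_odds(z):
    """Map a log odds value back to a percentage between 0 and 100."""
    return 100 / (1 + math.exp(-z))


for p in (9.22, 12.40, 50.0):
    z = log_odds(p)
    print(p, round(z, 3), round(inverse_log_odds(z), 2))
```

Because any real number maps back to a percentage strictly between 0 and 100, predictions from a regression on the transformed variable cannot fall outside the admissible range. The cost, as discussed below, is that coefficient sizes become harder to interpret.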

After trying some of this, I retreated to a simple specification. This paper's aim is data analysis rather than testing a formal theory. I do not have a particular model of voter behavior I am trying to test; rather, I am looking for patterns in the data that would suggest what might be happening. This has a number of implications:

First, I have no model to tell me which specification is best-- whether the variables are linked linearly or logarithmically.

Second, I want to keep my regressions simple enough that I can get a feel for what is going on. In my situation, I cannot set up a complicated regression to fit a formal model, run it through the computer, and accept the results of a test that tells me to accept or reject my model. Often that is appropriate, but not here. Rather, I am essentially looking for the correlation between Spoiled Ballots and variable X conditional on variables W, Y, and Z being held constant. This is just one step up from looking at the correlation matrix, and one step is enough. If I start doing log transformations, I can't understand the regression any more. In particular, it becomes difficult to interpret coefficient size when the left-hand-side is, for example, an odds ratio.

Third, since I am not testing a formal model, I am not concerned about the validity of hypothesis tests. Statistical correctness requires that assumptions such as an unlimited range for the left-hand-side variables be satisfied, that an F-test rather than multiple t-tests be used when the significance of several variables is tested, and that there be sufficient data to justify large-sample techniques such as tobit. If I were doing all this to test the single hypothesis that Gore's percentage of the vote explained the amount of Spoiled Ballots conditional upon several other variables being held constant, then I might want to be more formal. But I am interested in looking for patterns more generally. That means that when I say a t-statistic indicates lack of significance, I am not really using formal statistical theory; I am saying the coefficient size is small relative to the standard error and so there is no clear indication that the true coefficient is different from zero. For this modest purpose, a linear specification and multiple t-tests are appropriate. What I need to be most careful about, given my lack of a theory, is getting the list of explanatory variables correct. As an illustration, in my first run I had not yet typed in the voting machine variables, which turned out later to be the most important ones. My initial results were misleading about the importance of other variables that happened to be correlated with machine type and picked up its effect.

I did, however, decide to use a regression with Log(Population) rather than Population as an explanatory variable. Dade County has 335 times the population of Liberty County, and I just couldn't believe that, if bigger counties had less spoilage, the size effect would be 335 times bigger for Dade County than for Liberty County. Using Log(Population) reduced the ratio between Dade and Liberty to a more reasonable 4.24. And in the regression, county size now became statistically significant. All the other variables except Mean Income are in percentages, which are more likely to have a linear effect, and mean income only varies from 9 to 20.
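The Dade-to-Liberty comparison can be roughly reproduced as below. This is a sketch under two assumptions that the text does not state outright: that the Dade figure of 2,008 quoted earlier is in thousands, and that Liberty's population is that figure divided by 335. The ratio of logarithms does not depend on the base of the logarithm, but it does depend on the units in which population is measured.

```python
import math

DADE_POP_THOUSANDS = 2008.0                     # figure quoted in the text; units assumed
LIBERTY_POP_THOUSANDS = DADE_POP_THOUSANDS / 335  # implied by the 335-to-1 ratio

raw_ratio = DADE_POP_THOUSANDS / LIBERTY_POP_THOUSANDS
log_ratio = math.log(DADE_POP_THOUSANDS) / math.log(LIBERTY_POP_THOUSANDS)

# Same counties with population measured in persons rather than thousands.
log_ratio_persons = (math.log(DADE_POP_THOUSANDS * 1000)
                     / math.log(LIBERTY_POP_THOUSANDS * 1000))

print(round(raw_ratio))            # 335
print(round(log_ratio, 1))         # about 4.2, close to the 4.24 in the text
print(round(log_ratio_persons, 1)) # smaller when population is in persons
```

The small gap from the reported 4.24 presumably reflects rounding in the 335 figure.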

7. Precinct-Level Analysis for Dade and Palm Beach Counties

Each county is divided into precincts for convenience in voting. These are administrative, not political units, and can be changed by the election officials depending on their convenience and that of the voters. When people register to vote in Florida, they are asked to provide information about themselves, and for some counties this is available at the precinct level. Bruce Hansen has organized a website at xxx that posts any Florida 2000 precinct-level data called to his attention.

Unfortunately, this data, and especially the demographic registration data and data separated into overvote and undervote, as opposed to simple voting results, is available only for a few counties, and those not a representative sample. Nonetheless, it will be useful in this section to look at two important counties for which data is available by precinct on undervote, overvote, and demographics, Miami-Dade and Palm Beach. Data is available exactly because these counties are so special: they are large, they are urban, their spoilage was heavily disputed, and they are Democratic. But let us see what the data says.

A map of ballot spoilage rates by precinct in Palm Beach, Dade, and Broward Counties is up on the web at Sun-sentinel.com/shopping/map/sfprecincts.htm.

"Gore" is the percentage of the vote for Gore. "Vote" is the total number of votes in the precinct. "Dem," "Rep," and "Other" are the percentages of registered voters of each party. Note the high percentage of "Other." Note also that in Palm Beach County, looking at the simple correlations, Gore did not do better in more Hispanic or non-Republican-or-Democratic precincts, though he did well in precincts with more old people, blacks, women, and Democrats. Care must be taken with the simple correlations, though; I would not be surprised if Gore's apparent strength with women was really strength in precincts with old people, who are disproportionately female.

I have excluded absentee precincts, since I do not have demographic data for them, and the xxx precincts in Miami-Dade and 38 in Palm Beach with fewer than 25 ballots cast, since those are peculiar precincts and the percentage differences there are more variable. I also dropped any precinct with a measured turnout of over 150%, as likely to be in error (xx in Miami-Dade and 1 in Palm Beach).
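The exclusion rules just described can be sketched as a simple filter. The records and field names below are hypothetical, not the layout of the actual precinct files, and turnout is assumed to be ballots cast as a percentage of registered voters.

```python
# Hypothetical precinct records; field names and numbers are invented for illustration.
precincts = [
    {"id": "P1", "absentee": False, "ballots": 900, "registered": 1400},
    {"id": "P2", "absentee": True,  "ballots": 500, "registered": 0},
    {"id": "P3", "absentee": False, "ballots": 12,  "registered": 40},
    {"id": "P4", "absentee": False, "ballots": 320, "registered": 200},
]

MIN_BALLOTS = 25        # precincts with fewer ballots cast are dropped
MAX_TURNOUT_PCT = 150   # measured turnout above this is treated as a data error


def turnout_pct(precinct):
    """Ballots cast as a percentage of registered voters (assumed definition)."""
    return 100.0 * precinct["ballots"] / precinct["registered"]


def keep(precinct):
    """Apply the three exclusion rules in the order given in the text."""
    if precinct["absentee"]:
        return False
    if precinct["ballots"] < MIN_BALLOTS:
        return False
    if turnout_pct(precinct) > MAX_TURNOUT_PCT:
        return False
    return True


kept = [p["id"] for p in precincts if keep(p)]
print(kept)
```

Checking the absentee flag first avoids computing turnout for records that have no registration count.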


TABLE 7: SUMMARY STATISTICS FOR THE PRECINCT DATA

          |      MIAMI-DADE (N=564)        |      PALM BEACH (N=492)
Variable  |    Mean  Std.Dev.  Min-Max     |    Mean  Std.Dev.  Min-Max
----------+--------------------------------+------------------------------
Gore      |   55.70    24.41   13.51-100   |   58.20    13.71   7.46-90.37
          |                                |
Overvote  |    2.93     2.26   0-10.97     |    4.98     3.51   0-34.48
Undervot  |    1.56      .95   0-7.27      |    1.82     1.82   0-13.86
          |                                |
Turnout   |   65.87    20.67   1.50-145.43 |   62.00     9.70   14.14-84
Ballots   |  993.27   539.23   27-3007     |  843.60   507.78   29-2434
          |                                |
TotalReg  | 1553      767      42-4451     | 1337      752      41-2939
Democrat  |   45.52    21.15   13.52-92.15 |   45.54    16.53   4.12-89.17
Repub.    |   36.53    19.09   1.32-69.52  |   34.97    14.03   3.8-84.02
Black     |   19.99    30.18   0-97.20     |   10.01    20.98   0-95.51
Hispanic  |   41.48    27.06   .08-87.60   |    3.47     4.00   0-27.22
Female    |   55.24     4.01   39.90-69.06 |   54.31     4.22   39.09-73.76
Age65up   |   23.57    12.42   4.46-76.84  |   33.94    25.47   1.48-96.73


TABLE 8: A CORRELATION MATRIX FOR THE PRECINCT DATA
MIAMI-DADE (N=564), WITH PALM BEACH (N=492) UNDERNEATH IN PARENTHESES

         |     Gore  Overvote  Undervot   Turnout   Ballots  Democrat     Black  Hispanic
---------+--------------------------------------------------------------------------------
Gore     |     1.00
         |    (1.00)
         |
Overvote |      .56      1.00
         |     (.42)    (1.00)
         |
Undervot |      .32       .53      1.00
         |     (.04)     (.11)    (1.00)
         |
Turnout  |     -.14      -.12      -.12      1.00
         |     (.07)    (-.31)     (.00)    (1.00)
         |
Ballots  |     -.22      -.11      -.12       .41      1.00
         |     (.09)    (-.21)     (.33)     (.38)    (1.00)
         |
Democrat |      .88       .60       .37      -.11      -.24      1.00
         |     (.83)     (.72)     (.13)    (-.07)    (-.05)    (1.00)
         |
Black    |      .71       .71       .43      -.11      -.11       .88      1.00
         |     (.38)     (.65)     (.03)    (-.38)    (-.19)     (.60)    (1.00)
         |
Hispanic |     -.80      -.29      -.18       .08       .31      -.86      -.64      1.00
         |    (-.09)     (.14)    (-.04)    (-.34)    (-.19)    (-.01)     (.06)    (1.00)
         |
Over65   |     -.35      -.10       .04       .06       .02      -.34      -.38       .42
         |     (.29)     (.14)     (.21)     (.30)     (.11)     (.29)    (-.30)    (-.43)

Here are the regression results:


PRECINCT REGRESSIONS FOR MIAMI-DADE (N=564) AND PALM BEACH (N=492)
(t-statistics in parentheses)

           |              OVERVOTE               |              UNDERVOTE
           |  MIAMI-DADE        PALM BEACH       |  MIAMI-DADE        PALM BEACH
           |  (R-sq=.66)        (R-sq=.75)       |  (R-sq=.26)        (R-sq=.24)
-----------+-------------------------------------+-------------------------------------
Constant   |  -5.278  (2.97)*   23.24   (7.65)*  |    .430  (0.39)     7.072  (2.55)*
           |                                     |
Gore       |    .046  (8.72)*   -.202  (11.69)*  |    .007  (2.25)*    -.070  (4.46)*
Turnout    |   -.001  (0.33)    -.008   (0.81)   |   -.001  (0.77)     -.020  (2.16)*
Ballots    | -.00041  (3.31)*  -.00030  (1.74)*  | -.00016  (2.06)*   .00151  (9.48)*
Democrat   |    .037  (1.58)    -.025   (0.97)   |   -.002  (0.15)      .020  (0.85)
Republican |    .020  (1.18)    -.285   (8.01)*  |   -.002  (0.22)     -.077  (2.38)*
           |                                     |
Black      |    .048  (7.34)*    .078  (10.51)*  |    .016  (4.07)*     .012  (1.84)*
Hispanic   |    .055  (9.54)*    .075   (2.86)*  |    .008  (2.40)*     .013  (0.57)
Female     |   -.002  (0.12)     .053   (2.15)*  |    .001  (0.13)      .029  (1.32)
Over65     |    .016  (2.48)*    .045   (7.83)*  |    .016  (3.95)*     .021  (4.09)*

First, consider overvotes. What the two counties have in common is that small precincts (low "Ballots") and those with large proportions of Blacks, Hispanics, and old people have more overvotes, whereas turnout and the percentage of Democrats make no difference. At the same time, the two counties clearly do not follow the same patterns. Variables are significant in one regression but not the other (Female), significant in both but with very different coefficient sizes (Over65), or significant in both but with opposite signs (Gore).

Undervotes show a similar mix of results. What the two counties have in common is that small precincts and those with large proportions of Blacks and old people have more undervotes, whereas turnout and the percentages of Democrats or women make no difference. Recall that at the county level, xxx It could be that there is a critical threshold effect for spoilage. I can test for that.

The size of the coefficients has meaning. Increasing the percentages of Democrats, Others (not Republican or Democrat), and Hispanics had the strongest effects, with coefficients of .19, .15, and .12. Increasing the percentages of blacks (.08) or old people (.04) had distinct but smaller effects.

How can this be reconciled with the county-level regressions, where the percentages of non-English-speakers and old people had no effect on spoilage? There are a number of possibilities.

1. What the county regressions found was that voting machine type was the most important variable. The same voting machines were used in all the Palm Beach precincts, so the biggest reason for variation in spoilage was absent. So too was another significant reason: county size. This allowed other reasons to show through more clearly.
2. The county regressions only had 67 counties, whereas the Palm Beach precinct regressions had 492 precincts. Thus, less clear effects could show up in the precinct regressions.
3. The county and precinct regressions are measuring different things: what kinds of counties in Florida and what kinds of precincts in Palm Beach County have more spoilage in general or overvoting.
- Overvoting may be different; it includes less purposeful abstention, for one thing.
- Palm Beach County may be different from the rest of Florida; blacks and Hispanics there are different kinds of people from blacks and Hispanics in the Florida Panhandle, for example. Press reports suggest that many Jewish old people from New York retire to Palm Beach County; Sarasota old people may be different. Local circumstances might also give rise to the results. To give a hypothetical, Palm Beach County might have had a Hispanic radio station that gave bad directions on how to vote, and the Hispanic variable picks that up.
- Trying to compare the two regressions, we may fall into the ecological fallacy. Just as individual blacks may join the KKK at different rates than counties with large black populations, so precincts with many blacks might behave differently from counties.

At any rate, my own conclusion is that what the precinct regressions show is that though voting machine type may be the most important determinant of ballot spoilage, demographics also matter, in ways that vary across counties and hence are not clear at the county level. In addition, I would not be surprised if in Palm Beach County, but not other counties, old people turned out in unusually great numbers, so that their effect on spoilage was like that of blacks.

8. ``Knock-And-Drag'' and the Black Vote

Black turnout was very heavy in Florida. This was not accidental, but the result of a deliberate effort by the Gore campaign. Blacks are a group that is so heavily Democratic that turnout is all-important. This was true generally; in the 2000 election, despite the bumps in the polls, the polls told us that self-denoted Democrats supported Gore all along and Republicans supported Bush, so the parties rationally devoted much effort to trying to increase turnout of the party faithful. But Democrats did not support Gore as much as blacks did (86% compared to 93%, from the CNN poll cited above). Moreover, it is cheaper to hire black election-day knock-and-drag campaign workers for black neighborhoods than corporate-executive knock-and-drag campaign workers for corporate-executive neighborhoods. I also recall hearing that at the last minute the Gore campaign shifted all its Georgia get-out-the-vote money to Florida, having given up on Georgia.

Getting out the vote is a longstanding and honorable campaign tactic unless bribery is used, which has not been alleged in Florida. Paying people to telephone voters on Election Day and go door to door is fully legitimate. The excerpts below from a New Republic article on the 2000 New Jersey campaign give a feel for how it is done and how important it was to the Gore campaign.

"The following year, applying the turnout techniques of the Torricelli campaign, Democrat Jim McGreevey came from nowhere to within 26,000 votes of unseating popular Governor Christie Todd Whitman, with Whitman's share of the black vote dropping eight points from her 1993 race. A study comparing the tight 1997 race to Whitman's 1993 victory over Democrat Jim Florio-- who had no black turnout program--is treated like a state secret within the party. "It's remarkable," says Corzine campaign manager Stephan DeMicco, who declined to share a copy of the study with me. "It's got too much strategic power for us.... The study of '93 to '97 has resulted in whole new approaches to electoral targeting for us. The lessons that we learned from that study ... are being applied in many other states now."

"We actually call it ... the New Jersey Plan," says Thomas,...

...

The sheet shows each precinct's registered voters, turnout history, Democratic performance, and, most important, vote goal. Precincts where turnout is low but Democratic performance is high are marked in red, since they constitute prime knock-and-drag territory on Election Day.

...

In addition to these mailings, Thomas hit black voters with live phone calls urging them to vote. On the Monday before the election, voters were given a reminder call; on Election Day itself, a massive phone bank operated from 8:00 a.m. to 7:00 p.m. "Those phones are on a continual cycle," Thomas says. "The only way [a voter] comes out of the cycle is if [he] answers the phone." When a district is performing below Thomas's expectations, she can immediately retarget the phones, increasing calls to that area.

...

Where exactly all these workers came from became a campaign issue in the final days of the race, when The New York Times discovered that many were being shipped in from homeless shelters and drug-rehab centers in Pennsylvania.

...

"Understand the mission," he instructs his flushers. "The mission is to get a registered voter out of their home and to the polls. Ladies and gentlemen, we are in very bad shape. I want you to load up on everything that moves." He then takes aside a sound-truck driver and traces a route for him to follow. Minutes later, the teams are blanketing the streets, knocking on doors and dragging out voters." ("NEWARK DISPATCH, Knock and Drag" Ryan Lizza, The New Republic , Post date 11.09.00, Issue date 11.20.00)

The effort is admirable, but consider the implication for spoiled ballots. I will dramatize a bit. An elderly lady, perhaps not even registered as a Democrat, lives in a mostly-black precinct and so is targeted. Her phone rings constantly, and whenever she gets up from watching her soap opera to answer it, she hears some stranger tell her to vote. Every two hours, a nice young college student or a scruffy man who looks like he needs a drink knocks on her door and asks her to step out for a few minutes and vote, please, because if his team can get enough turnout in the precinct they'll get a bonus.

So what does she do? She finally turns off her TV, walks quickly to the school, has her name checked off, goes into the voting booth, punches a few names at random, and hurries home to her soap opera. Or, she doesn't punch any names at all, having put a stop to the phone calls by having her name checked off.

The problem is akin to that of vote buying in a system using the secret ballot. How do you make sure the voter upholds his end of the deal? The vote buyer can't tell if the voter voted for the candidate who bought his vote. The knock-and-drag man can't tell if the voter voted for any candidate at all. What makes the problem even worse for the knock-and-drag man is that though the bought voter has some gratitude towards the candidate who paid him, the harassed voter is full of resentment. Thus, though knock-and-drag efforts will help a candidate on net, they will also result in more spoiled votes. And this will be particularly true when the effort targets a population based not on past voting habits or registration, which at least indicate an interest in politics, but on demographic characteristics.

Perhaps related is the well-established regularity that black voters, even more than non-black voters, tell pollsters they voted when they actually did not (see Deufel and Kedar, 2000). Black voters may be more used to being pestered if they do not vote, and reflexively try to put off pollsters as well as campaigners. This also, of course, suggests that the polling result of overwhelming black support for Democrats may be exaggerated, though the Democratic campaigners would not pursue black turnout so heavily if the result were not mostly true.

The Harassed Voter Hypothesis may be testable. If there were an election in which blacks were expected to vote evenly for the two candidates, then the candidates would have no incentive to increase the black vote generally. Instead, they would target different groups for getting out the vote-- one candidate might go after old people and another after young people, for example. In that case, if the Harassed Voter Hypothesis is true, black districts should have no more spoilage than others. Blacks are heavily Democratic, so this suggests the place to look would be a Democratic primary election without racial undertones.

Another simple test is already failed by the Harassed Voter Hypothesis. It implies high rates of both undervoting and overvoting. Indeed, the quickest way to vote is not to cast votes for any candidate. Yet black percentage in a county is unrelated to undervoting. This lends support to the alternative that with high black turnout, more voters were inexperienced, though lack of significance in the Turnout variable in the county-level regressions runs against that hypothesis. Nelson v. Robinson, 301 So.2d 508, 512 (Fla. Ct. App. 2d Dist. 1974). Ballots spoiled by voters who are unhappy to be at the polls and eager to leave as quickly as possible are not the same as ballots spoiled by voters who have taken great pains to vote and hope strongly that the candidate of their choice wins.

9. Voting Machine Choice

Which counties chose to use the kinds of voting machines that have low spoilage? The variable explained below is whether a county used one of the three ballot types discovered earlier to have high spoilage rates: Optical_Central_ESS, Punchcard_Votomatic, or Punchcard_Datavote. For this we must use a logit regression, since OLS is unsuitable when the variable to be explained takes just values of 0 and 1 rather than a continuous range (though here, OLS happens to yield essentially the same results.)

There are 67 observations, and the pseudo-R-squared is .25. ("Pseudo" because logit does not generate the standard R-squared of ordinary least squares.) The table below shows the coefficient (which is not straightforward to interpret in a logit regression), the z-statistic, and the simple correlation between that variable and whether a county used a high-spoilage ballot. Regression coefficients significant at the 10 percent level are starred.


 
HIGH SPOILAGE MACHINES:    Coefficient z-statistic  Correlation 
 
Constant                 |    .70       0.14            --
Log(population)          |   -.42       1.13          -.16    
Population growth        |   -.01       1.09          -.220
Foreign                  |   -.08       0.23           .08
Non-English-speaking     |    .35       1.57           .13
MeanIncome               |   -.14       0.68          -.16
Poverty                  |   -.02      -0.14           .24
Gore                     |   -.08       1.35           .02 
Black                    |    .15*      2.07*          .24 
Over65                   |    .18*      2.85*          .11

First, note the simple correlations. The positive correlations indicate that counties with more immigrants, non-English speakers, Gore voters (barely), blacks, and old people tend to choose high-spoilage machines. The negative correlations indicate that counties which are larger, growing, richer, and with fewer poor people tend not to choose high-spoilage machines. These simple correlations, however, confound the effects of the different variables that are correlated with each other. The regression coefficients adjust for the independent effect of each variable. The two significant variables are the percentages of people who are black and who are over age 65, both of which increase the probability the county chooses a high-spoilage ballot. Such things as county size, population growth, and the wealth of the county do not seem to matter.
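Since logit coefficients are hard to read directly, a small sketch may help show how one translates into a change in probability. The Black coefficient of .15 is taken from the table above; the starting probability of 0.5 and the 10-point change are assumptions chosen only for illustration.

```python
import math


def logistic(index):
    """Probability implied by a logit index (the linear combination x'b)."""
    return 1.0 / (1.0 + math.exp(-index))


def marginal_effect(coef, prob):
    """Approximate change in probability per one-unit change in a variable."""
    return coef * prob * (1.0 - prob)


BLACK_COEF = 0.15   # coefficient on Black from the table above

start_prob = 0.5    # assumed starting probability, for illustration only
start_index = math.log(start_prob / (1.0 - start_prob))

# Effect of one more percentage point of black population, evaluated at start_prob.
effect_per_point = marginal_effect(BLACK_COEF, start_prob)

# Effect of ten more percentage points, computed exactly through the logistic curve.
prob_after_ten_points = logistic(start_index + BLACK_COEF * 10)

print(round(effect_per_point, 4))
print(round(prob_after_ten_points, 2))
```

The effect of a given change depends on the starting probability, which is why a single logit coefficient cannot be read as a fixed change in the chance of choosing a high-spoilage machine.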

The sample size is small for logit, however, and there is considerable collinearity, as can be seen from the correlation matrix in Table 4. An example of the instability of the results is that when I used the state website information on machine type, which differs only for Baker and Martin counties, the percentage of black population was marginally insignificant instead of significant, as it is above. Using the same data as above, I ran the regression again after dropping any variable with a z-statistic less than 1.30. The regression now has a pseudo-R-squared of .19 and the following coefficients:


HIGH SPOILAGE MACHINES   | Coefficient  z-statistic
Constant                 |  -1.69       1.10
Non-English-speaking     |    .18*      2.00*
Gore                     |   -.10*      2.21*
Black                    |    .19*      2.89*
Over65                   |    .15*      2.76*

This second regression contains indications that counties with more non-English speakers and fewer Gore voters, other things equal, pick high-spoilage ballots.

A regression, however, is probably not the right tool to use when asking the question: "Who lost more votes due to high-spoilage machines-- Bush or Gore?" For that question, the best county-level approach would be to go back to my earlier regressions to find the effect of high-spoilage machines conditioning on other county features, and then multiply that by the number of Bush and Gore votes in each county.
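
That county-level calculation might be sketched as follows in Python. Everything numeric here is hypothetical: the county names, vote counts, and the 4-percentage-point machine effect are placeholders, not estimates from this paper. The sketch also assumes that spoiled ballots would have split between the candidates in the same proportion as the counted votes in each county.

```python
# Hypothetical effect of a high-spoilage machine on a county's spoilage
# rate, in percentage points, holding other county features fixed.
MACHINE_EFFECT_PCT = 4.0

# Hypothetical counties: counted votes and whether a high-spoilage ballot is used.
counties = [
    {"name": "County A", "bush": 10_000, "gore": 20_000, "high_spoilage": True},
    {"name": "County B", "bush": 30_000, "gore": 15_000, "high_spoilage": False},
    {"name": "County C", "bush": 25_000, "gore": 5_000, "high_spoilage": True},
]


def votes_lost(counties, effect_pct):
    """Sum, over high-spoilage counties, effect * votes for each candidate."""
    lost = {"bush": 0.0, "gore": 0.0}
    for county in counties:
        if not county["high_spoilage"]:
            continue  # no machine-related loss attributed to this county
        for candidate in lost:
            lost[candidate] += effect_pct / 100.0 * county[candidate]
    return lost


lost = votes_lost(counties, MACHINE_EFFECT_PCT)
print(f"Bush votes lost: {lost['bush']:.0f}")
print(f"Gore votes lost: {lost['gore']:.0f}")
```

A fuller version would let the machine effect differ by ballot type, as the earlier regressions do, rather than using a single number.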

10. Conclusions

Counties which use punchcard ballots or optical ballots counted centrally, smaller counties, and counties with more black people tend to have more spoiled ballots. The percentages of old people and immigrants do not seem to matter, nor does a county's wealth, population growth, or turnout relative to past elections. Most importantly for the current controversy, whether a county went for Gore or for Bush has little relation to the number of spoiled ballots.

11. References and Data Sources

Other Regression Studies
- A good place to look for a list of web statistical studies of the election is Jonathan O'Keefe's website.
- Adams and Fastnow, "A Note on the Voting Irregularities in Palm Beach, Florida." Madison.hss.cmu.edu. A good website, with links to lots of regression studies of the Buchanan vote in Florida.
- Saltman, Roy G., "CFP'93 - Assuring Accuracy, Integrity and Security in National Elections: The Role of the U.S. Congress," National Institute of Standards and Technology, 2/12/93, http://www.cpsr.org/conferences/cfp93/saltman.html (January 5, 2001).
- Evoy, Susan, "CPSR Answers Computer-Based Voting Technology Questions," http://www.cpsr.org/issues/voting_answers.html (January 5, 2001).
- Deufel, Benjamin and Orit Kedar, "But Do They Really Vote? Correcting for Overreporting of Turnout," November 2000, Dept. of Government, Harvard University.
- Hansen, Bruce, "A NonParametric Analysis of UnderVotes in the Palm Beach Presidential Vote: Implications for a Recount," http://www.ssc.wisc.edu/~bhansen/vote/florida4.pdf. Uses precinct-level spoilage rates for Palm Beach County to figure out the likelihood of Gore winning on a recount. Not closely related to my paper.
- Hansen, Bruce, "A Precinct-Level Demographic Analysis of Double-Punching in the Palm Beach Presidential Vote," http://www.ssc.wisc.edu/~bhansen/vote/florida2.pdf. In Palm Beach County, the percentages of voters over 65, black, hispanic, and Democrat all increase the amount of overvoting (double-punching) in a precinct.
- Hansen, Bruce, "Precinct-Level Voting and Demographic Data," http://www.ssc.wisc.edu/~bhansen/vote/data.html (January 5, 2001).
- Netrinsics.com, "Comparison of Precinct Return Data between Duval County and Lee County, Florida," Www.netrinsics.com/DuvalVsLee/DuvalVsLee.html. Uses precinct-level spoilage rates for Duval and Lee Counties and finds that in Duval County, spoilage is correlated with Gore vote strongly and Bush vote not at all. In Lee County, the candidate correlations are equal.
- Orszag, Peter and Jonathan Orszag, "A Simple Analysis of Discarded Votes by Precinct in Palm Beach," November 10, 2000. Www.sbgo.com/Papers/Election/ANALYSIS%20OF%20DISCARDED%20VOTES%20BY%20PRECINCT%20IN%20PALM%20BEACH.pdf. Finds that spoilage rates in Palm Beach County were greater in precincts where Gore was strong, without controlling for any other variables.
- Rusin, David J., "Likelihood of Altering the Outcome of the Florida 2000 Presidential Election by Recounting," Northern Illinois University, Jan. 5, 2001. http://www.math.niu.edu/~rusin/uses-math/recount/index.html
Data
- The Florida Division of Elections.
- Fisher.lib.virginia.edu/ccdb/county94.html. The University of Virginia's website, which has the U.S. 1994 county data arranged nicely.
- Division of Elections - Florida Department of State, "Certified Voting Systems Used in Florida," election.dos.state.fl.us/votemeth/cvs.shtml. This has descriptions of which counties use which voting machines.
- Division of Elections - Florida Department of State, "Florida Administrative Code," Election.dos.state.fl.us/fac/index.shtml
- The Orlando Sentinel survey of all 67 county elections supervisors in Florida on November 14, 2000.
- An exit poll, available from CNN or ABC News for the 2000 Presidential election in Florida.
- CNN data on undervotes is available at Www.cnn.com/interactive/allpolitics/0012/fl.undervotes/frameset.exclude.html (file date in directory, December 9); older and different data, which includes overvotes and overseas ballots, is at Www.cnn.com/ELECTION/2000/resources/ballot1.htm (file date in directory, Dec. 1, 2000).
- CNN, The Associated Press, Dave Rusin and Peter Goodrich, [email protected] (January 1, 2001), county data on undervotes and overvotes.
- US Census Bureau, Florida map showing counties, Www.census.gov/datamap/www/12.html.
Journalism
- Barbanel, Josh and Ford Fessenden, "Racial Pattern in Demographics of Error-Prone Ballots," New York Times, 29 November 2000.
- Lizza, Ryan, "NEWARK DISPATCH, Knock and Drag," The New Republic, Post date 11.09.00, Issue date 11.20.00.
- "179,295 ballots that didn't count," Orlando Sentinel, Orlandosentinel.com/elections/1118problems.htm (link dead by Dec. 9, 2000).
Law
- The Florida election laws, Www.leg.state.fl.us/statutes/index.cfm?Mode=ViewStatutes&Submenu=1.
- Nelson v. Robinson, 301 So.2d 508 (Fla. Ct. App. 2d Dist. 1974).
Other References
- King, Gary. A Solution to the Ecological Inference Problem, Princeton University Press.
- Rasmusen, Eric, ``Observed Choice, Estimation, and Optimism About Policy Changes,'' Public Choice, (1998) 97: 65-91.
- Robinson, William, "Ecological Correlation and the Behavior of Individuals," American Sociological Review, 15: 351-357.

I'd like to thank for their comments Howard Marvel, John McHale, Wallace Mullin, Alex Raskovich, Michael Ward, and an anonymous NBER data guru, none of whom necessarily approves of (or has even seen) the latest version of these notes.

RAW STATA OUTPUT FOR THE COUNTY REGRESSIONS WITH THE LITERACY VARIABLE


   *REGRESSION   1  ;
. regress spoilper Mmod1 Mmod2 Mmod3 Mmod6 Mmod7 Mmod8 Sublit
> LPop92 popgro Foreign noneng MeanInc poverty Goreper blacper oldper
> age75pl highturn ;

  Source |       SS       df       MS                  Number of obs =      65
---------+------------------------------               F( 18,    46) =   14.80
   Model |  518.525195    18  28.8069553               Prob > F      =  0.0000
Residual |  89.5137475    46  1.94595103               R-squared     =  0.8528
---------+------------------------------               Adj R-squared =  0.7952
   Total |  608.038942    64  9.50060847               Root MSE      =   1.395

------------------------------------------------------------------------------
spoilper |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
   Mmod1 |  -.9701006   1.725915     -0.562   0.577      -4.444187    2.503985
   Mmod2 |   3.918342   .7024633      5.578   0.000       2.504356    5.332327
   Mmod3 |  -.5017297   .7107465     -0.706   0.484      -1.932388    .9289288
   Mmod6 |    2.01425   1.669907      1.206   0.234      -1.347099    5.375599
   Mmod7 |   4.009866   .8257787      4.856   0.000       2.347659    5.672072
   Mmod8 |    2.48855   .8077395      3.081   0.003       .8626553    4.114446
  Sublit |   .1897027   .0745313      2.545   0.014       .0396791    .3397263
  LPop92 |  -.1402608   .2872131     -0.488   0.628      -.7183907    .4378691
  popgro |  -.0015095   .0084955     -0.178   0.860      -.0186101    .0155911
 Foreign |   .0063984   .1367082      0.047   0.963      -.2687809    .2815776
  noneng |   .0073044   .1007893      0.072   0.943      -.1955738    .2101826
 MeanInc |   .1840353   .1144325      1.608   0.115      -.0463054     .414376
 poverty |    .042256   .0864096      0.489   0.627      -.1316775    .2161896
 Goreper |  -.0328812   .0322865     -1.018   0.314      -.0978705    .0321081
 blacper |    .067874   .0384695      1.764   0.084      -.0095611    .1453091
  oldper |   .0194023   .1257668      0.154   0.878      -.2337532    .2725579
 age75pl |  -.2153074   .2991274     -0.720   0.475      -.8174195    .3868048
highturn |   1.631412   2.173899      0.750   0.457       -2.74442    6.007243
   _cons |  -5.749018   4.953201     -1.161   0.252       -15.7193    4.221259
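
The summary statistics in the header of the spoilper regression can be recomputed from its sums of squares and degrees of freedom, which is a useful check on a log that has been reflowed. Below is a short Python sketch using the numbers printed in that output. The variable names are illustrative and the code is not part of the original Stata session.

```python
import math

# Values copied from the ANOVA block of the spoilper regression.
ss_model, df_model = 518.525195, 18
ss_resid, df_resid = 89.5137475, 46
ss_total, df_total = 608.038942, 64

ms_model = ss_model / df_model
ms_resid = ss_resid / df_resid
ms_total = ss_total / df_total

f_stat = ms_model / ms_resid               # F(18, 46)
r_squared = ss_model / ss_total            # share of variation explained
adj_r_squared = 1.0 - ms_resid / ms_total  # penalizes the 18 regressors
root_mse = math.sqrt(ms_resid)

# t-statistic for Sublit: coefficient divided by its standard error.
t_sublit = 0.1897027 / 0.0745313

print(f"F = {f_stat:.2f}, R-squared = {r_squared:.4f}")
print(f"Adj R-squared = {adj_r_squared:.4f}, Root MSE = {root_mse:.3f}")
print(f"t(Sublit) = {t_sublit:.3f}")
```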




. correlate  spoilper Sublit Underper Overper    ;
(obs=52)

        | spoilper   Sublit Underper  Overper
--------+------------------------------------
spoilper|   1.0000
  Sublit|   0.5660   1.0000
Underper|   0.2017   0.0356   1.0000
 Overper|   0.9698   0.5693  -0.0422   1.0000




. correlate Overper Mmod1 Mmod2 Mmod3 Mmod6 Mmod7 Mmod8 Sublit
> Pop92 blacper popgro Foreign noneng MeanInc poverty Goreper oldper
> age75pl highturn ;
(obs=52)

        |  Overper    Mmod1    Mmod2    Mmod3    Mmod6    Mmod7    Mmod8
--------+---------------------------------------------------------------
 Overper|   1.0000
   Mmod1|        .        .
   Mmod2|   0.7486   0.0000   1.0000
   Mmod3|  -0.4620   0.0000  -0.3865   1.0000
   Mmod6|        .        .        .        .        .
   Mmod7|   0.1726   0.0000  -0.1214  -0.1273   0.0000   1.0000
   Mmod8|  -0.1480   0.0000  -0.3865  -0.4054   0.0000  -0.1273   1.0000
  Sublit|   0.5693   0.0000   0.5029  -0.1023   0.0000  -0.0220  -0.2154
   Pop92|  -0.1473   0.0000  -0.2854  -0.1530   0.0000  -0.1133   0.5231
 blacper|   0.6211   0.0000   0.4158  -0.1840   0.0000  -0.1124  -0.1350
  popgro|  -0.3741   0.0000  -0.2682   0.2643   0.0000  -0.1435   0.0669
 Foreign|  -0.0894   0.0000  -0.2073  -0.0672   0.0000  -0.0524   0.3782
  noneng|  -0.0285   0.0000  -0.1928  -0.1011   0.0000   0.0942   0.3367
 MeanInc|  -0.4225   0.0000  -0.4684   0.0607   0.0000  -0.0949   0.4503
 poverty|   0.6112   0.0000   0.5217  -0.1503   0.0000   0.0607  -0.3330
 Goreper|  -0.1069   0.0000  -0.1256   0.1927   0.0000  -0.2354   0.2675
  oldper|  -0.2699   0.0000  -0.1095   0.0353   0.0000  -0.1609   0.3610
 age75pl|  -0.2048   0.0000  -0.0748  -0.0762   0.0000  -0.1653   0.4303
highturn|  -0.0383   0.0000  -0.1810  -0.1335   0.0000  -0.0393   0.2247

        |   Sublit    Pop92  blacper   popgro  Foreign   noneng  MeanInc
--------+---------------------------------------------------------------
  Sublit|   1.0000
   Pop92|  -0.2985   1.0000
 blacper|   0.6113   0.0287   1.0000
  popgro|  -0.3210  -0.0988  -0.4611   1.0000
 Foreign|  -0.2043   0.7900  -0.0695   0.0954   1.0000
  noneng|  -0.1657   0.7075  -0.0685   0.0559   0.9593   1.0000
 MeanInc|  -0.6135   0.4130  -0.4359   0.3548   0.3583   0.2750   1.0000
 poverty|   0.6919  -0.2741   0.6647  -0.5189  -0.1790  -0.1048  -0.7899
 Goreper|   0.0880   0.4860   0.1593   0.1757   0.3822   0.3173   0.2379
  oldper|   0.0949   0.0088  -0.4342   0.3883   0.0460  -0.0525   0.3078
 age75pl|   0.1355   0.1422  -0.3496   0.1728   0.0941  -0.0123   0.3127
highturn|  -0.2978   0.3135  -0.0152  -0.0541   0.3578   0.3507   0.0727

        |  poverty  Goreper   oldper  age75pl highturn
--------+---------------------------------------------
 poverty|   1.0000
 Goreper|  -0.1262   1.0000
  oldper|  -0.3879   0.3230   1.0000
 age75pl|  -0.3528   0.3558   0.9455   1.0000
highturn|   0.0725  -0.0510  -0.4375  -0.3820   1.0000


  *REGRESSION   1  BASIC. ;
. regress Overper Mmod1 Mmod2 Mmod3 Mmod6 Mmod7 Mmod8 Sublit
> LPop92 popgro Foreign noneng MeanInc poverty Goreper blacper oldper
> age75pl highturn ;

  Source |       SS       df       MS                  Number of obs =      52
---------+------------------------------               F( 16,    35) =   10.90
   Model |   377.77566    16  23.6109788               Prob > F      =  0.0000
Residual |  75.8034623    35  2.16581321               R-squared     =  0.8329
---------+------------------------------               Adj R-squared =  0.7565
   Total |  453.579123    51  8.89370829               Root MSE      =  1.4717

------------------------------------------------------------------------------
 Overper |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
   Mmod1 |  (dropped)
   Mmod2 |   5.107194   .8394107      6.084   0.000         3.4031    6.811289
   Mmod3 |   .6264834    .822626      0.762   0.451      -1.043536    2.296503
   Mmod6 |  (dropped)
   Mmod7 |   5.187284   1.418107      3.658   0.001       2.308372    8.066195
   Mmod8 |   2.282271   .9714315      2.349   0.025       .3101603    4.254382
  Sublit |   .1184544   .1016946      1.165   0.252      -.0879967    .3249054
  LPop92 |   .0034228   .3762638      0.009   0.993      -.7604335     .767279
  popgro |   .0104209   .0102347      1.018   0.316      -.0103566    .0311983
 Foreign |   .0320293   .1760174      0.182   0.857       -.325305    .3893636
  noneng |  -.0216732   .1344062     -0.161   0.873      -.2945322    .2511858
 MeanInc |   .1660901   .1269066      1.309   0.199       -.091544    .4237242
 poverty |   .0812081   .1204599      0.674   0.505      -.1633384    .3257547
 Goreper |  -.0326626   .0385273     -0.848   0.402      -.1108772     .045552
 blacper |   .0804912   .0614977      1.309   0.199      -.0443558    .2053381
  oldper |  -.1252164   .1539209     -0.814   0.421      -.4376925    .1872596
 age75pl |   .1072159   .3529485      0.304   0.763      -.6093076    .8237393
highturn |  -.0590026    2.98322     -0.020   0.984      -6.115261    5.997255
   _cons |  -4.132968   6.579606     -0.628   0.534      -17.49028    9.224342
------------------------------------------------------------------------------

. regress Underper Mmod1 Mmod2 Mmod3 Mmod6 Mmod7 Mmod8 Sublit
> LPop92 popgro Foreign noneng MeanInc poverty Goreper blacper oldper
> age75pl highturn ;

  Source |       SS       df       MS                  Number of obs =      52
---------+------------------------------               F( 16,    35) =    4.78
   Model |  20.3511387    16  1.27194617               Prob > F      =  0.0001
Residual |   9.3128446    35  .266081274               R-squared     =  0.6861
---------+------------------------------               Adj R-squared =  0.5425
   Total |  29.6639833    51  .581646732               Root MSE      =  .51583

------------------------------------------------------------------------------
Underper |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
   Mmod1 |  (dropped)
   Mmod2 |   -1.02658   .2942193     -3.489   0.001      -1.623877    -.429283
   Mmod3 |  -.8624329   .2883361     -2.991   0.005      -1.447786   -.2770794
   Mmod6 |  (dropped)
   Mmod7 |  -.4469883   .4970566     -0.899   0.375      -1.456067    .5620902
   Mmod8 |   .6372364   .3404935      1.872   0.070      -.0540022    1.328475
  Sublit |   .0240708   .0356447      0.675   0.504      -.0482918    .0964333
  LPop92 |  -.2101169   .1318831     -1.593   0.120      -.4778538      .05762
  popgro |  -.0051483   .0035873     -1.435   0.160      -.0124309    .0021344
 Foreign |   .0276325   .0616953      0.448   0.657      -.0976157    .1528807
  noneng |  -.0178882   .0471103     -0.380   0.706      -.1135271    .0777508
 MeanInc |   .0251369   .0444816      0.565   0.576      -.0651656    .1154394
 poverty |   .0250832    .042222      0.594   0.556       -.060632    .1107985
 Goreper |   .0077062   .0135041      0.571   0.572      -.0197086    .0351209
 blacper |  -.0115184   .0215554     -0.534   0.596      -.0552781    .0322413
  oldper |   .0496365   .0539504      0.920   0.364      -.0598885    .1591616
 age75pl |  -.1203142   .1237109     -0.973   0.337      -.3714607    .1308322
highturn |    .431748   1.045639      0.413   0.682      -1.691013    2.554509
   _cons |   .3968967   2.306198      0.172   0.864      -4.284934    5.078727


URL: Php.indiana.edu/~erasmuse/elections/spoiled.htm. Indiana University, Department of Business Economics and Public Policy, Kelley School of Business , BU 456, 1309 East Tenth Street, Bloomington, Indiana 47405-1701, (812)855-9219. 2000-2001: Olin Senior Research Fellow, Harvard Law School, (617) 496-4878. Comments: [email protected].