
Understanding Election Outcomes: A Comprehensive Analysis


When the polls close on this Tuesday night's midterm elections, the nation will briefly turn its attention to deciphering the results. It will be clear who won and who lost, but how one candidate prevailed or why another faltered will remain murky. Analysts and commentators will sift through exit polls, results from key counties, and more, offering insights into who voted and how. Most Americans, largely unaware of the methods behind polling and election analysis, will take this information as fact. By the end of the week, the election narrative will be largely set, and the public will feel confident that it understands what happened.

But is the situation truly that straightforward?

To illustrate, consider white (non-Hispanic) voters without a college degree in the 2016 election. This group played a pivotal role in President Trump's victory. Numerous analyses have tried to explain why many of these voters moved from supporting President Obama to Trump, and where they may go in future elections. You might assume that, two years later, we would have clear answers about how large this group was as a share of 2016 voters and by how much Trump won it. Let's examine the data.

Estimates from different datasets diverge remarkably. The exit poll, the primary source for news stories and the American public, indicates that white non-college voters made up only about a third of the electorate. In contrast, the well-regarded Cooperative Congressional Election Study (CCES) suggests this group was around half of the electorate. The green bars show the most commonly used public data sources; I will return to them when discussing the Catalist dataset.

On candidate choice, the Current Population Survey does not report vote choice, but the other sources show a consistent pattern: datasets that put white non-college voters at a smaller share of the electorate also report a larger margin for Trump within the group. In one reading of the 2016 data, white non-college voters were only a third of the electorate and favored Trump by nearly 40 points. In another, they were half of the electorate and supported Trump by 25-30 points.
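One way to see how these pairings can arise: a group's contribution to the overall margin is roughly its share of the electorate times the margin within it, so a smaller share combined with a larger margin can yield a similar contribution. The sketch below checks this with illustrative values we chose (38 standing in for "nearly 40", and 27.5 as the midpoint of 25-30); they are assumptions, not figures from any specific dataset.

```python
# Illustrative arithmetic only: the share and margin values are assumptions
# approximating the two readings of the 2016 data described above.

def net_contribution(share: float, margin_pts: float) -> float:
    """Points of the overall margin contributed by one group (share x within-group margin)."""
    return share * margin_pts


readings = {
    "one third of voters, Trump +38": (1 / 3, 38.0),
    "one half of voters, Trump +27.5": (0.50, 27.5),
}

for label, (share, margin) in readings.items():
    print(f"{label}: about {net_contribution(share, margin):.1f} points of the national margin")
```

Both readings put the group's contribution at roughly 13 points, which is how two very different pictures of the group can sit alongside similar national results.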

So which picture is accurate?

We return to this example later; there is strong evidence that non-college white voters were closer to half of the electorate than to a third. Still, no definitive answer exists. All of these questions depend on statistical estimation, because the ballot is secret and because of how information about voters is collected. Each dataset rests on different assumptions and choices, with its own strengths and weaknesses. A full treatment is long and technical, but for a brief overview, several articles on the topic are worth exploring.

Many researchers in both the public and private sectors are making progress on these hard problems. At Catalist, we compile and analyze voter files across election cycles. Voter files are databases of registered voters maintained by each state's Secretary of State. Our company, among others, collects, standardizes, and provides this data to political campaigns, civic organizations, and research institutions. Importantly, Catalist maintains a unique longitudinal dataset spanning multiple election cycles. Campaigns have long used voter files, but they are increasingly used for public analysis as well.

We recently produced exit-poll-style demographic estimates for past elections going back to 2008, and we plan to release estimates for the midterms shortly after the 2018 election. Data will be available for selected states and congressional districts, possibly along with a national estimate (details below).

Like the other datasets mentioned, the voter file is imperfect, but it is an excellent foundation for understanding past elections. Its main advantage is a detailed view of who made up the electorate:

  • The voter file precisely identifies who voted, eliminating reliance on self-reported survey data, which is often inaccurate. We do not depend on precinct sampling or the recruitment of survey respondents, whose representativeness is uncertain. We start with a complete list of everyone who voted, as officially recorded by each Secretary of State.
  • Some characteristics, such as age and gender, are recorded on registration forms for nearly all voters. For these categories, we are confident our estimates are among the most accurate available. Others, such as race and education, are not self-reported consistently across the country, but the voter file contains a great deal of information useful for estimating them. Our methodology uses large-scale machine learning models to estimate the probability of each voter's racial background from factors such as names associated with particular ethnic groups and neighborhood characteristics. Because we have precise geolocation for most voters in our database, we can compare these estimates against census data and other external sources and adjust them to correct errors common in standard modeling techniques. Importantly, our models are built so that even when we cannot pin down an individual voter's characteristics, the estimates remain accurate in aggregate. We cannot guarantee 100% accuracy, but we believe our estimates are high-quality and defensible compared with other available data.
  • Finally, we have collected voter files from every state in a standardized format since at least 2008, giving us a deep historical base for developing models and calibrating methods.
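Catalist's race models are not described in detail here. As a point of reference, the sketch below shows Bayesian Improved Surname Geocoding (BISG), a widely used public method that combines name-based and neighborhood-based race probabilities with Bayes' rule. It is a swapped-in illustration, not Catalist's model, and every probability in it is a made-up placeholder. It also illustrates the aggregate-level idea from the second bullet above: group totals come from summing probabilities rather than assigning each voter a single label.

```python
# BISG sketch (a public technique, NOT Catalist's model).
# All probabilities are hypothetical placeholders.

RACES = ("white", "black", "hispanic", "asian", "other")


def bisg_posterior(p_given_surname, p_given_geo, p_national):
    """P(race | surname, geo), assuming surname and location are independent given race.

    Under that assumption, Bayes' rule makes the posterior proportional to
    P(race | surname) * P(race | geo) / P(race).
    """
    unnormalized = {
        r: p_given_surname[r] * p_given_geo[r] / p_national[r] for r in RACES
    }
    total = sum(unnormalized.values())
    return {r: v / total for r, v in unnormalized.items()}


def expected_group_counts(posteriors):
    """Sum probabilities across voters instead of taking each voter's most likely race."""
    counts = dict.fromkeys(RACES, 0.0)
    for post in posteriors:
        for r in RACES:
            counts[r] += post[r]
    return counts


# Hypothetical inputs for two voters in the same census block group.
national = {"white": 0.62, "black": 0.12, "hispanic": 0.17, "asian": 0.05, "other": 0.04}
block_group = {"white": 0.40, "black": 0.10, "hispanic": 0.40, "asian": 0.05, "other": 0.05}
surname_a = {"white": 0.05, "black": 0.02, "hispanic": 0.90, "asian": 0.01, "other": 0.02}
surname_b = {"white": 0.70, "black": 0.20, "hispanic": 0.04, "asian": 0.02, "other": 0.04}

voters = [bisg_posterior(s, block_group, national) for s in (surname_a, surname_b)]
print(expected_group_counts(voters))
```

The conditional-independence assumption is what lets the two sources be multiplied; where it fails, calibrating the results against census totals, as the bullet describes, becomes more important.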

This foundation gives us a strong picture of who made up the electorate. To estimate candidate choice, we combine voter file data with survey data using a statistical approach called Multilevel Regression and Poststratification (MRP). MRP pairs flexible statistical models with large population datasets to produce more reliable estimates for small subgroups, where traditional surveys often lack adequate sample sizes: a multilevel regression estimates vote choice for each demographic and geographic cell, and those estimates are then weighted by how many people in each cell actually voted. MRP is a general technique that has shown promising results and is gaining popularity (despite some skepticism), and we believe it is particularly well suited to voter files because of the rich data they contain. We first developed parts of this approach in an academic setting in 2013 and have since refined these methods for this project.
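Real MRP fits a multilevel (hierarchical) regression of vote choice on demographic and geographic predictors, then poststratifies. The sketch below keeps only the skeleton of that idea under simplifying assumptions: the multilevel model is replaced by simple partial pooling, which shrinks each small survey cell toward its group's overall mean, and the poststratification weights are counts of actual voters per cell, such as a voter file could supply. Cell definitions, counts, and the pseudo-count `k` are hypothetical, and unlike a real regression this version cannot predict cells with no respondents.

```python
# Simplified MRP-style sketch under stated assumptions (not Catalist's model).
from collections import defaultdict


def partial_pool(survey, k=20.0):
    """survey maps (group, state) -> (respondents, dem_respondents).

    Each cell's Democratic share is shrunk toward its group's pooled share;
    k acts like k pseudo-respondents at the group mean, so sparse cells move more.
    """
    n_by_group = defaultdict(float)
    dem_by_group = defaultdict(float)
    for (group, _), (n, dem) in survey.items():
        n_by_group[group] += n
        dem_by_group[group] += dem

    estimates = {}
    for (group, state), (n, dem) in survey.items():
        group_mean = dem_by_group[group] / n_by_group[group]
        estimates[(group, state)] = (dem + k * group_mean) / (n + k)
    return estimates


def poststratify(estimates, voter_counts, group):
    """Weight each cell's estimate by the number of actual voters in that cell."""
    cells = [key for key in estimates if key[0] == group]
    total_voters = sum(voter_counts[key] for key in cells)
    return sum(voter_counts[key] * estimates[key] for key in cells) / total_voters


# Hypothetical survey cells and voter-file counts.
survey = {
    ("white_noncollege", "OH"): (40, 14),
    ("white_noncollege", "PA"): (10, 5),
    ("white_college", "OH"): (30, 15),
    ("white_college", "PA"): (25, 13),
}
voter_counts = {
    ("white_noncollege", "OH"): 600,
    ("white_noncollege", "PA"): 400,
    ("white_college", "OH"): 300,
    ("white_college", "PA"): 350,
}

est = partial_pool(survey)
print(round(poststratify(est, voter_counts, "white_noncollege"), 3))
```

Here the PA cell, with only 10 respondents, moves from a raw 50% to 42%, while the larger OH cell moves only from 35% to 36%. That differential shrinkage is the core of why MRP helps with small subgroups.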

How does our data compare to more familiar public sources, especially the prevalent exit poll? National-level data is available here, emphasizing demographics consistently reported across all datasets. Some notable trends include:

  • Our data show a less college-educated electorate (34% college graduates in 2016, versus 50% in the exit poll), an older one (25% aged 65+, versus 16%), a whiter one (74% versus 71%), and a larger share of women (53-55% in every year, while exit poll figures are sometimes lower). These gaps go back to 2006 and appear both in modeled characteristics and in those reported directly by Secretaries of State (age and gender).
  • At the same time, our vote margins within each demographic group are often more favorable to Democrats than those in the exit polls. The largest and most consistent differences are among white non-college voters (8 points of margin on average) and white women (6 points).
  • Checking estimates against other sources helps show which ones are plausible. For instance, the exit poll says that half of the 138 million voters in 2016 held a college degree, which implies 69 million college-educated voters. However, the Census American Community Survey for 2016 counts only 66 million college-educated citizens in the country, implying an impossible turnout rate of about 105%. Similar analyses by age also yield unrealistic turnout rates.
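The college-education check in the last bullet is simple arithmetic. The helper below just reproduces it using the figures quoted there (138 million voters, a 50% college share in the exit poll, and 66 million college-educated citizens from the ACS); the function name is our own.

```python
def implied_turnout(total_voters: float, group_share: float, eligible_in_group: float) -> float:
    """Turnout rate implied for a group: (voters in group) / (eligible people in group)."""
    return total_voters * group_share / eligible_in_group


# Figures quoted above for 2016.
rate = implied_turnout(total_voters=138e6, group_share=0.50, eligible_in_group=66e6)
print(f"Implied college-educated turnout: {rate:.0%}")  # prints "Implied college-educated turnout: 105%"
```

Any rate above 100% means at least one input is wrong; the same check can be repeated for each age group.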

It's important to recognize that we are not simply running another survey. Our approach estimates the electorate down to every precinct in the country and then aggregates those estimates to produce national figures. This makes our data especially useful for examining small geographic areas. Precinct-level election results show how different areas change over time and respond to different candidates, and we can go further by breaking those results down by demographic group within each area.
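The precinct-up approach can be illustrated with a minimal aggregation: given estimated votes and Democratic share for each demographic group in each precinct, national (or state, or district) figures are vote-weighted sums. The `PrecinctEstimate` structure, precinct IDs, and numbers below are hypothetical, not Catalist's data format.

```python
from dataclasses import dataclass, field


@dataclass
class PrecinctEstimate:
    precinct_id: str
    votes: dict = field(default_factory=dict)      # group -> estimated votes cast
    dem_share: dict = field(default_factory=dict)  # group -> estimated Democratic share


def summarize(precincts, group):
    """Return (group's share of all votes, group's Democratic share) across precincts."""
    all_votes = sum(sum(p.votes.values()) for p in precincts)
    group_votes = sum(p.votes.get(group, 0.0) for p in precincts)
    group_dem = sum(p.votes.get(group, 0.0) * p.dem_share.get(group, 0.0) for p in precincts)
    return group_votes / all_votes, group_dem / group_votes


precincts = [
    PrecinctEstimate("A-001", {"white_college": 100, "white_noncollege": 300},
                     {"white_college": 0.60, "white_noncollege": 0.30}),
    PrecinctEstimate("B-017", {"white_college": 200, "white_noncollege": 100},
                     {"white_college": 0.50, "white_noncollege": 0.40}),
]
print(summarize(precincts, "white_college"))
```

Because each precinct keeps its own group-level estimates, the same records can be summed over any geography, or compared across elections for a single area.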

For instance, while much has been written about white non-college voters who moved from Obama to Trump, less attention has gone to white college-educated voters, who moved toward Democrats: Romney won them by 13 points in 2012, while Clinton lost them by just 4. A high-resolution version of this map shows that many of these voters lived just outside the dense urban areas that voted heavily for Obama.

One challenge for the upcoming midterms is what we can do immediately after the election. Because Secretaries of State take months to process voting records, how do we estimate the electorate without that data?

We draw on three sources:

  • Catalist turnout modeling: our extensive historical data lets us build accurate pre-election turnout estimates from past voting behavior and other voting-related characteristics.
  • Early voting data: in many places, a large share of votes is cast before Election Day, and we collect this data in advance.
  • Precinct-level election returns: pre-election estimates are not flawless, so after the election we use precinct results to refine our estimates of the electorate.
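How the three sources fit together can be sketched for a single precinct: start from modeled turnout probabilities, fix voters already known to have voted early at 1, then rescale the remaining probabilities until the expected total matches the precinct's counted ballots. This proportional rescaling is our own simplified stand-in, not Catalist's adjustment procedure, and the inputs are hypothetical.

```python
def calibrate_precinct(turnout_probs, voted_early, observed_votes, iters=200, tol=1e-9):
    """Adjust modeled turnout probabilities for one precinct to match its returns.

    turnout_probs: pre-election model probabilities, one per registered voter.
    voted_early: True where early-vote data shows the voter already voted.
    observed_votes: ballots counted in the precinct after the election.
    """
    if observed_votes > len(turnout_probs):
        raise ValueError("more ballots than registered voters")
    # Early voters are known to have voted, so their probability is fixed at 1.
    probs = [1.0 if early else p for p, early in zip(turnout_probs, voted_early)]
    for _ in range(iters):
        gap = observed_votes - sum(probs)
        if abs(gap) < tol:
            break
        uncertain = sum(p for p, early in zip(probs, voted_early) if not early)
        if uncertain == 0:
            break  # nothing left to adjust
        # Rescale only the uncertain voters, capping each probability at 1.
        factor = max(uncertain + gap, 0.0) / uncertain
        probs = [p if early else min(1.0, p * factor)
                 for p, early in zip(probs, voted_early)]
    return probs


model = [0.5, 0.5, 0.2, 0.4]
early = [True, False, False, False]
adjusted = calibrate_precinct(model, early, observed_votes=2.0)
print([round(p, 3) for p in adjusted])
```

The loop repeats because capping some probabilities at 1 leaves the total short, so the others are scaled again. A production system would also learn systematic patterns across precincts, such as the underestimate for white college-educated precincts described below, rather than correcting each precinct independently.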

These charts show how the process works, using a pilot study conducted after Virginia's 2017 gubernatorial election. Each dot is a precinct, and every precinct in the state is shown. On the left, our pre-election turnout estimates (the model) are compared with actual turnout, precinct by precinct. The pre-election model performed well, though not perfectly. On the right, we show how the model's error varied with the concentration of registered white college-educated voters in each precinct. Before the election, our model slightly underestimated turnout in precincts with more white college-educated voters. After the election, once precinct results were available, we adjusted the model to account for this and other patterns we had initially missed. Months after completing the pilot, we received the vote history data from Virginia's Secretary of State, which confirmed that our post-election model was highly accurate.

What are our plans for 2018?

Once the election is over, we will begin publishing our estimates of each demographic group's share of the vote and candidate choice. We will start with states and congressional districts where precinct results are available, filling in gaps with county- and district-level data. Other organizations will also produce post-election estimates this year, but we believe our data will add significantly to understanding what happened. Stay tuned for updates!