Written by Claire Duncan
on October 21, 2022

Blog Post 7: Unexpected Shocks

I chose gas prices as a shock of this year that I could try to quantify in order to test the relationship with support for Democrats or Republicans. The data I looked at took the nationwide average price of gas in dollars per gallon for each week of the year.

gas <- read_csv("/Users/claireduncan/Downloads/07 - Shocks (10:20)/Gas_prices_weekly.csv")

gas_week <- gas %>%
  mutate(date = mdy(date)) %>%
  mutate(year = substr(date, 1, 4)) %>%
  filter(year == 2022) %>%
  mutate(week = strftime(date, format = "%V")) %>%
  mutate(month = strftime(date, format = "%m")) %>%
  mutate(avg_change = 100*
           ((avg_gasprice - lag(avg_gasprice)) / (lag(avg_gasprice))))

gas_week$avg_change[is.na(gas_week$avg_change)] <- 0

When considering how to quantify gas prices as a shock, I looked at both the average price of gas per week in the US and then the percent change in gas price from week to week, comparing one week to the previous week. This graph shows that percent change, demonstrating the significant increase around week 20. That was when the gas prices hit a record high in June - averaging about $5 nationwide; before the average fell for 98 straight days, dropping to around $3.5 per gallon, which is demonstrated by the negative percent changes during that time frame. The decrease after the large peak increase starts around week 24 which was mid June of this year.

gas_week %>%
  ggplot(aes(x=week, y=avg_change)) +
  geom_point(color="blue") + 
  theme_classic() + geom_hline(yintercept=0, color = "black", size=0.5) +
  labs(x= "Week of Year", y="Percentage Change in Gas Prices \n(from previous week)", title = "2022 Percent Change in Gas Prices (Week to Week)") + scale_y_continuous(limits=c(-5,5))

Then I created a graph using the NYT scrape of the term “gas prices” in which the most articles per week were written in early March, with other peaks in mid to late June. Noted on here are the greatest and lowest data points from the previous slide indicating increases and decreases in price change by week. It’s clear that when gas prices first increased significantly, there were the most articles written about that change.

Using the generic ballot data we have, I ran correlations between gas price and support for Democrats and Republicans. Only looking at the weekly price of gas, an increase in dollar per gallon was associated with a very small increase in support for both Democrats and Republicans although this was only statistically significant for Republican generic ballot support.

Then, I looked at the average change in price, comparing each week to the previous week. I thought this might be a better indicator of how gas prices have been a “shock” considering how people make quick retrospective comparisons in how much they have to pay week to week. This measure was greater and more significant for both democratic and republican support. For Democrats, an increase in gas price change was correlated with a decrease in support of almost 1%. Meanwhile for Republicans, an increase in gas price change was correlated with an increase in support of 0.8%. This seems to be indicative of not only how Republicans have used gas prices to attack Democrats and fire up their base, but of how voters use the logic of retrospection - holding the incumbent Democrats responsible for the sharp increases in gas prices this summer.

gas_Dem <- lm(avg_support ~ avg_gasprice, data = gas_comp_Dem)
stargazer(gas_Dem, type='text')

## 
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                             avg_support        
## -----------------------------------------------
## avg_gasprice                   0.073           
##                               (0.099)          
##                                                
## Constant                     42.647***         
##                               (0.406)          
##                                                
## -----------------------------------------------
## Observations                    262            
## R2                             0.002           
## Adjusted R2                   -0.002           
## Residual Std. Error      0.806 (df = 260)      
## F Statistic             0.553 (df = 1; 260)    
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01

gas_Dem2 <- lm(avg_support ~ avg_change, data = gas_comp_Dem)
stargazer(gas_Dem2, type='text')

## 
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                             avg_support        
## -----------------------------------------------
## avg_change                   -0.095***         
##                               (0.014)          
##                                                
## Constant                     42.980***         
##                               (0.046)          
##                                                
## -----------------------------------------------
## Observations                    262            
## R2                             0.145           
## Adjusted R2                    0.141           
## Residual Std. Error      0.746 (df = 260)      
## F Statistic           43.975*** (df = 1; 260)  
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01

gas_Rep <- lm(avg_support ~ avg_gasprice, data = gas_comp_Rep)
stargazer(gas_Rep, type='text')

## 
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                             avg_support        
## -----------------------------------------------
## avg_gasprice                 0.872***          
##                               (0.075)          
##                                                
## Constant                     40.840***         
##                               (0.309)          
##                                                
## -----------------------------------------------
## Observations                    262            
## R2                             0.342           
## Adjusted R2                    0.339           
## Residual Std. Error      0.613 (df = 260)      
## F Statistic          135.075*** (df = 1; 260)  
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01

gas_Rep2 <- lm(avg_support ~ avg_change, data = gas_comp_Rep)
stargazer(gas_Rep2, type='text')

## 
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                             avg_support        
## -----------------------------------------------
## avg_change                   0.081***          
##                               (0.014)          
##                                                
## Constant                     44.375***         
##                               (0.044)          
##                                                
## -----------------------------------------------
## Observations                    262            
## R2                             0.119           
## Adjusted R2                    0.116           
## Residual Std. Error      0.709 (df = 260)      
## F Statistic           35.271*** (df = 1; 260)  
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01

The graph below shows the generic ballot averages over 2022, with the greatest increase and decrease in percent change noted.

polls_df %>%
  group_by(candidate == 'Democrats') %>%
  mutate(date_ = as.Date(date_)) %>%
  ggplot(aes(x = week, y = avg_support, colour = candidate)) +
  geom_line(aes(group=candidate), size = 0.3) + geom_point(size = 0.3) +
  #scale_x_date(date_labels = "%b, %Y") + 
  labs(y="Generic Ballot Support", x = "Week of 2022", title = "Generic Ballot Support Over 2022") +
  scale_color_manual(values=c('blue','red')) +
  theme_classic() +
  geom_segment(x=("10"), xend=("10"),y=0,yend=33, lty=2, color="purple", alpha=0.4) +
  annotate("text", x=("07"), y=("45.4"), label="Greatest gas \nprice % change", size=3) +
  geom_segment(x=("32"), xend=("32"),y=0,yend=33, lty=2, color="purple", alpha=0.4) +
  annotate("text", x=("30"), y=("45"), label="Greatest gas \nprice \n % decrease", size=3)

While gas prices as a shock could also be considered an economic factor, I wanted to include them in my model just to determine if they would make a difference at all. From the weekly gas price data, I used the maximum and minimum values of change in gas prices within a given year as my measurement, in order to try to capture what might be the most shocking value, or a value that could impact voter behavior. For this, maximum values are positive changes (increases) in gas prices, and minimum values are negative changes (decreases) in gas prices.

gas <- gas %>%
  mutate(date = mdy(date)) %>%
  mutate(year = substr(date, 1, 4)) %>%
  mutate(week = strftime(date, format = "%V")) %>%
  mutate(month = strftime(date, format = "%m")) %>%
  mutate(avg_change = 100*
           ((avg_gasprice - lag(avg_gasprice)) / (lag(avg_gasprice))))

gas$avg_change[is.na(gas$avg_change)] <- 0
gas$year <- as.character(gas$year)

aggregate(gas$avg_change, by = list(gas$year), max)

##    Group.1         x
## 1     1990  4.534005
## 2     1991  1.964453
## 3     1992  2.436647
## 4     1993  4.297994
## 5     1994  2.389381
## 6     1995  1.880036
## 7     1996  3.156146
## 8     1997  2.933780
## 9     1998  3.000000
## 10    1999  6.391347
## 11    2000  5.629838
## 12    2001  4.735547
## 13    2002  6.905594
## 14    2003  7.375538
## 15    2004  5.260304
## 16    2005 17.586207
## 17    2006  5.832629
## 18    2007  5.119597
## 19    2008  5.126096
## 20    2009  7.795958
## 21    2010  3.571429
## 22    2011  6.083412
## 23    2012  3.905359
## 24    2013  5.391719
## 25    2014  2.145663
## 26    2015  6.046312
## 27    2016  6.518197
## 28    2017 11.671530
## 29    2018  1.967335
## 30    2019  3.996865
## 31    2020  4.366347
## 32    2021  5.277889
## 33    2022 13.691796

gas2 <- aggregate(avg_change ~ year, data = gas, max)
data.frame(gas2)

##    year avg_change
## 1  1990   4.534005
## 2  1991   1.964453
## 3  1992   2.436647
## 4  1993   4.297994
## 5  1994   2.389381
## 6  1995   1.880036
## 7  1996   3.156146
## 8  1997   2.933780
## 9  1998   3.000000
## 10 1999   6.391347
## 11 2000   5.629838
## 12 2001   4.735547
## 13 2002   6.905594
## 14 2003   7.375538
## 15 2004   5.260304
## 16 2005  17.586207
## 17 2006   5.832629
## 18 2007   5.119597
## 19 2008   5.126096
## 20 2009   7.795958
## 21 2010   3.571429
## 22 2011   6.083412
## 23 2012   3.905359
## 24 2013   5.391719
## 25 2014   2.145663
## 26 2015   6.046312
## 27 2016   6.518197
## 28 2017  11.671530
## 29 2018   1.967335
## 30 2019   3.996865
## 31 2020   4.366347
## 32 2021   5.277889
## 33 2022  13.691796

colnames(gas2)[2] ="maxchange"


gas3 <- gas %>%
  group_by(year) %>%
  filter(avg_change == min(avg_change)) %>%
  select(year, avg_change)

colnames(gas3)[2] ="minchange"

gas3 <- gas3%>%
  drop_na()


gas2 <- gas2%>%
  drop_na()

gasall <- gas2 %>%
  left_join(gas3, by=c("year"))

gasall$year <- as.numeric(gasall$year)

combineddata <- read_csv("/Users/claireduncan/Downloads/combineddata.csv")

## Rows: 168 Columns: 16

## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (5): st_cd_fips, state, rep_status, dem_status, winner_party
## dbl (11): district, cvap, year, x1, dem_votes_major_percent, rep_votes_major...

## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

dat <- combineddata %>%
  left_join(gasall, by = c("year"))

However, in the model I have been using, as it’s based solely on the 2014 and 2018 midterms, there were only 2 data points that could be used for both minimum and maximum values. This is not a good model due to the severe data limitations, and it does not give a predictive value that makes sense. For the Democratic vote percent, the minimum value was noted as NA because of singularities. The maximum change variable was statistically significant, as an increase in gas prices was correlated with a -13.177% decrease in vote percent. I tried to figure out why the minimum value would not show up - but it does when I take the maximum value out of the equation. So I ended up running two models, one using the minimum and one using the maximum. The one with the minimum gave a correlation of 0.961, indicating decreases in gas prices were associated with about a percent increase in Democratic vote share. I also ran two models for Republican vote share, given the same problem occurred. I think this is because of the data I am using. For the Republican vote percent, the positive change in gas prices was correlated with a 13.177 increase in vote, inverse of the Democratic vote relationship, which was statistically significant. The correlation between Republican vote share and minimum change was -0.961, indicating that smaller gas price changes had a negative effect on voting for Republicans, which was statistically significant.

D_lm_max <- lm(dem_votes_major_percent ~ maxchange + incumbency + avg_rating + turnout, data = dat)
stargazer(D_lm_max, type='text')

## 
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                       dem_votes_major_percent  
## -----------------------------------------------
## maxchange                   -13.177***         
##                               (3.243)          
##                                                
## incumbency                    -0.020           
##                               (0.438)          
##                                                
## avg_rating                   -2.685***         
##                               (0.107)          
##                                                
## turnout                        0.023           
##                               (0.027)          
##                                                
## Constant                     86.519***         
##                               (7.698)          
##                                                
## -----------------------------------------------
## Observations                    168            
## R2                             0.802           
## Adjusted R2                    0.797           
## Residual Std. Error      2.414 (df = 163)      
## F Statistic          164.974*** (df = 4; 163)  
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01

prob_D_lm_max <- predict(D_lm_max, newdata = 
                           data.frame(avg_rating = 3.3, turnout = 48.61, incumbency = 1, maxchange = 13.691796), type="response", interval="predict")
prob_D_lm_max

##         fit      lwr       upr
## 1 -101.6753 -176.498 -26.85259

D_lm_min <- lm(dem_votes_major_percent ~ minchange + incumbency + avg_rating + turnout, data = dat)
stargazer(D_lm_min, type='text')

## 
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                       dem_votes_major_percent  
## -----------------------------------------------
## minchange                    0.961***          
##                               (0.236)          
##                                                
## incumbency                    -0.020           
##                               (0.438)          
##                                                
## avg_rating                   -2.685***         
##                               (0.107)          
##                                                
## turnout                        0.023           
##                               (0.027)          
##                                                
## Constant                     63.924***         
##                               (2.323)          
##                                                
## -----------------------------------------------
## Observations                    168            
## R2                             0.802           
## Adjusted R2                    0.797           
## Residual Std. Error      2.414 (df = 163)      
## F Statistic          164.974*** (df = 4; 163)  
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01

prob_D_lm_min <- predict(D_lm_min, newdata = 
                           data.frame(avg_rating = 3.3, turnout = 48.61, incumbency = 1, minchange =-3.6736641), type="response", interval="predict",)
prob_D_lm_min

##        fit      lwr      upr
## 1 52.62306 47.82112 57.42499

R_lm_max <- lm(rep_votes_major_percent ~ maxchange + incumbency + avg_rating + turnout, data = dat)
stargazer(R_lm_max, type='text')

## 
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                       rep_votes_major_percent  
## -----------------------------------------------
## maxchange                    13.177***         
##                               (3.243)          
##                                                
## incumbency                     0.020           
##                               (0.438)          
##                                                
## avg_rating                   2.685***          
##                               (0.107)          
##                                                
## turnout                       -0.023           
##                               (0.027)          
##                                                
## Constant                      13.481*          
##                               (7.698)          
##                                                
## -----------------------------------------------
## Observations                    168            
## R2                             0.802           
## Adjusted R2                    0.797           
## Residual Std. Error      2.414 (df = 163)      
## F Statistic          164.974*** (df = 4; 163)  
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01

prob_R_lm_max <- predict(R_lm_max, newdata = 
                           data.frame(avg_rating = 3.3, turnout = 48.61, incumbency = 0, maxchange = 13.691796), type="response", interval="predict",)
prob_R_lm_max

##        fit      lwr      upr
## 1 201.6551 126.8917 276.4184

R_lm_min <- lm(rep_votes_major_percent ~ minchange + incumbency + avg_rating + turnout, data = dat)
stargazer(R_lm_min, type='text')

## 
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                       rep_votes_major_percent  
## -----------------------------------------------
## minchange                    -0.961***         
##                               (0.236)          
##                                                
## incumbency                     0.020           
##                               (0.438)          
##                                                
## avg_rating                   2.685***          
##                               (0.107)          
##                                                
## turnout                       -0.023           
##                               (0.027)          
##                                                
## Constant                     36.076***         
##                               (2.323)          
##                                                
## -----------------------------------------------
## Observations                    168            
## R2                             0.802           
## Adjusted R2                    0.797           
## Residual Std. Error      2.414 (df = 163)      
## F Statistic          164.974*** (df = 4; 163)  
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01

prob_R_lm_min <- predict(R_lm_min, newdata = 
                           data.frame(avg_rating = 3.3, turnout = 48.61, incumbency = 0, minchange =-3.6736641), interval="predict", type="response")
prob_R_lm_min

##        fit      lwr      upr
## 1 47.35673 42.50972 52.20374

Shocks are hard to predict, which my prediction model clearly demonstrated. I used the high increase value of this year in gas prices - which in March was a 13.69% change. The smallest decrease value of this year was -3.67%. The prediction for Democrat vote share using the gas price decrease was 52.62%. The prediction using the gas price increase was -101.67% which obviously does not make sense. The prediction for Republican vote share using the gas price decrease was 47.35%. The prediction for Republican vote share using the gas price increase was 201.65%, which also doesn’t make sense. I believe that the problems with the maximum increase in gas prices are due to this year’s huge shock of gas prices, and the outlier of the 13% change.

I will not be using gas prices in my final model and will have to do more data analysis and cleaning work in order to obtain more values for my prediction model when I make my final analysis. Also for this model, I did not use a pooled model considering the focus of my project and data presentation was on gas prices and even the limitations of data within that.

← → Top