3  Results

Code
suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(tidyr))
suppressPackageStartupMessages(library(scales))
suppressWarnings(suppressPackageStartupMessages(library(sf)))
suppressWarnings(suppressPackageStartupMessages(library(maps)))
suppressPackageStartupMessages(library(viridis))
suppressPackageStartupMessages(library(readr))
suppressPackageStartupMessages(library(vcd))
suppressPackageStartupMessages(library(patchwork))
suppressPackageStartupMessages(library(gridExtra))
suppressPackageStartupMessages(library(forcats))

3.1 Setting the Scene: Renewable Integration in the U.S. Over Time

Understanding the trajectory of U.S. electricity generation over the past three decades provides context for the challenges and opportunities presented by renewable energy integration. The plot below serves as the foundation of our analysis, showing the shift in energy sources from 1990 to 2023. The total area adds up to the country’s total generation capacity in any given year. It highlights the following:

  • The decline of coal and petroleum-based generation.
  • The rise of natural gas.
  • The rapid growth of intermittent renewables (wind and solar), especially in the past decade.
  • The consistency of other renewables, which include hydroelectric, geothermal, and biomass.

We focus this analysis on wind and solar electricity generation because they provide clean but intermittent energy (after all, we can’t control when the wind is blowing and the sun is shining!). As the U.S. works to reduce carbon emissions and combat climate change, the grid will be more reliant on these intermittent energy sources, so understanding their impact on grid reliability is crucial.

Code
us_gen_data <- read.csv("US_generation.csv")

colnames(us_gen_data)[1] <- "Year"

us_gen_data <- us_gen_data |>
  filter(Year >= 1990)

data1_long <- us_gen_data |>
  select(Year, wind, solar, `coal`, `natural.gas`, `nuclear`, `other.renewables`, `petroleum.and.other`) |>
  pivot_longer(cols = -Year, names_to = "Energy_Source", values_to = "Generation")

# Set factor levels (wind and solar at the top)
data1_long$Energy_Source <- factor(data1_long$Energy_Source, 
                                  levels = c("solar","wind","other.renewables", "nuclear", "natural.gas",
                                             "petroleum.and.other", "coal"))
# Area Chart
ggplot(data1_long, aes(x = Year, y = Generation, fill = Energy_Source)) +
  geom_area(alpha = 0.8, position = "stack") +
  scale_fill_manual(values = c("coal" = "gray10", 
                               "natural.gas" = "gray50", 
                               "nuclear" = "gray80", 
                               "other.renewables" = "cornflowerblue", 
                               "petroleum.and.other" = "gray30",
                               "wind" = "green4",
                               "solar" = "yellow")) +
  labs(title = "US Electricity Generation Capacity (MW) by Source 1990-2023",
       x = "Year",
       y = "Electricity Generation Capacity (MW)",
       fill = "Energy Source") +
  theme_minimal()

The plot above may give the impression that wind and solar currently contribute only a modest portion to the U.S. energy mix. However, when we plot their share of the country’s total electricity generation from 1990 to 2023, we see a significant growth trajectory. Starting from near zero contribution in the 1990s, wind and solar now account for 14.1% of total generation as of 2023. This trend is expected to accelerate significantly in the coming decades. According to EIA forecasts, wind and solar are projected to reach nearly 50% of total electricity generation by 2050.

This “Wind & Solar as a Fraction of Total Electricity Generation” metric measures the share of electricity generation coming from intermittent sources, and is often referred to as a percentage value called “Renewable Penetration”.

Code
# Add a new column to calculate the fraction of wind + solar generation
data2 <- us_gen_data |>
  mutate(
    Total_Generation = wind + solar + coal + natural.gas + nuclear + other.renewables + petroleum.and.other,
    Wind_Solar_Fraction = (wind + solar) / Total_Generation
  )

# Find the last data point
last_point <- data2 |> filter(Year == max(Year))

# Load and prepare the forecast data in R
forecast <- read.csv("Renewable_Energy_Forecast.csv") |>
  select(Year, Solar_Wind_Fraction) |>
  rename(Solar_Wind_Fraction = Solar_Wind_Fraction)

# Combine the historical and forecast data for plotting
combined_data <- bind_rows(
  data2 |> select(Year, Wind_Solar_Fraction) |> rename(Fraction = Wind_Solar_Fraction),
  forecast |> rename(Fraction = Solar_Wind_Fraction)
)

# Find the last data point
last_point2 <- forecast |> filter(Year == max(Year))

# Plot with the forecast
ggplot(data2, aes(x = Year, y = Wind_Solar_Fraction)) +
  geom_line(color = "darkolivegreen4", linewidth = 1.2) +
  geom_line(data = forecast, aes(x = Year, y = Solar_Wind_Fraction),
            color = "darkolivegreen4", linetype = "dashed", linewidth = 1.2) +
  geom_point(data = last_point, aes(x = Year, y = Wind_Solar_Fraction), 
             size = 2.5, color = "darkolivegreen4") +
  geom_text(data = last_point, aes(label = scales::percent(Wind_Solar_Fraction, accuracy = 0.1)),
            vjust = -0.5, hjust = 1, color = "darkolivegreen4") +
  geom_text(data = forecast |> filter(Year == 2050),
            aes(x = Year, y = Solar_Wind_Fraction,
                label = scales::percent(Solar_Wind_Fraction, accuracy = 0.1)),
            vjust = -0.5, hjust = -0.1, color = "darkolivegreen4") + 
 geom_point(data = last_point2, aes(x = Year, y = Solar_Wind_Fraction), 
             size = 2.5, color = "darkolivegreen4") +
  labs(title = "Wind & Solar as a Fraction of Total Electricity Generation (Historical 1990 - 2023 and Forecast 2024-2050)",
       x = "Year",
       y = "Renewable Penetration") +
  theme_minimal() +
  scale_y_continuous(labels = scales::percent_format())+
  scale_x_continuous(breaks = seq(1990, 2050, by = 5)) 

3.2 Grid Reliability

Grid reliability ensures that electricity is consistently available to meet the needs of homes, businesses, and critical services. As the U.S. transitions to a cleaner energy grid, understanding renewable energy’s impact on reliability is very important. The two key metrics the EIA uses to measure grid reliability are CAIDI and SAIDI, which help quantify the frequency and duration of power outages experienced by consumers:

  • CAIDI: Customer Average Interruption Duration Index. It is average number of minutes it takes to restore non-momentary electric interruptions.
  • SAIDI: System Average Interruption Duration Index. It is the minutes of non-momentary electric interruptions, per year, the average customer experienced.

Higher CAIDI and SAIDI values indicate a less reliable grid where interruptions/outages are either more frequent, take longer to restore, or both.

In the plot below, the comparison of reliability metrics on the left (including major events) and the right (excluding major events) demonstrates a relationship between grid reliability and “major event days”, described by the EIA as “any day that exceeds a daily SAIDI threshold”. When all major events are included, the plot shows significant spikes in some years: 2017, 2020, and 2021. Conversely, when major events are excluded, the metrics remain relatively stable over time, suggesting that the underlying reliability of the grid is not significantly affected by increasing renewable energy penetration, and is rather driven by the occurrence of these major events that we investigate in section 3.5.

Code
t1_reliability <- read.csv("T1_Reliability.csv")

t1_reliability <- t1_reliability |>
   filter(`Event_Category` %in% c("IEEE_All_Events", "IEEE_No_Major_Events"))

t1_reliability$Year <- as.integer(t1_reliability$Year)

# Reshape the data to a long format for plotting
data3_long <- t1_reliability  |>
  pivot_longer(cols = c(SAIDI, CAIDI), names_to = "Metric", values_to = "Value")

# Plot the line graph faceted by Event Category
ggplot(data3_long, aes(x = Year, y = Value, color = Metric)) +
  geom_line(linewidth = 1.2) +
  scale_x_continuous(breaks = seq(2013, 2023, 1)) +
  scale_y_continuous(limits = c(0, 550), expand = c(0, 0)) + 
  scale_color_manual(values = c("SAIDI" = '#3c5c8a',"CAIDI" = 'gray70'),
                     breaks = c("SAIDI","CAIDI")) +
  labs(title = "Reliability Metrics (2013–2023)",
       x = "Year",
       y = "Value",
       color = "Metric") +
  facet_wrap(~ `Event_Category`,
             labeller = labeller(`Event_Category` = c("IEEE_All_Events" = "All Events (With Major Event Days)",
                                                      "IEEE_No_Major_Events" = "Without Major Event Days"))) +
  theme_minimal() +
  theme(legend.position = 'bottom')

3.3 Is there any correlation/association between renewable penetration and grid reliability?

We explore the affect of renewable penetration on reliability metrics by plotting the former against the SAIDI reliability index by state. Since the wind doesn’t always blow and the sun doesn’t always shine, one might worry that the growth of renewable penetration, especially over the last decade, has caused higher levels of grid unreliability than other energy sources which operate independently of the weather.

However, no clear positive correlation appears in the scatter plot below when outage event type is not considered. A split in distribution becomes visible when points are grouped by whether SAIDI is calculated including or excluding major event/outage days. While both groups demonstrate a slightly negative correlation, the group excluding major event days shows the stronger relationship: an increase in renewable penetration correlates with a decrease in average outage duration (SAIDI). This suggests that the country’s increased dependency on wind and solar might indeed not relate to higher unreliability of the grid.

Code
# Load reliability metrics (SAIDI) from 2013-2023 by state
t4_rbs <- suppressMessages(read_csv("T4_ReliabilityByState.csv"))

# Pivot longer by state and filter to IEEE All Events
data_rbs <- t4_rbs |>
  pivot_longer(cols = c('CT':'HI'),names_to = 'state', values_to = 'SAIDI') |>
  filter(Method == 'IEEE',
         Event_Grouping == 'All Events (With Major Event Days)')

# Create 2023-only version with major event days
rbs_2023 <- data_rbs |>
  filter(Census_Division == '2023') |>
  select(state, SAIDI)

# Create 2023-only version without major event days
rbs_2023_no_major <- t4_rbs |>
  pivot_longer(cols = c('CT':'HI'),names_to = 'state', values_to = 'SAIDI') |>
  filter(Method == 'IEEE',
         Event_Grouping == 'Without Major Event Days',
         Census_Division == '2023') |>
  select(state, SAIDI)

# Load renewable penetration from 2023 by state
state_generation_2023 <- suppressMessages(read_csv('state_generation_2023.csv'))

# Calculate generation by energy type by state
suppressMessages(state_generation_2023 <- state_generation_2023 |>
  group_by(state, source) |>
  summarise(generation = sum(generation)))

# Derive renewable penetration from generation
state_pen_2023 <- state_generation_2023 |>
  pivot_wider(names_from = source, values_from = generation, values_fill = 0) |>
  mutate(total_gen = coal + natural.gas + other.renewables + petroleum.and.other + solar + wind + nuclear,
         wind_solar = wind + solar,
         w_s_pen = wind_solar/total_gen)

# Create combo dataset that disregards event grouping type
rbs_2023_combo <- t4_rbs |>
  pivot_longer(cols = c('CT':'HI'),names_to = 'state', values_to = 'SAIDI') |>
  filter(Method == 'IEEE',
         Event_Grouping %in% c('Without Major Event Days','All Events (With Major Event Days)'),
         Census_Division == '2023') |>
  select(state, SAIDI, Event_Grouping)

# Join reliability (SAIDI) with renewable penetration by state with major event days
scatter_state_2023 <- state_pen_2023 |>
  left_join(rbs_2023, by = 'state') |>
  select(state, w_s_pen, SAIDI)

# Join reliability (SAIDI) with renewable penetration by state without major event days
scatter_state_2023_no_major <- state_pen_2023 |>
  left_join(rbs_2023_no_major, by = 'state') |>
  select(state, w_s_pen, SAIDI)

# Join reliability (SAIDI) with renewable penetration by state for combo version 
scatter_state_2023_combo <- state_pen_2023 |>
  left_join(rbs_2023_combo, by = 'state') |>
  select(state, w_s_pen, SAIDI, Event_Grouping)

# Store scatter plot colored by event grouping type
plot2 <- ggplot(data = scatter_state_2023_combo, mapping = aes(x = w_s_pen, y = SAIDI, color = Event_Grouping)) +
  geom_point(alpha = .7, size = 3, na.rm = TRUE) +
  # coord_cartesian(ylim = c(0,1850)) + 
  scale_x_continuous(labels = scales::percent_format()) +
  labs(x = "Renewable Penetration",
       y = "Average Outage Duration (SAIDI)",
       color = "Outage Event Grouping") +
  theme_minimal() +
  theme(legend.position = 'bottom')

# Get rid of Hawaii (NA) to calculate correlation coefficient
no_hawaii <-scatter_state_2023_no_major |>
  filter(state != 'HI')

# Calc corr coeff
cc <- cor(no_hawaii$w_s_pen, no_hawaii$SAIDI)

# Store second scatter plot, a version zoomed in on without major event days that includes a regression
plot3 <- ggplot(data = filter(scatter_state_2023_combo, Event_Grouping == 'Without Major Event Days'), mapping = aes(x = w_s_pen, y = SAIDI)) +
  geom_point(alpha = .7, size = 3, color = '#31bdc3', na.rm = TRUE) +
  geom_smooth(data = filter(scatter_state_2023_combo, Event_Grouping == 'Without Major Event Days'),
              formula = y ~ x,
              method = 'lm',
              color = '#1d7276',
              na.rm = TRUE) +
  annotate("text", x = .35, y = 125, label = paste("Correlation:", round(cc, 3))) +
  coord_cartesian(ylim = c(0, 450)) +
  scale_x_continuous(labels = scales::percent_format()) +
  labs(x = "Renewable Penetration",
       y = "Average Outage Duration (SAIDI)") +
  theme_minimal()

# Patch plots together
suppressWarnings(plot2 + plot3 +
  plot_layout(widths = c(2, 2.3)) +
  plot_annotation(title = 'Average Outage Duration (SAIDI) by Renewable Penetration With and Without Major Events in 2023',
                  theme = theme(plot.title = element_text(size = 15))))

Visualizing this conclusion with mosaic plots further highlights the variables’ negative correlation, even when major event days were included. (Since both renewable penetration and reliability metrics are continuous variables, we first set a threshold at their respective medians and within each event grouping to then sort states into categories.) Notice that the plots similarly exhibit an inverse relationship between renewable penetration and outage duration regardless of major event days.

Code
# Store median of renewable penetration and of SAIDI with major events
all_events_pen_med <- median(scatter_state_2023$w_s_pen)
all_events_SAIDI_med <- median(scatter_state_2023$SAIDI)

# Prep data with major events for mosaic plot
all_events_mos <- scatter_state_2023 |>
  mutate(Penetration = factor(if_else(w_s_pen >= all_events_pen_med, 'Above', 'States below median'),
                              levels = c('States below median','Above')),
         SAIDI = factor(if_else(SAIDI >= all_events_SAIDI_med, 'Above', 'States below median'),
                        levels = c('States below median','Above'))) |>
  select(-w_s_pen)

# Store median of renewable penetration and of SAIDI without major events
no_major_pen_med <- median(no_hawaii$w_s_pen)
no_major_SAIDI_med <- median(no_hawaii$SAIDI)

# Prep data without major events for mosaic plot
no_major_mos <- no_hawaii |>
  mutate(Penetration = factor(if_else(w_s_pen >= no_major_pen_med, 'Above', 'States below median'),
                              levels = c('States below median','Above')),
         SAIDI = factor(if_else(SAIDI >= no_major_SAIDI_med, 'Above', 'States below median'),
                        levels = c('States below median','Above'))) |>
  select(-w_s_pen)

# Create mosaic plots for both event grouping types
mosaic(SAIDI ~ Penetration, all_events_mos, direction = c("v", "h"),
       main = "All Events (With Major Event Days)",
       highlighting_fill = c('#f37a717F','#f37a71'))

Code
mosaic(SAIDI ~ Penetration, no_major_mos, direction = c("v", "h"),
       main = 'No Major Event Days',
       highlighting_fill = c('#31bdc37F','#31bdc3'))

The information provided by both plot types not only contradicts worry that renewable energy, dependent on intermittent sources, takes a toll on reliable energy generation within the US. It also suggests the possibility that increased renewable energy positively affects the grid. Moreover, the difference in correlation when major event days are included or excluded leads us to wonder if another factor is at play.

3.5 Confirming that these standouts are caused by extreme weather events

Each high-disruption state and year can be linked to an extreme weather event, such as a winter storm, hurricane or tornado. By year, the events are as follows:

The news articles and websites linked above tie these weather events to power outages, and the plot below confirms that daily outage minutes spike in times of extreme weather. The plot shows a daily outage metric in Texas over the course of 2021, with the dates of Winter Storm Uri (February 10-25) highlighted in yellow. As we expected, the highest number of outage minutes, by a large measure, lands within these storm days.

Code
# Read the data
saidi_data <- read.csv("Texas_Daily_SAIDI.csv")

# Convert Date column to Date type
saidi_data$Date <- as.Date(saidi_data$Date, format = "%m/%d/%y")

highlight_period <- data.frame(
  xmin = as.Date("2021-02-10"),
  xmax = as.Date("2021-02-25"),
  ymin = -Inf,
  ymax = Inf,
  label = "Winter Storm Uri"
)

# Plot the data
ggplot(saidi_data, aes(x = Date, y = Daily.SAIDI)) +
  geom_rect(data = highlight_period, 
            aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax, fill = label),
            alpha = 0.8, inherit.aes = FALSE) +
  scale_fill_manual(name = "", values = c("Winter Storm Uri" = "#f3ff52")) +
  geom_line(color = "#3c5c8a", linewidth = .75) +
  labs(title = "Daily Outage Duration (SAIDI) in Texas in 2021",
       x = NULL,
       y = "Daily Outage Duration (SAIDI)") +
  theme_minimal() +
  theme(legend.position = "bottom")

3.6 Does outage duration differ among types of severe weather?

In fact, by looking at Texas across multiple years alongside all its severe weather events within the same time frame, we can analyze if there are any differences among the effects on SAIDI by types of severe weather. (Texas experienced nearly every weather disaster type listed by the NOAA, so it suffices to view Texas alone.) The plot below shows other daily SAIDI peaks, many of which are covered or initiated by weather events in yellow. Notice that the location patterns of the weather events relative to the peaks are not the same, nor are the size of the peaks they cover.

Code
# Load Texas's daily reliability data
texas <- suppressMessages(read_csv("Texas_Daily_SAIDI_2015_2022.csv"))

# Adjust column name
texas <- texas |>
  mutate(date = Date) |>
  select(-Date)

# Load Texas's weather disaster data from the same time period
weather <- suppressMessages(read_csv('events-TX-2015-2022.csv'))

# Join datasets
weather_SAIDI <- weather |>
  left_join(texas, by = 'date')

# Plot daily SAIDI with all weather disasters marked
ggplot(weather_SAIDI, aes(x = date, y = Daily.SAIDI)) +
  geom_line(data = texas,
              aes(x = date, y = Daily.SAIDI),
            linewidth = .75,
            color = '#3c5c8a') +
  geom_point(size = 4,
             shape = 21,
             fill = '#f3ff52',
             color = '#3c5c8a',
             alpha = .7,
             na.rm = TRUE) +
  coord_cartesian(xlim = c(as.Date('2015-01-01'),as.Date('2022-12-31'))) +
  labs(title = 'Daily Outage Duration (SAIDI) in Texas with Weather Disasters from 2015-2022',
       y = 'Daily Outage Duration (SAIDI)') +
  theme_minimal() +
  theme(axis.title.x = element_blank())

By calculating the mean daily SAIDI by weather disaster type, we learn that, at least on Texas’s power grid, different disasters have historically affected energy generation by differing levels of severity. Shown below, winter storms were by far the most interruptive for Texas over this time period, likely due to extreme Winter Storm Uri. The second highest interrupter was tropical cyclones, and the other three disaster types of severe storms, flooding, and wildfires trailed behind.

Code
# Calculate mean daily SAIDI by disaster
means <- weather_SAIDI |>
  select(disaster, Daily.SAIDI) |>
  filter(!is.na(Daily.SAIDI)) |>
  group_by(disaster) |>
  summarise(mean_SAIDI = mean(Daily.SAIDI))

# Order weather disaster types by mean SAIDI
means$disaster <- fct_rev(reorder(means$disaster, means$mean_SAIDI))

# Plot means
ggplot(means, mapping = aes(x = disaster, y = mean_SAIDI, fill = disaster)) +
  geom_col() +
  scale_fill_manual(values = c("Winter Storm" = "#de7ac0", "Tropical Cyclone" = "#9268ba", "Severe Storm" = "#d03330", "Flooding" = "#fa832c", "Wildfire" = "#8a574d")) +
  labs(title = 'Mean Daily Outage Duration (SAIDI) by Weather Disaster Type\nin Texas from 2015-2022',
       x = 'Weather Disaster Type',
       y = 'Mean Daily Outage Duration (SAIDI)') +
  theme_minimal() +
  theme(legend.position = 'none')