Data wrangling and exploration to plot electricity production according to energy source and continent using the #TidyTuesday data set for week 29 of 2022 (19/7/2022): “Technology Adoption”
In this post, the Technology Adoption data set is used to illustrate data exploration in R and adding information using the {countrycode} package. During data exploration, the tt$technology data set is filtered to select the “Energy” category, and the distinct values for “variable” and “label” are printed. A subset is then created to test adding full country names and corresponding continents, based on the three-letter ISO codes in the data set, using the countrycode() function. The full data set is then wrangled into two tibbles, one for fossil fuel and one for low-carbon electricity production, and the distribution for each energy source is plotted by continent. The full source for this blog post is available on GitHub.
Loading the R libraries and data set.
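# Loading libraries
library(tidytuesdayR) # tt_load()
library(tidyverse)    # dplyr, tidyr, forcats, ggplot2 and the pipe
library(countrycode)  # countrycode()
library(ggthemes)     # theme_solarized()
# Loading data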
tt <- tt_load("2022-07-19")
Downloading file 1 of 1: `technology.csv`
# Printing a summary of tt$technology
tt$technology
# A tibble: 491,636 × 7
variable label iso3c year group categ…¹ value
<chr> <chr> <chr> <dbl> <chr> <chr> <dbl>
1 BCG % children who received a… AFG 1982 Cons… Vaccin… 10
2 BCG % children who received a… AFG 1983 Cons… Vaccin… 10
3 BCG % children who received a… AFG 1984 Cons… Vaccin… 11
4 BCG % children who received a… AFG 1985 Cons… Vaccin… 17
5 BCG % children who received a… AFG 1986 Cons… Vaccin… 18
6 BCG % children who received a… AFG 1987 Cons… Vaccin… 27
7 BCG % children who received a… AFG 1988 Cons… Vaccin… 40
8 BCG % children who received a… AFG 1989 Cons… Vaccin… 38
9 BCG % children who received a… AFG 1990 Cons… Vaccin… 30
10 BCG % children who received a… AFG 1991 Cons… Vaccin… 21
# … with 491,626 more rows, and abbreviated variable name ¹​category
# ℹ Use `print(n = ...)` to see more rows
# Printing the distinct "variable" and "label" pairs for the "Energy" category
## This will be used as a reference to create the "energy_type" column/variable
tt$technology %>% filter(category == "Energy") %>% select(variable, label) %>%
  distinct()
# A tibble: 11 × 2
variable label
<chr> <chr>
1 elec_coal Electricity from coal (TWH)
2 elec_cons Electric power consumption (KWH)
3 elec_gas Electricity from gas (TWH)
4 elec_hydro Electricity from hydro (TWH)
5 elec_nuc Electricity from nuclear (TWH)
6 elec_oil Electricity from oil (TWH)
7 elec_renew_other Electricity from other renewables (TWH)
8 elec_solar Electricity from solar (TWH)
9 elec_wind Electricity from wind (TWH)
10 elecprod Gross output of electric energy (TWH)
11 electric_gen_capacity Electricity Generating Capacity, 1000 kilowa…
# Setting a seed to make results reproducible
# Using sample() to select six rows of tt$technology at random
sample_rows <- sample(x = rownames(tt$technology), size = 6)
# Creating a subset using the random rows
technology_sample <- tt$technology[sample_rows, ]
# Printing a summary of the randomly sampled subset
technology_sample
# A tibble: 6 × 7
variable label iso3c year group categ…¹ value
<chr> <chr> <chr> <dbl> <chr> <chr> <dbl>
1 Pol3 % children who rec… PRY 1993 Cons… Vaccin… 6.6 e1
2 pct_ag_ara_land % Arable land shar… LBR 1991 Non-… Agricu… 3.08e1
3 fert_total Aggregate kg of fe… CHE 1988 Prod… Agricu… 1.78e8
4 railp Thousands of passe… TUR 1948 Cons… Transp… 4.9 e1
5 ag_land Land agricultural … TUN 2013 Non-… Agricu… 9.94e3
6 tv Television sets NIC 1981 Cons… Commun… 1.14e5
# … with abbreviated variable name ¹​category
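The sampling step above can also be sketched with row positions instead of row names. This is a minimal illustration on a toy data frame, not the code used in the post: `toy_tbl`, its values, and the seed `42` are all assumptions, since the seed used for the sample above is not shown.

```r
# Toy data frame standing in for tt$technology (hypothetical values)
toy_tbl <- data.frame(
  iso3c = c("PRY", "LBR", "CHE", "TUR", "TUN", "NIC", "IRL", "JPN"),
  value = c(66, 30.8, 178, 49, 9940, 114, 5, 12),
  stringsAsFactors = FALSE
)

# Hypothetical seed: any fixed value makes the draw repeatable
set.seed(42)

# Sampling six row positions without replacement
sample_idx <- sample(x = seq_len(nrow(toy_tbl)), size = 6)

# Subsetting the data frame with the sampled positions
toy_sample <- toy_tbl[sample_idx, ]

# Resetting the same seed reproduces the same draw
set.seed(42)
same_draw <- identical(sample_idx,
                       sample(x = seq_len(nrow(toy_tbl)), size = 6))
```

Sampling positions with `seq_len(nrow(...))` yields an integer index, which avoids relying on how a tibble treats character row names.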
# Adding continent and country name columns/variables to the sample subset,
# using the countrycode::countrycode() function
technology_sample <- technology_sample %>%
  mutate(continent = countrycode(iso3c, origin = "iso3c",
                                 destination = "continent"),
         country = countrycode(iso3c, origin = "iso3c",
                               destination = "country.name"))
# Selecting the country ISO code, continent and country name of the sample
# subset, to confirm that countrycode() worked as intended
technology_sample %>% select(iso3c, continent, country)
# A tibble: 6 × 3
iso3c continent country
<chr> <chr> <chr>
1 PRY Americas Paraguay
2 LBR Africa Liberia
3 CHE Europe Switzerland
4 TUR Asia Turkey
5 TUN Africa Tunisia
6 NIC Americas Nicaragua
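All six sampled codes matched, but a code that countrycode() cannot match is returned as NA. The sketch below shows one way to list such codes before applying the function to the full data set. It assumes the {countrycode} package is installed; "ZZZ" is a made-up code used only to force a non-match.

```r
library(countrycode)

# Two codes from the sample above, plus "ZZZ", a made-up code
codes <- c("PRY", "CHE", "ZZZ")

# warn = FALSE silences the warning that is raised for unmatched codes
continents <- countrycode(codes, origin = "iso3c",
                          destination = "continent", warn = FALSE)

# Unmatched codes come back as NA, so they can be listed for review
unmatched <- codes[is.na(continents)]
unmatched
```

Leaving `warn` at its default of TRUE reports the unmatched codes in a warning instead, which may be enough for a quick check.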
# Adding the corresponding continent for each country in tt$technology;
# filtering to select for the "Energy" category; adding a more succinct
# "energy_type" variable; and dropping rows with missing values
energy_tbl <- tt$technology %>%
  mutate(continent = countrycode(iso3c, origin = "iso3c",
                                 destination = "continent")) %>%
  filter(category == "Energy") %>%
  mutate(energy_type = fct_recode(variable,
    "Consumption" = "elec_cons", "Coal" = "elec_coal", "Gas" = "elec_gas",
    "Hydro" = "elec_hydro", "Nuclear" = "elec_nuc", "Oil" = "elec_oil",
    "Other renewables" = "elec_renew_other", "Solar" = "elec_solar",
    "Wind" = "elec_wind", "Output" = "elecprod",
    "Capacity" = "electric_gen_capacity")) %>%
  drop_na()
# Printing a summary of energy_tbl
energy_tbl
# A tibble: 66,300 × 9
variable label iso3c year group categ…¹ value conti…² energ…³
<chr> <chr> <chr> <dbl> <chr> <chr> <dbl> <chr> <fct>
1 elec_coal Electric… ABW 2000 Prod… Energy 0 Americ… Coal
2 elec_coal Electric… ABW 2001 Prod… Energy 0 Americ… Coal
3 elec_coal Electric… ABW 2002 Prod… Energy 0 Americ… Coal
4 elec_coal Electric… ABW 2003 Prod… Energy 0 Americ… Coal
5 elec_coal Electric… ABW 2004 Prod… Energy 0 Americ… Coal
6 elec_coal Electric… ABW 2005 Prod… Energy 0 Americ… Coal
7 elec_coal Electric… ABW 2006 Prod… Energy 0 Americ… Coal
8 elec_coal Electric… ABW 2007 Prod… Energy 0 Americ… Coal
9 elec_coal Electric… ABW 2008 Prod… Energy 0 Americ… Coal
10 elec_coal Electric… ABW 2009 Prod… Energy 0 Americ… Coal
# … with 66,290 more rows, and abbreviated variable names ¹​category,
# ²​continent, ³​energy_type
# ℹ Use `print(n = ...)` to see more rows
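The recoding step can be tried in isolation on a short vector. This is a small sketch assuming the {forcats} package is installed; the toy `variable` vector is made up and uses three of the codes listed earlier.

```r
library(forcats)

# Toy vector using three of the "variable" codes (hypothetical data)
variable <- c("elec_coal", "elec_gas", "elec_wind", "elec_coal")

# fct_recode() takes "new level" = "old level" pairs and returns a factor
energy_type <- fct_recode(variable,
                          "Coal" = "elec_coal",
                          "Gas"  = "elec_gas",
                          "Wind" = "elec_wind")

# Inspecting the resulting levels and their counts
levels(energy_type)
table(energy_type)
```

Note that the new name goes on the left of each pair, which is the reverse of what `dplyr::recode()` expects.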
# Filtering energy_tbl for fossil fuel rows
fossil_fuel_tbl <- energy_tbl %>%
  filter(energy_type != "Consumption" & energy_type != "Output"
         & energy_type != "Capacity") %>%
  filter(energy_type == "Coal" | energy_type == "Gas" | energy_type == "Oil")
# Printing a summary of the tibble
fossil_fuel_tbl
# A tibble: 13,914 × 9
variable label iso3c year group categ…¹ value conti…² energ…³
<chr> <chr> <chr> <dbl> <chr> <chr> <dbl> <chr> <fct>
1 elec_coal Electric… ABW 2000 Prod… Energy 0 Americ… Coal
2 elec_coal Electric… ABW 2001 Prod… Energy 0 Americ… Coal
3 elec_coal Electric… ABW 2002 Prod… Energy 0 Americ… Coal
4 elec_coal Electric… ABW 2003 Prod… Energy 0 Americ… Coal
5 elec_coal Electric… ABW 2004 Prod… Energy 0 Americ… Coal
6 elec_coal Electric… ABW 2005 Prod… Energy 0 Americ… Coal
7 elec_coal Electric… ABW 2006 Prod… Energy 0 Americ… Coal
8 elec_coal Electric… ABW 2007 Prod… Energy 0 Americ… Coal
9 elec_coal Electric… ABW 2008 Prod… Energy 0 Americ… Coal
10 elec_coal Electric… ABW 2009 Prod… Energy 0 Americ… Coal
# … with 13,904 more rows, and abbreviated variable names ¹​category,
# ²​continent, ³​energy_type
# ℹ Use `print(n = ...)` to see more rows
# Filtering energy_tbl for low-carbon energy source rows
low_carbon_tbl <- energy_tbl %>%
  filter(energy_type != "Consumption" & energy_type != "Output"
         & energy_type != "Capacity") %>%
  filter(energy_type != "Coal" & energy_type != "Gas" & energy_type != "Oil")
# Printing a summary of the tibble
low_carbon_tbl
# A tibble: 26,890 × 9
variable label iso3c year group categ…¹ value conti…² energ…³
<chr> <chr> <chr> <dbl> <chr> <chr> <dbl> <chr> <fct>
1 elec_hydro Electri… ABW 2000 Prod… Energy 0 Americ… Hydro
2 elec_hydro Electri… ABW 2001 Prod… Energy 0 Americ… Hydro
3 elec_hydro Electri… ABW 2002 Prod… Energy 0 Americ… Hydro
4 elec_hydro Electri… ABW 2003 Prod… Energy 0 Americ… Hydro
5 elec_hydro Electri… ABW 2004 Prod… Energy 0 Americ… Hydro
6 elec_hydro Electri… ABW 2005 Prod… Energy 0 Americ… Hydro
7 elec_hydro Electri… ABW 2006 Prod… Energy 0 Americ… Hydro
8 elec_hydro Electri… ABW 2007 Prod… Energy 0 Americ… Hydro
9 elec_hydro Electri… ABW 2008 Prod… Energy 0 Americ… Hydro
10 elec_hydro Electri… ABW 2009 Prod… Energy 0 Americ… Hydro
# … with 26,880 more rows, and abbreviated variable names ¹​category,
# ²​continent, ³​energy_type
# ℹ Use `print(n = ...)` to see more rows
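The chained `!=` and `==` comparisons used for the two tibbles could also be written with `%in%`. The following is a base R sketch on a toy table, offered as an alternative phrasing rather than the approach taken in the post; `toy_energy`, `fossil_fuels`, and `non_sources` are hypothetical names.

```r
# Toy table with one row per energy type (values are made up)
toy_energy <- data.frame(
  energy_type = c("Coal", "Gas", "Oil", "Hydro", "Nuclear", "Solar",
                  "Wind", "Consumption", "Output", "Capacity"),
  value = seq_len(10),
  stringsAsFactors = FALSE
)

fossil_fuels <- c("Coal", "Gas", "Oil")
non_sources  <- c("Consumption", "Output", "Capacity")

# Rows whose energy type is a fossil fuel
toy_fossil <- toy_energy[toy_energy$energy_type %in% fossil_fuels, ]

# Rows that are neither fossil fuels nor non-source measures
toy_low_carbon <- toy_energy[
  !toy_energy$energy_type %in% c(fossil_fuels, non_sources), ]
```

Keeping the category names in vectors means each list is written once and can be reused for both subsets.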
# Plotting distributions of electricity produced from fossil fuels
fossil_fuel_tbl %>%
  ggplot(aes(x = fct_reorder(energy_type, value), y = value, fill = energy_type)) +
  geom_boxplot() +
  theme_solarized() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        legend.position = "none") +
  scale_colour_discrete() +
  scale_y_log10() +
  facet_wrap(~continent, scales = "free") +
  labs(
    title = "Electricity generated from fossil fuels by continent",
    y = "Output in log terawatt-hours: log10(TWh)",
    x = "Source")
# Plotting distributions of electricity produced from low-carbon sources
low_carbon_tbl %>%
  ggplot(aes(x = fct_reorder(energy_type, value), y = value, fill = energy_type)) +
  geom_boxplot() +
  theme_solarized() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        legend.position = "none") +
  scale_colour_discrete() +
  scale_y_log10() +
  facet_wrap(~continent, scales = "free") +
  labs(
    title = "Electricity generated from low-carbon sources by continent",
    y = "Output in log terawatt-hours: log10(TWh)",
    x = "Source")
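Both plots use a log10 y-axis, and the tibbles printed earlier contain rows with a value of 0. The base R sketch below, using made-up values, shows why such rows cannot appear on a log scale; ggplot2 would typically drop them with a warning about non-finite values.

```r
# Made-up output values in TWh, including zeros like those in the tibbles
twh <- c(0, 0, 0.5, 12, 340)

# log10(0) is -Inf, which has no position on a log axis
log_twh <- log10(twh)

# Counting the values that a log-scaled plot would leave out
n_dropped <- sum(!is.finite(log_twh))
n_dropped
```

When reading the boxplots, it may therefore help to treat them as distributions of the non-zero outputs only.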
If you see mistakes or want to suggest changes, please create an issue on the source repository.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/rnnh/TidyTuesday/, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Harrington (2022, July 21). Ronan's #TidyTuesday blog: Adding continent and country names with {countrycode}, and subsetting a data frame using sample(). Retrieved from https://tidytuesday.netlify.app/posts/2022-07-21-technology-adoption/
BibTeX citation
@misc{harrington2022adding, author = {Harrington, Ronan}, title = {Ronan's #TidyTuesday blog: Adding continent and country names with {countrycode}, and subsetting a data frame using sample()}, url = {https://tidytuesday.netlify.app/posts/2022-07-21-technology-adoption/}, year = {2022} }