<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:distill="https://distill.pub/journal/" version="2.0">
  <channel>
    <title>Ronan's #TidyTuesday blog</title>
    <link>https://tidytuesday.netlify.app/</link>
    <atom:link href="https://tidytuesday.netlify.app/index.xml" rel="self" type="application/rss+xml"/>
    <description>Ronan's #TidyTuesday blog. Visualisations and analysis of various data sets
created by the R for Data Science (R4DS) community
</description>
    <image>
      <title>Ronan's #TidyTuesday blog</title>
      <url>https://tidytuesday.netlify.app/images/favicon.png</url>
      <link>https://tidytuesday.netlify.app/</link>
    </image>
    <generator>Distill</generator>
    <lastBuildDate>Thu, 21 Jul 2022 00:00:00 +0000</lastBuildDate>
    <item>
      <title>Adding continent and country names with {countrycode}, and subsetting a data frame using sample()</title>
      <dc:creator>Ronan Harrington</dc:creator>
      <link>https://tidytuesday.netlify.app/posts/2022-07-21-technology-adoption</link>
      <description>


&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;In this post, the &lt;a href="https://github.com/rfordatascience/tidytuesday/blob/master/data/2022/2022-07-19/readme.md"&gt;Technology Adoption&lt;/a&gt; data set is used to illustrate data exploration &lt;a href="https://www.r-project.org/"&gt;R&lt;/a&gt; and adding information using the &lt;a href="https://cran.rstudio.com/web/packages/countrycode/"&gt;{countrycode}&lt;/a&gt; package. During data exploration, the &lt;code&gt;tt$technology&lt;/code&gt; data set is filtered to select for the “Energy” category, and the distinct values for “variable” and “label” are printed. A subset is then created to test adding full country names and corresponding continents based on 3 letter ISO codes in the data set using the &lt;code&gt;countrycode()&lt;/code&gt; function. The full data set is then wrangled into two tibbles for fossil fuel and low-carbon electricity production: the distribution for each energy source is plotted according to the corresponding continent. The full source for this blog post is &lt;a href="https://github.com/rnnh/TidyTuesday"&gt;available on GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="setup"&gt;Setup&lt;/h2&gt;
&lt;p&gt;Loading the &lt;a href="https://www.r-project.org/"&gt;R&lt;/a&gt; libraries and &lt;a href="https://github.com/rfordatascience/tidytuesday/blob/master/data/2022/2022-07-19/readme.md"&gt;data set&lt;/a&gt;.&lt;/p&gt;
&lt;pre class="r"&gt;&lt;code&gt;# Loading libraries
library(tidytuesdayR)
library(countrycode)
library(tidyverse)
library(ggthemes)

# Loading data
tt &amp;lt;- tt_load(&amp;quot;2022-07-19&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;
    Downloading file 1 of 1: `technology.csv`&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id="exploring-tttechnology-selecting-distinct-values-after-filtering-and-testing-adding-a-continent-variable"&gt;Exploring tt$technology: selecting distinct values after filtering, and testing adding a “continent” variable&lt;/h2&gt;
&lt;pre class="r"&gt;&lt;code&gt;# Printing a summary of tt$technology
tt$technology&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;# A tibble: 491,636 × 7
   variable label                      iso3c  year group categ…¹ value
   &amp;lt;chr&amp;gt;    &amp;lt;chr&amp;gt;                      &amp;lt;chr&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;chr&amp;gt;   &amp;lt;dbl&amp;gt;
 1 BCG      % children who received a… AFG    1982 Cons… Vaccin…    10
 2 BCG      % children who received a… AFG    1983 Cons… Vaccin…    10
 3 BCG      % children who received a… AFG    1984 Cons… Vaccin…    11
 4 BCG      % children who received a… AFG    1985 Cons… Vaccin…    17
 5 BCG      % children who received a… AFG    1986 Cons… Vaccin…    18
 6 BCG      % children who received a… AFG    1987 Cons… Vaccin…    27
 7 BCG      % children who received a… AFG    1988 Cons… Vaccin…    40
 8 BCG      % children who received a… AFG    1989 Cons… Vaccin…    38
 9 BCG      % children who received a… AFG    1990 Cons… Vaccin…    30
10 BCG      % children who received a… AFG    1991 Cons… Vaccin…    21
# … with 491,626 more rows, and abbreviated variable name ¹​category
# ℹ Use `print(n = ...)` to see more rows&lt;/code&gt;&lt;/pre&gt;
&lt;pre class="r"&gt;&lt;code&gt;# Printing the distinct &amp;quot;variable&amp;quot; and &amp;quot;label&amp;quot; pairs for the &amp;quot;Energy&amp;quot; category
## This will be used as a reference to create the &amp;quot;energy_type&amp;quot; column/variable
tt$technology %&amp;gt;% filter(category == &amp;quot;Energy&amp;quot;) %&amp;gt;% select(variable, label) %&amp;gt;%
  distinct()&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;# A tibble: 11 × 2
   variable              label                                        
   &amp;lt;chr&amp;gt;                 &amp;lt;chr&amp;gt;                                        
 1 elec_coal             Electricity from coal (TWH)                  
 2 elec_cons             Electric power consumption (KWH)             
 3 elec_gas              Electricity from gas (TWH)                   
 4 elec_hydro            Electricity from hydro (TWH)                 
 5 elec_nuc              Electricity from nuclear (TWH)               
 6 elec_oil              Electricity from oil (TWH)                   
 7 elec_renew_other      Electricity from other renewables (TWH)      
 8 elec_solar            Electricity from solar (TWH)                 
 9 elec_wind             Electricity from wind (TWH)                  
10 elecprod              Gross output of electric energy (TWH)        
11 electric_gen_capacity Electricity Generating Capacity, 1000 kilowa…&lt;/code&gt;&lt;/pre&gt;
&lt;pre class="r"&gt;&lt;code&gt;# Setting a seed to make results reproducible
set.seed(20220719)
# Using sample() to select six rows of tt$technology at random
sample_rows &amp;lt;- sample(x = rownames(tt$technology), size = 6)
# Creating a subset using the random rows
technology_sample &amp;lt;- tt$technology[sample_rows, ]
# Printing a summary of the randomly sampled subset
technology_sample&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;# A tibble: 6 × 7
  variable        label               iso3c  year group categ…¹  value
  &amp;lt;chr&amp;gt;           &amp;lt;chr&amp;gt;               &amp;lt;chr&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;chr&amp;gt;    &amp;lt;dbl&amp;gt;
1 Pol3            % children who rec… PRY    1993 Cons… Vaccin… 6.6 e1
2 pct_ag_ara_land % Arable land shar… LBR    1991 Non-… Agricu… 3.08e1
3 fert_total      Aggregate kg of fe… CHE    1988 Prod… Agricu… 1.78e8
4 railp           Thousands of passe… TUR    1948 Cons… Transp… 4.9 e1
5 ag_land         Land agricultural … TUN    2013 Non-… Agricu… 9.94e3
6 tv              Television sets     NIC    1981 Cons… Commun… 1.14e5
# … with abbreviated variable name ¹​category&lt;/code&gt;&lt;/pre&gt;
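&lt;p&gt;Subsetting by the character row names works here. Two alternatives that do not depend on row names are sampling integer row positions with &lt;code&gt;sample()&lt;/code&gt;, or drawing rows inside a pipe with &lt;code&gt;dplyr::slice_sample()&lt;/code&gt;. The sketch below uses a small hypothetical tibble (&lt;code&gt;demo_tbl&lt;/code&gt;) in place of &lt;code&gt;tt$technology&lt;/code&gt;; the two approaches are not guaranteed to return the same six rows for a given seed.&lt;/p&gt;
&lt;pre class="r"&gt;&lt;code&gt;library(tidyverse)

# Hypothetical toy tibble standing in for tt$technology
demo_tbl &amp;lt;- tibble(id = 1:100, value = seq(0.5, 50, by = 0.5))

# Base R: sampling six integer row positions, then subsetting by position
set.seed(20220719)
row_positions &amp;lt;- sample(nrow(demo_tbl), size = 6)
subset_base &amp;lt;- demo_tbl[row_positions, ]

# dplyr: drawing six random rows directly inside a pipe
set.seed(20220719)
subset_dplyr &amp;lt;- demo_tbl %&amp;gt;% slice_sample(n = 6)

# Both subsets contain six rows
nrow(subset_base)
nrow(subset_dplyr)&lt;/code&gt;&lt;/pre&gt;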
&lt;pre class="r"&gt;&lt;code&gt;# Adding continent and country name columns/variables to the sample subset,
# using the countrycode::countrycode() function
technology_sample &amp;lt;- technology_sample %&amp;gt;%
  mutate(continent = countrycode(iso3c, origin = &amp;quot;iso3c&amp;quot;,
    destination = &amp;quot;continent&amp;quot;),
    country = countrycode(iso3c, origin = &amp;quot;iso3c&amp;quot;, destination = &amp;quot;country.name&amp;quot;))
# Selecting the country ISO code, continent and country name of the sample
# subset, to confirm that countrycode() worked as intended
technology_sample %&amp;gt;% select(iso3c, continent, country)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;# A tibble: 6 × 3
  iso3c continent country    
  &amp;lt;chr&amp;gt; &amp;lt;chr&amp;gt;     &amp;lt;chr&amp;gt;      
1 PRY   Americas  Paraguay   
2 LBR   Africa    Liberia    
3 CHE   Europe    Switzerland
4 TUR   Asia      Turkey     
5 TUN   Africa    Tunisia    
6 NIC   Americas  Nicaragua  &lt;/code&gt;&lt;/pre&gt;
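&lt;p&gt;&lt;code&gt;countrycode()&lt;/code&gt; returns &lt;code&gt;NA&lt;/code&gt; (with a warning by default) for any code it cannot match. Because the wrangling step below ends with &lt;code&gt;drop_na()&lt;/code&gt;, rows whose &lt;code&gt;iso3c&lt;/code&gt; value cannot be matched to a continent are silently removed at that step, together with rows missing a value. A minimal sketch on hypothetical codes, where &lt;code&gt;XXX&lt;/code&gt; is deliberately not a valid ISO code:&lt;/p&gt;
&lt;pre class="r"&gt;&lt;code&gt;library(countrycode)
library(tidyverse)

# Hypothetical codes: IRL and JPN are valid ISO 3166-1 alpha-3 codes, XXX is not
demo_tbl &amp;lt;- tibble(iso3c = c(&amp;quot;IRL&amp;quot;, &amp;quot;JPN&amp;quot;, &amp;quot;XXX&amp;quot;), value = c(1, 2, 3))

# Adding a continent column; the unmatched code XXX becomes NA
# (warn = FALSE suppresses the warning about unmatched codes)
demo_tbl &amp;lt;- demo_tbl %&amp;gt;%
  mutate(continent = countrycode(iso3c, origin = &amp;quot;iso3c&amp;quot;,
    destination = &amp;quot;continent&amp;quot;, warn = FALSE))
demo_tbl

# drop_na() then removes the row with the unmatched code
demo_tbl %&amp;gt;% drop_na()&lt;/code&gt;&lt;/pre&gt;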
&lt;h2 id="wrangling-tttechnology-into-two-electricity-production-tibbles-fossil-fuels-and-low-carbon-sources"&gt;Wrangling tt$technology into two electricity production tibbles: fossil fuels and low-carbon sources&lt;/h2&gt;
&lt;pre class="r"&gt;&lt;code&gt;# Adding the corresponding continent for each country in tt$technology;
# filtering to select for the &amp;quot;Energy&amp;quot; category; adding a more succinct
# &amp;quot;energy_type&amp;quot; variable; and dropping rows with missing values
energy_tbl &amp;lt;- tt$technology %&amp;gt;%
  mutate(continent = countrycode(iso3c, origin = &amp;quot;iso3c&amp;quot;,
    destination = &amp;quot;continent&amp;quot;)) %&amp;gt;%
  filter(category == &amp;quot;Energy&amp;quot;) %&amp;gt;%
  mutate(energy_type = fct_recode(variable,
    &amp;quot;Consumption&amp;quot; = &amp;quot;elec_cons&amp;quot;, &amp;quot;Coal&amp;quot; = &amp;quot;elec_coal&amp;quot;, &amp;quot;Gas&amp;quot; = &amp;quot;elec_gas&amp;quot;,
    &amp;quot;Hydro&amp;quot; = &amp;quot;elec_hydro&amp;quot;, &amp;quot;Nuclear&amp;quot; = &amp;quot;elec_nuc&amp;quot;, &amp;quot;Oil&amp;quot; = &amp;quot;elec_oil&amp;quot;,
    &amp;quot;Other renewables&amp;quot; = &amp;quot;elec_renew_other&amp;quot;, &amp;quot;Solar&amp;quot; = &amp;quot;elec_solar&amp;quot;,
    &amp;quot;Wind&amp;quot; = &amp;quot;elec_wind&amp;quot;, &amp;quot;Output&amp;quot; = &amp;quot;elecprod&amp;quot;,
    &amp;quot;Capacity&amp;quot; = &amp;quot;electric_gen_capacity&amp;quot;)) %&amp;gt;%
  drop_na()

# Printing a summary of energy_tbl
energy_tbl&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;# A tibble: 66,300 × 9
   variable  label     iso3c  year group categ…¹ value conti…² energ…³
   &amp;lt;chr&amp;gt;     &amp;lt;chr&amp;gt;     &amp;lt;chr&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;chr&amp;gt;   &amp;lt;dbl&amp;gt; &amp;lt;chr&amp;gt;   &amp;lt;fct&amp;gt;  
 1 elec_coal Electric… ABW    2000 Prod… Energy      0 Americ… Coal   
 2 elec_coal Electric… ABW    2001 Prod… Energy      0 Americ… Coal   
 3 elec_coal Electric… ABW    2002 Prod… Energy      0 Americ… Coal   
 4 elec_coal Electric… ABW    2003 Prod… Energy      0 Americ… Coal   
 5 elec_coal Electric… ABW    2004 Prod… Energy      0 Americ… Coal   
 6 elec_coal Electric… ABW    2005 Prod… Energy      0 Americ… Coal   
 7 elec_coal Electric… ABW    2006 Prod… Energy      0 Americ… Coal   
 8 elec_coal Electric… ABW    2007 Prod… Energy      0 Americ… Coal   
 9 elec_coal Electric… ABW    2008 Prod… Energy      0 Americ… Coal   
10 elec_coal Electric… ABW    2009 Prod… Energy      0 Americ… Coal   
# … with 66,290 more rows, and abbreviated variable names ¹​category,
#   ²​continent, ³​energy_type
# ℹ Use `print(n = ...)` to see more rows&lt;/code&gt;&lt;/pre&gt;
&lt;pre class="r"&gt;&lt;code&gt;# Filtering energy_table for fossil fuel rows
fossil_fuel_tbl &amp;lt;- energy_tbl %&amp;gt;%
  filter(energy_type != &amp;quot;Consumption&amp;quot; &amp;amp; energy_type != &amp;quot;Output&amp;quot; 
    &amp;amp; energy_type != &amp;quot;Capacity&amp;quot;) %&amp;gt;% 
  filter(energy_type == &amp;quot;Coal&amp;quot; | energy_type == &amp;quot;Gas&amp;quot; | energy_type == &amp;quot;Oil&amp;quot;)

# Printing a summary of the tibble
fossil_fuel_tbl&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;# A tibble: 13,914 × 9
   variable  label     iso3c  year group categ…¹ value conti…² energ…³
   &amp;lt;chr&amp;gt;     &amp;lt;chr&amp;gt;     &amp;lt;chr&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;chr&amp;gt;   &amp;lt;dbl&amp;gt; &amp;lt;chr&amp;gt;   &amp;lt;fct&amp;gt;  
 1 elec_coal Electric… ABW    2000 Prod… Energy      0 Americ… Coal   
 2 elec_coal Electric… ABW    2001 Prod… Energy      0 Americ… Coal   
 3 elec_coal Electric… ABW    2002 Prod… Energy      0 Americ… Coal   
 4 elec_coal Electric… ABW    2003 Prod… Energy      0 Americ… Coal   
 5 elec_coal Electric… ABW    2004 Prod… Energy      0 Americ… Coal   
 6 elec_coal Electric… ABW    2005 Prod… Energy      0 Americ… Coal   
 7 elec_coal Electric… ABW    2006 Prod… Energy      0 Americ… Coal   
 8 elec_coal Electric… ABW    2007 Prod… Energy      0 Americ… Coal   
 9 elec_coal Electric… ABW    2008 Prod… Energy      0 Americ… Coal   
10 elec_coal Electric… ABW    2009 Prod… Energy      0 Americ… Coal   
# … with 13,904 more rows, and abbreviated variable names ¹​category,
#   ²​continent, ³​energy_type
# ℹ Use `print(n = ...)` to see more rows&lt;/code&gt;&lt;/pre&gt;
&lt;pre class="r"&gt;&lt;code&gt;# Filtering energy_table for low-carbon energy source rows
low_carbon_tbl &amp;lt;- energy_tbl %&amp;gt;%
  filter(energy_type != &amp;quot;Consumption&amp;quot; &amp;amp; energy_type != &amp;quot;Output&amp;quot; 
    &amp;amp; energy_type != &amp;quot;Capacity&amp;quot;) %&amp;gt;% 
  filter(energy_type != &amp;quot;Coal&amp;quot; &amp;amp; energy_type != &amp;quot;Gas&amp;quot; &amp;amp; energy_type != &amp;quot;Oil&amp;quot;)

# Printing a summary of the tibble
low_carbon_tbl&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;# A tibble: 26,890 × 9
   variable   label    iso3c  year group categ…¹ value conti…² energ…³
   &amp;lt;chr&amp;gt;      &amp;lt;chr&amp;gt;    &amp;lt;chr&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;chr&amp;gt;   &amp;lt;dbl&amp;gt; &amp;lt;chr&amp;gt;   &amp;lt;fct&amp;gt;  
 1 elec_hydro Electri… ABW    2000 Prod… Energy      0 Americ… Hydro  
 2 elec_hydro Electri… ABW    2001 Prod… Energy      0 Americ… Hydro  
 3 elec_hydro Electri… ABW    2002 Prod… Energy      0 Americ… Hydro  
 4 elec_hydro Electri… ABW    2003 Prod… Energy      0 Americ… Hydro  
 5 elec_hydro Electri… ABW    2004 Prod… Energy      0 Americ… Hydro  
 6 elec_hydro Electri… ABW    2005 Prod… Energy      0 Americ… Hydro  
 7 elec_hydro Electri… ABW    2006 Prod… Energy      0 Americ… Hydro  
 8 elec_hydro Electri… ABW    2007 Prod… Energy      0 Americ… Hydro  
 9 elec_hydro Electri… ABW    2008 Prod… Energy      0 Americ… Hydro  
10 elec_hydro Electri… ABW    2009 Prod… Energy      0 Americ… Hydro  
# … with 26,880 more rows, and abbreviated variable names ¹​category,
#   ²​continent, ³​energy_type
# ℹ Use `print(n = ...)` to see more rows&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id="plotting-distributions-of-electricity-produced-from-fossil-fuels-and-low-carbon-sources"&gt;Plotting distributions of electricity produced from fossil fuels and low-carbon sources&lt;/h2&gt;
&lt;pre class="r"&gt;&lt;code&gt;# Plotting distributions of electricity produced from fossil fuels
fossil_fuel_tbl %&amp;gt;%
  ggplot(aes(x = fct_reorder(energy_type, value), y = value, fill = energy_type)) +
  geom_boxplot() +
  theme_solarized() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
    legend.position = &amp;quot;none&amp;quot;) +
  scale_fill_discrete() +
  scale_y_log10() +
  facet_wrap(~continent, scales = &amp;quot;free&amp;quot;) +
  labs(
    title = &amp;quot;Electricity generated from fossil fuels by continent&amp;quot;,
    y = &amp;quot;Output in log terawatt-hours: log10(TWh)&amp;quot;,
    x = &amp;quot;Source&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;div class="figure"&gt;
&lt;img src="https://tidytuesday.netlify.app/posts/2022-07-21-technology-adoption/technology-adoption_files/figure-html5/fig1-1.png" alt="Box plots of electricity produced from fossil fuels, faceted by continent." width="864" /&gt;
&lt;p class="caption"&gt;
Box plots of electricity produced from fossil fuels, faceted by continent.
&lt;/p&gt;
&lt;/div&gt;
&lt;pre class="r"&gt;&lt;code&gt;# Plotting distributions of electricity produced from low-carbon sources
low_carbon_tbl %&amp;gt;%
  ggplot(aes(x = fct_reorder(energy_type, value), y = value, fill = energy_type)) +
  geom_boxplot() +
  theme_solarized() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
    legend.position = &amp;quot;none&amp;quot;) +
  scale_fill_discrete() +
  scale_y_log10() +
  facet_wrap(~continent, scales = &amp;quot;free&amp;quot;) +
  labs(
    title = &amp;quot;Electricity generated from low-carbon sources by continent&amp;quot;,
    y = &amp;quot;Output in log terawatt-hours: log10(TWh)&amp;quot;,
    x = &amp;quot;Source&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;div class="figure"&gt;
&lt;img src="https://tidytuesday.netlify.app/posts/2022-07-21-technology-adoption/technology-adoption_files/figure-html5/fig2-1.png" alt="Box plots of electricity produced from low-carbon energy sources, faceted by continent." width="864" /&gt;
&lt;p class="caption"&gt;
Box plots of electricity produced from low-carbon energy sources, faceted by continent.
&lt;/p&gt;
&lt;/div&gt;
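&lt;p&gt;Both plots use &lt;code&gt;scale_y_log10()&lt;/code&gt;, so rows where &lt;code&gt;value&lt;/code&gt; is 0 (such as the ABW rows printed earlier) become &lt;code&gt;-Inf&lt;/code&gt; after the transformation, and ggplot2 drops them with a warning before computing the box plots. Each box therefore summarises only the non-zero values. A sketch for counting how many rows are affected per continent and source, shown on a hypothetical miniature of &lt;code&gt;fossil_fuel_tbl&lt;/code&gt; with made-up values:&lt;/p&gt;
&lt;pre class="r"&gt;&lt;code&gt;library(tidyverse)

# Hypothetical miniature of fossil_fuel_tbl (same column names, made-up values)
demo_tbl &amp;lt;- tibble(
  continent = c(&amp;quot;Americas&amp;quot;, &amp;quot;Americas&amp;quot;, &amp;quot;Europe&amp;quot;, &amp;quot;Europe&amp;quot;),
  energy_type = c(&amp;quot;Coal&amp;quot;, &amp;quot;Coal&amp;quot;, &amp;quot;Gas&amp;quot;, &amp;quot;Gas&amp;quot;),
  value = c(0, 12.5, 0.4, 0)
)

# Counting all rows and the zero-valued rows a log10 axis cannot show
zero_counts &amp;lt;- demo_tbl %&amp;gt;%
  group_by(continent, energy_type) %&amp;gt;%
  summarise(n_rows = n(), n_zero = sum(value == 0), .groups = &amp;quot;drop&amp;quot;)
zero_counts&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With the real data, replacing &lt;code&gt;demo_tbl&lt;/code&gt; with &lt;code&gt;fossil_fuel_tbl&lt;/code&gt; or &lt;code&gt;low_carbon_tbl&lt;/code&gt; would give the corresponding counts for each plot.&lt;/p&gt;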
&lt;pre class="r distill-force-highlighting-css"&gt;&lt;code&gt;&lt;/code&gt;&lt;/pre&gt;</description>
      <distill:md5>9c70eb9668aacd6fda7e0ab4aadcda3b</distill:md5>
      <guid>https://tidytuesday.netlify.app/posts/2022-07-21-technology-adoption</guid>
      <pubDate>Thu, 21 Jul 2022 00:00:00 +0000</pubDate>
      <media:content url="https://tidytuesday.netlify.app/posts/2022-07-21-technology-adoption/technology-adoption_files/figure-html5/fig1-1.png" medium="image" type="image/png" width="1728" height="1152"/>
    </item>
    <item>
      <title>How to write a function in R and apply it to a data frame using map functions from {purrr}</title>
      <dc:creator>Ronan Harrington</dc:creator>
      <link>https://tidytuesday.netlify.app/posts/2022-07-12-european-flights</link>
      <description>


&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;In this post, the &lt;a href="https://github.com/rfordatascience/tidytuesday/blob/master/data/2022/2022-07-12/readme.md"&gt;European Flights&lt;/a&gt; data set is used to illustrate defining a function in &lt;a href="https://www.r-project.org/"&gt;R&lt;/a&gt; and applying it to a data frame using map functions from &lt;a href="https://www.r-project.org/"&gt;{purr}&lt;/a&gt;. The full source for this blog post is &lt;a href="https://github.com/rnnh/TidyTuesday"&gt;available on GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="setup"&gt;Setup&lt;/h2&gt;
&lt;p&gt;Loading the &lt;a href="https://www.r-project.org/"&gt;R&lt;/a&gt; libraries and &lt;a href="https://github.com/rfordatascience/tidytuesday/blob/master/data/2022/2022-07-12/readme.md"&gt;data set&lt;/a&gt;.&lt;/p&gt;
&lt;pre class="r"&gt;&lt;code&gt;# Loading libraries
library(tidytuesdayR)
library(tidyverse)
library(tidytext)
library(ggthemes)

# Loading data
tt &amp;lt;- tt_load(&amp;quot;2022-07-12&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;
    Downloading file 1 of 1: `flights.csv`&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id="defining-a-function-to-tidy-flight-types-and-applying-it-with-purrmap"&gt;Defining a function to tidy flight types and applying it with purr::map&lt;/h2&gt;
&lt;p&gt;In this section, we want to tidy the different types of flight in the data set by increasing the number of rows and decreasing the number of columns. For a given airport on a given day, instead of having multiple columns/variables for arrivals, departures and total number of flights, we want to have one column describing the flight type (e.g. arrival or departure) and one column with the value of that flight type/number of flights. This will give the data set a &lt;a href="https://tidyr.tidyverse.org/articles/tidy-data.html"&gt;tidy structure&lt;/a&gt;.&lt;/p&gt;
&lt;pre class="r"&gt;&lt;code&gt;# Printing a summary of the flights data frame
tt$flights&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;# A tibble: 688,099 × 14
    YEAR MONTH_NUM MONTH…¹ FLT_DATE            APT_I…² APT_N…³ STATE…⁴
   &amp;lt;dbl&amp;gt; &amp;lt;chr&amp;gt;     &amp;lt;chr&amp;gt;   &amp;lt;dttm&amp;gt;              &amp;lt;chr&amp;gt;   &amp;lt;chr&amp;gt;   &amp;lt;chr&amp;gt;  
 1  2016 01        JAN     2016-01-01 00:00:00 EBAW    Antwerp Belgium
 2  2016 01        JAN     2016-01-01 00:00:00 EBBR    Brusse… Belgium
 3  2016 01        JAN     2016-01-01 00:00:00 EBCI    Charle… Belgium
 4  2016 01        JAN     2016-01-01 00:00:00 EBLG    Liège   Belgium
 5  2016 01        JAN     2016-01-01 00:00:00 EBOS    Ostend… Belgium
 6  2016 01        JAN     2016-01-01 00:00:00 EDDB    Berlin… Germany
 7  2016 01        JAN     2016-01-01 00:00:00 EDDC    Dresden Germany
 8  2016 01        JAN     2016-01-01 00:00:00 EDDE    Erfurt  Germany
 9  2016 01        JAN     2016-01-01 00:00:00 EDDF    Frankf… Germany
10  2016 01        JAN     2016-01-01 00:00:00 EDDG    Muenst… Germany
# … with 688,089 more rows, 7 more variables: FLT_DEP_1 &amp;lt;dbl&amp;gt;,
#   FLT_ARR_1 &amp;lt;dbl&amp;gt;, FLT_TOT_1 &amp;lt;dbl&amp;gt;, FLT_DEP_IFR_2 &amp;lt;dbl&amp;gt;,
#   FLT_ARR_IFR_2 &amp;lt;dbl&amp;gt;, FLT_TOT_IFR_2 &amp;lt;dbl&amp;gt;, `Pivot Label` &amp;lt;chr&amp;gt;,
#   and abbreviated variable names ¹​MONTH_MON, ²​APT_ICAO, ³​APT_NAME,
#   ⁴​STATE_NAME
# ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names&lt;/code&gt;&lt;/pre&gt;
&lt;pre class="r"&gt;&lt;code&gt;# Printing a summary of the shape of the data frame
paste(&amp;quot;tt$flights has&amp;quot;, nrow(tt$flights), &amp;quot;rows and&amp;quot;, ncol(tt$flights),
  &amp;quot;columns.&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;[1] &amp;quot;tt$flights has 688099 rows and 14 columns.&amp;quot;&lt;/code&gt;&lt;/pre&gt;
&lt;pre class="r"&gt;&lt;code&gt;# Defining a function to tidy the flights data set
tidy_flights_per_airport &amp;lt;- function(input_flight_type){
  tt$flights %&amp;gt;% 
    # Selecting the date, the airport name, and the column whose name is stored
    # in &amp;quot;input_flight_type&amp;quot;
    ## &amp;quot;all_of()&amp;quot; is used for error handling: if tt$flights has no column with
    ## the name stored in &amp;quot;input_flight_type&amp;quot;, the function returns an error
    select(FLT_DATE, APT_NAME, all_of(input_flight_type)) %&amp;gt;% 
    # Adding a &amp;quot;flight_type&amp;quot; column, with &amp;quot;input_flight_type&amp;quot; as a string for each row
    mutate(flight_type = as.character(input_flight_type)) %&amp;gt;% 
    # Renaming the selected flight type column to &amp;quot;number_of_flights&amp;quot;;
    ## all_of() is needed here too, because &amp;quot;input_flight_type&amp;quot; holds a column
    ## name as a string (passing it bare is deprecated in tidyselect)
    rename(number_of_flights = all_of(input_flight_type))
}

# Selecting column names with flight types (arrivals, departures, total flights)
flight_types &amp;lt;- colnames(tt$flights)[8:13]
# Printing the flight types
flight_types&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;[1] &amp;quot;FLT_DEP_1&amp;quot;     &amp;quot;FLT_ARR_1&amp;quot;     &amp;quot;FLT_TOT_1&amp;quot;     &amp;quot;FLT_DEP_IFR_2&amp;quot;
[5] &amp;quot;FLT_ARR_IFR_2&amp;quot; &amp;quot;FLT_TOT_IFR_2&amp;quot;&lt;/code&gt;&lt;/pre&gt;
&lt;pre class="r"&gt;&lt;code&gt;# Applying the tidying function to the flight types vector using purr::map()
tidy_flights_list &amp;lt;- map(flight_types, tidy_flights_per_airport)&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id="binding-the-tidied-flight-type-rows-into-a-data-frame-with-purrmap_df"&gt;Binding the tidied flight type rows into a data frame with purr::map_df&lt;/h2&gt;
&lt;p&gt;Using the map function in the previous section returned a list of tidied flight types: the “tidy_flights_per_airport()” function was applied to each item in “flight_types” individually, and the resulting tidied flight type was added to “tidy_flights_list”. In this section, the “rbind()” function is applied to “tidy_flights_list” to create a single data frame with all of the tidied flight types.&lt;/p&gt;
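&lt;p&gt;In the next code chunk, &lt;code&gt;rbind()&lt;/code&gt; is applied to each list element on its own, and &lt;code&gt;rbind()&lt;/code&gt; on a single data frame returns the same rows. The actual row binding is therefore done by &lt;code&gt;map_df()&lt;/code&gt; itself, which combines its results with &lt;code&gt;dplyr::bind_rows()&lt;/code&gt;, so calling &lt;code&gt;bind_rows()&lt;/code&gt; on the list directly should give the same columns. A small check on a hypothetical two-element list standing in for &lt;code&gt;tidy_flights_list&lt;/code&gt; (the arrival counts are made up):&lt;/p&gt;
&lt;pre class="r"&gt;&lt;code&gt;library(tidyverse)

# Hypothetical stand-in for tidy_flights_list: one tibble per flight type
demo_list &amp;lt;- list(
  tibble(APT_NAME = c(&amp;quot;Antwerp&amp;quot;, &amp;quot;Brussels&amp;quot;), number_of_flights = c(4, 174),
    flight_type = &amp;quot;FLT_DEP_1&amp;quot;),
  tibble(APT_NAME = c(&amp;quot;Antwerp&amp;quot;, &amp;quot;Brussels&amp;quot;), number_of_flights = c(3, 170),
    flight_type = &amp;quot;FLT_ARR_1&amp;quot;)
)

# Row binding via map_df() with rbind(), as in this post
via_map_df &amp;lt;- map_df(demo_list, rbind)
# Row binding the list directly
via_bind_rows &amp;lt;- bind_rows(demo_list)

# Comparing the two results column by column
identical(as.list(via_map_df), as.list(via_bind_rows))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In purrr 1.0.0 and later, &lt;code&gt;map_df()&lt;/code&gt; is superseded, and the purrr documentation suggests &lt;code&gt;map()&lt;/code&gt; followed by &lt;code&gt;list_rbind()&lt;/code&gt; instead.&lt;/p&gt;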
&lt;pre class="r"&gt;&lt;code&gt;# Binding the tidy version of each flight type by row using purr::map_df
tidy_flights &amp;lt;- map_df(tidy_flights_list, rbind)

# Printing a summary of the tidy flights data frame
tidy_flights&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;# A tibble: 4,128,594 × 4
   FLT_DATE            APT_NAME             number_of_flights flight…¹
   &amp;lt;dttm&amp;gt;              &amp;lt;chr&amp;gt;                            &amp;lt;dbl&amp;gt; &amp;lt;chr&amp;gt;   
 1 2016-01-01 00:00:00 Antwerp                              4 FLT_DEP…
 2 2016-01-01 00:00:00 Brussels                           174 FLT_DEP…
 3 2016-01-01 00:00:00 Charleroi                           45 FLT_DEP…
 4 2016-01-01 00:00:00 Liège                                6 FLT_DEP…
 5 2016-01-01 00:00:00 Ostend-Bruges                        7 FLT_DEP…
 6 2016-01-01 00:00:00 Berlin - Brandenburg                98 FLT_DEP…
 7 2016-01-01 00:00:00 Dresden                             18 FLT_DEP…
 8 2016-01-01 00:00:00 Erfurt                               1 FLT_DEP…
 9 2016-01-01 00:00:00 Frankfurt                          401 FLT_DEP…
10 2016-01-01 00:00:00 Muenster-Osnabrueck                  3 FLT_DEP…
# … with 4,128,584 more rows, and abbreviated variable name
#   ¹​flight_type
# ℹ Use `print(n = ...)` to see more rows&lt;/code&gt;&lt;/pre&gt;
&lt;pre class="r"&gt;&lt;code&gt;# Printing a summary of the shape of the data frame
paste(&amp;quot;tidy_flights has&amp;quot;, nrow(tidy_flights), &amp;quot;rows and&amp;quot;, ncol(tidy_flights),
  &amp;quot;columns.&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;[1] &amp;quot;tidy_flights has 4128594 rows and 4 columns.&amp;quot;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;tidy_flights&lt;/code&gt; data frame is now in a &lt;a href="https://tidyr.tidyverse.org/articles/tidy-data.html"&gt;tidy format&lt;/a&gt;.&lt;/p&gt;
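&lt;p&gt;For comparison, &lt;code&gt;tidyr::pivot_longer()&lt;/code&gt; can reach the same long format in a single step, without a helper function. This is an alternative to the approach in this post rather than the approach used here; the sketch below uses a hypothetical two-airport miniature of &lt;code&gt;tt$flights&lt;/code&gt; with made-up arrival counts. One difference to expect is row order: &lt;code&gt;pivot_longer()&lt;/code&gt; keeps the rows for each airport and date together, whereas binding the list stacks one flight type after another.&lt;/p&gt;
&lt;pre class="r"&gt;&lt;code&gt;library(tidyverse)

# Hypothetical miniature of tt$flights with two airports on one date
flights_demo &amp;lt;- tibble(
  FLT_DATE = as.Date(&amp;quot;2016-01-01&amp;quot;),
  APT_NAME = c(&amp;quot;Antwerp&amp;quot;, &amp;quot;Brussels&amp;quot;),
  FLT_DEP_1 = c(4, 174),
  FLT_ARR_1 = c(3, 170)
)

# Moving the flight type columns into flight_type / number_of_flights pairs
flights_demo_long &amp;lt;- flights_demo %&amp;gt;%
  pivot_longer(cols = c(FLT_DEP_1, FLT_ARR_1), names_to = &amp;quot;flight_type&amp;quot;,
    values_to = &amp;quot;number_of_flights&amp;quot;)
flights_demo_long&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With the real data, &lt;code&gt;cols = all_of(flight_types)&lt;/code&gt; would select the six flight type columns defined earlier.&lt;/p&gt;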
&lt;h2 id="plotting-the-distribution-of-arrivals-and-departures-across-the-top-six-airports"&gt;Plotting the distribution of arrivals and departures across the top six airports&lt;/h2&gt;
&lt;pre class="r"&gt;&lt;code&gt;## Selecting the top 6 airports by total number of flights on the latest flight
## date
top_airports &amp;lt;- tidy_flights %&amp;gt;%
  filter(flight_type == &amp;quot;FLT_TOT_1&amp;quot;) %&amp;gt;%
  filter(FLT_DATE == max(FLT_DATE)) %&amp;gt;%
  slice_max(order_by = number_of_flights, n = 6)

# Changing &amp;quot;flight_type&amp;quot; to a factor with descriptive levels
## as.factor() sorts the codes alphabetically (FLT_ARR_1, FLT_ARR_IFR_2, FLT_DEP_1,
## FLT_DEP_IFR_2, FLT_TOT_1, FLT_TOT_IFR_2), so the new names are given in that order
tidy_flights$flight_type &amp;lt;- as.factor(tidy_flights$flight_type)
levels(tidy_flights$flight_type) &amp;lt;- c(&amp;quot;Arrivals&amp;quot;, &amp;quot;Arrivals (Airport Operator)&amp;quot;,
  &amp;quot;Departures&amp;quot;, &amp;quot;Departures (Airport Operator)&amp;quot;, &amp;quot;Total&amp;quot;, &amp;quot;Total (Airport Operator)&amp;quot;)

# Plotting the distribution of arrivals and departures for the top airports
tidy_flights %&amp;gt;%
  filter(APT_NAME %in% top_airports$APT_NAME) %&amp;gt;%
  filter(flight_type %in% c(&amp;quot;Arrivals&amp;quot;, &amp;quot;Departures&amp;quot;)) %&amp;gt;%
  ggplot(aes(x = APT_NAME, y = number_of_flights, colour = flight_type)) +
  geom_boxplot() +
  theme_solarized() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_colour_discrete() +
  labs(title = &amp;quot;Distribution of daily arrivals and departures across six airports&amp;quot;,
    x = &amp;quot;Airport&amp;quot;, y = &amp;quot;Flights&amp;quot;, colour = &amp;quot;Flight type&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;div class="figure"&gt;
&lt;img src="file125c52ebf5ea_files/figure-html/fig1-1.png" alt="Box plots of daily arrival and depature distribution across top six airports." width="864" /&gt;
&lt;p class="caption"&gt;
(#fig:fig1)Box plots of daily arrival and depature distribution across top six airports.
&lt;/p&gt;
&lt;/div&gt;
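&lt;p&gt;Assigning new names with &lt;code&gt;levels()&lt;/code&gt; works by position, so it relies on the alphabetical order that &lt;code&gt;as.factor()&lt;/code&gt; produced. &lt;code&gt;forcats::fct_recode()&lt;/code&gt; instead maps each old name to a new one explicitly, which avoids mislabelled levels if the order ever changes. A sketch on a hypothetical vector containing each flight type code once, reusing the labels chosen above:&lt;/p&gt;
&lt;pre class="r"&gt;&lt;code&gt;library(tidyverse)

# Hypothetical vector with each flight type code once
flight_type &amp;lt;- c(&amp;quot;FLT_DEP_1&amp;quot;, &amp;quot;FLT_ARR_1&amp;quot;, &amp;quot;FLT_TOT_1&amp;quot;,
  &amp;quot;FLT_DEP_IFR_2&amp;quot;, &amp;quot;FLT_ARR_IFR_2&amp;quot;, &amp;quot;FLT_TOT_IFR_2&amp;quot;)

# Recoding by name (&amp;quot;new name&amp;quot; = &amp;quot;old name&amp;quot;) rather than by position
flight_type_fct &amp;lt;- fct_recode(flight_type,
  &amp;quot;Arrivals&amp;quot; = &amp;quot;FLT_ARR_1&amp;quot;,
  &amp;quot;Arrivals (Airport Operator)&amp;quot; = &amp;quot;FLT_ARR_IFR_2&amp;quot;,
  &amp;quot;Departures&amp;quot; = &amp;quot;FLT_DEP_1&amp;quot;,
  &amp;quot;Departures (Airport Operator)&amp;quot; = &amp;quot;FLT_DEP_IFR_2&amp;quot;,
  &amp;quot;Total&amp;quot; = &amp;quot;FLT_TOT_1&amp;quot;,
  &amp;quot;Total (Airport Operator)&amp;quot; = &amp;quot;FLT_TOT_IFR_2&amp;quot;)
as.character(flight_type_fct)&lt;/code&gt;&lt;/pre&gt;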
&lt;h2 id="see-also"&gt;See also&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://tidytuesday.netlify.app/posts/2022-07-05-sf-rents/"&gt;Reshaping data using pivot functions&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;pre class="r distill-force-highlighting-css"&gt;&lt;code&gt;&lt;/code&gt;&lt;/pre&gt;</description>
      <distill:md5>bb2e4d079263ea30024962f2f421bcf4</distill:md5>
      <guid>https://tidytuesday.netlify.app/posts/2022-07-12-european-flights</guid>
      <pubDate>Tue, 12 Jul 2022 00:00:00 +0000</pubDate>
      <media:content url="https://tidytuesday.netlify.app/posts/2022-07-12-european-flights/european-flights_files/figure-html5/fig1-1.png" medium="image" type="image/png" width="1728" height="1152"/>
    </item>
    <item>
      <title>Reshaping data frames using pivot functions from {tidyr} and tally from {dplyr}</title>
      <dc:creator>Ronan Harrington</dc:creator>
      <link>https://tidytuesday.netlify.app/posts/2022-07-05-sf-rents</link>
      <description>


&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;In this post, the &lt;a href="https://github.com/rfordatascience/tidytuesday/tree/master/data/2022/2022-07-05"&gt;San Francisco Rentals&lt;/a&gt; data set is used to demonstrate data reshaping in R. This involves changing the number of columns and rows in a data frame to fit a given use case. A data frame is made more tall or narrow by decreasing the number of columns, and wider by increasing the number of columns. The three reshaping methods covered in this article are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#reshaping-a-data-frame-by-summarising-variables"&gt;Making a data frame more narrow by summarising variables using group_by() and tally()&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#reshaping-a-data-frame-to-make-it-wider-with-the-%7Btidyr%7D-function-pivot-wider"&gt;Making a data frame wider with pivot_wider()&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#reshaping-a-data-frame-to-make-it-more-narrow-with-the-%7Btidyr%7D-function-pivot-longer"&gt;“Lengthening” a data frame with pivot_longer()&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Data frames created with these methods were used to make two plots:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#plotting-permit-type-counts-per-street-using-a-tidy-data-frame-of-value-counts"&gt;Count of construction permits by type per street&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#plotting-annual-construction-per-san-francisco-county-using-a-data-frame-created-with-pivot-longer"&gt;Annual construction by type per San Francisco county&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
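&lt;p&gt;As a quick illustration of the two pivot directions, the sketch below widens a small hypothetical table of permit counts (one column per permit type) and then lengthens it again. The permit types, street names and counts are made up for illustration.&lt;/p&gt;
&lt;pre class="r"&gt;&lt;code&gt;library(tidyverse)

# Hypothetical tally in long form: one row per permit type and street
long_counts &amp;lt;- tibble(
  permit_type = c(&amp;quot;demolitions&amp;quot;, &amp;quot;demolitions&amp;quot;, &amp;quot;new construction&amp;quot;),
  street_name = c(&amp;quot;01st&amp;quot;, &amp;quot;02nd&amp;quot;, &amp;quot;01st&amp;quot;),
  n = c(2L, 5L, 7L)
)

# Wider: one row per street, one column per permit type
## (street/type combinations with no permits become NA)
wide_counts &amp;lt;- long_counts %&amp;gt;%
  pivot_wider(names_from = permit_type, values_from = n)
wide_counts

# Longer again: back to one row per permit type and street,
# dropping the NA combinations introduced by widening
long_again &amp;lt;- wide_counts %&amp;gt;%
  pivot_longer(cols = -street_name, names_to = &amp;quot;permit_type&amp;quot;,
    values_to = &amp;quot;n&amp;quot;, values_drop_na = TRUE)
long_again&lt;/code&gt;&lt;/pre&gt;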
&lt;h2 id="setup"&gt;Setup&lt;/h2&gt;
&lt;p&gt;Loading the &lt;a href="https://www.r-project.org/"&gt;R&lt;/a&gt; libraries and &lt;a href="https://github.com/rfordatascience/tidytuesday/blob/master/data/2022/2022-07-05/readme.md"&gt;data set&lt;/a&gt;.&lt;/p&gt;
&lt;pre class="r"&gt;&lt;code&gt;# Loading libraries
library(tidytuesdayR)
library(tidyverse)
library(tidytext)
library(ggthemes)

# Loading data
tt &amp;lt;- tt_load(&amp;quot;2022-07-05&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;
    Downloading file 1 of 3: `rent.csv`
    Downloading file 2 of 3: `sf_permits.csv`
    Downloading file 3 of 3: `new_construction.csv`&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id="reshaping-a-data-frame-by-summarising-variables"&gt;Reshaping a data frame by summarising variables&lt;/h2&gt;
&lt;pre class="r"&gt;&lt;code&gt;# Printing a summary of the San Francisco (SF) permits data frame
tt$sf_permits&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;# A tibble: 86,103 × 44
   permit_nu…¹ permi…² permi…³ permit_creation_d…⁴ block lot   stree…⁵
         &amp;lt;dbl&amp;gt;   &amp;lt;dbl&amp;gt; &amp;lt;chr&amp;gt;   &amp;lt;dttm&amp;gt;              &amp;lt;chr&amp;gt; &amp;lt;chr&amp;gt;   &amp;lt;dbl&amp;gt;
 1  2000010368       3 additi… 2000-01-03 00:00:00 0113  025         9
 2  2000010353       6 demoli… 2000-01-03 00:00:00 1785  001A     2921
 3  2000010498       3 additi… 2000-01-04 00:00:00 3705  042       865
 4  2000010484       3 additi… 2000-01-04 00:00:00 6540  040       525
 5  2000010480       3 additi… 2000-01-04 00:00:00 0013  013       145
 6  2000010475       3 additi… 2000-01-04 00:00:00 0241  003       600
 7  2000010476       3 additi… 2000-01-04 00:00:00 0230  028         1
 8  2000010474       3 additi… 2000-01-04 00:00:00 0241  026       600
 9  2000010479       3 additi… 2000-01-04 00:00:00 3707  051       685
10 20000104173       3 additi… 2000-01-04 00:00:00 0471  003      3400
# … with 86,093 more rows, 37 more variables:
#   street_number_suffix &amp;lt;chr&amp;gt;, street_name &amp;lt;chr&amp;gt;,
#   street_suffix &amp;lt;chr&amp;gt;, unit &amp;lt;dbl&amp;gt;, unit_suffix &amp;lt;chr&amp;gt;,
#   description &amp;lt;chr&amp;gt;, status &amp;lt;chr&amp;gt;, status_date &amp;lt;dttm&amp;gt;,
#   filed_date &amp;lt;dttm&amp;gt;, issued_date &amp;lt;dttm&amp;gt;, completed_date &amp;lt;dttm&amp;gt;,
#   first_construction_document_date &amp;lt;dttm&amp;gt;,
#   structural_notification &amp;lt;chr&amp;gt;, …
# ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names&lt;/code&gt;&lt;/pre&gt;
&lt;pre class="r"&gt;&lt;code&gt;# Printing a summary of the shape of the data frame
paste(&amp;quot;tt$sf_permits has&amp;quot;, nrow(tt$sf_permits), &amp;quot;rows and&amp;quot;, ncol(tt$sf_permits),
  &amp;quot;columns.&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;[1] &amp;quot;tt$sf_permits has 86103 rows and 44 columns.&amp;quot;&lt;/code&gt;&lt;/pre&gt;
&lt;pre class="r"&gt;&lt;code&gt;# Creating a tall/narrow data set of permits per street
permits_per_street &amp;lt;- tt$sf_permits %&amp;gt;%
  # Selecting variables/columns to keep
  select(permit_type_definition, street_name, permit_number) %&amp;gt;%
  # Grouping the permit numbers by type and street name for counting
  group_by(permit_type_definition, street_name) %&amp;gt;%
  # Counting/tallying the number of permits by type per street
  tally()

# Printing a summary of the permits per street data frame
permits_per_street&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;# A tibble: 3,053 × 3
# Groups:   permit_type_definition [4]
   permit_type_definition           street_name     n
   &amp;lt;chr&amp;gt;                            &amp;lt;chr&amp;gt;       &amp;lt;int&amp;gt;
 1 additions alterations or repairs 01st          196
 2 additions alterations or repairs 02nd          763
 3 additions alterations or repairs 03rd          778
 4 additions alterations or repairs 04th          338
 5 additions alterations or repairs 05th          223
 6 additions alterations or repairs 06th          347
 7 additions alterations or repairs 07th          199
 8 additions alterations or repairs 08th          252
 9 additions alterations or repairs 08th Ti         1
10 additions alterations or repairs 09th          301
# … with 3,043 more rows
# ℹ Use `print(n = ...)` to see more rows&lt;/code&gt;&lt;/pre&gt;
&lt;pre class="r"&gt;&lt;code&gt;# Printing a summary of the shape of the data frame
paste(&amp;quot;permits_per_street has&amp;quot;, nrow(permits_per_street), &amp;quot;rows and&amp;quot;,
  ncol(permits_per_street), &amp;quot;columns.&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;[1] &amp;quot;permits_per_street has 3053 rows and 3 columns.&amp;quot;&lt;/code&gt;&lt;/pre&gt;
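&lt;p&gt;The grouping and tallying above can be sketched on a small tibble. &lt;code&gt;toy_permits&lt;/code&gt; below is hypothetical and made up for illustration. In {dplyr}, &lt;code&gt;count()&lt;/code&gt; combines &lt;code&gt;group_by()&lt;/code&gt; and &lt;code&gt;tally()&lt;/code&gt; in one call. The two differ in one respect. &lt;code&gt;tally()&lt;/code&gt; on a grouped tibble drops only the last grouping variable, which is why &lt;code&gt;permits_per_street&lt;/code&gt; is still printed with &lt;code&gt;# Groups: permit_type_definition [4]&lt;/code&gt;. &lt;code&gt;count()&lt;/code&gt; on ungrouped input returns an ungrouped tibble.&lt;/p&gt;

```r
library(dplyr)

# Hypothetical permit records: one row per permit
toy_permits = tibble(
  permit_type_definition = c("demolitions", "demolitions",
                             "additions alterations or repairs",
                             "additions alterations or repairs",
                             "additions alterations or repairs"),
  street_name = c("01st", "01st", "01st", "02nd", "02nd"),
  permit_number = 1:5
)

# Grouping then tallying, as in the post: tally() drops the last grouping
# variable, so the result stays grouped by permit_type_definition
tallied = toy_permits %>%
  group_by(permit_type_definition, street_name) %>%
  tally()

# count() groups and tallies in one call; on ungrouped input it returns an
# ungrouped tibble
counted = toy_permits %>%
  count(permit_type_definition, street_name)

print(tallied)
print(counted)
```

&lt;p&gt;The counts in &lt;code&gt;n&lt;/code&gt; are the same either way; only the grouping of the result differs.&lt;/p&gt;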
&lt;h2 id="reshaping-a-data-frame-to-make-it-wider-with-the-tidyr-function-pivot-wider"&gt;Reshaping a data frame to make it wider with the {tidyr} function pivot wider&lt;/h2&gt;
&lt;pre class="r"&gt;&lt;code&gt;# Creating a wider copy of the permits per street data frame
permits_per_street_wider &amp;lt;- permits_per_street %&amp;gt;%
  # Pivoting the street names wider (creating a column for each street) and
  # selecting the &amp;quot;n&amp;quot; variable for the values in this data frame
  pivot_wider(names_from = street_name, values_from = n)

# Printing the wider permits per street data frame
permits_per_street_wider&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;# A tibble: 4 × 1,588
# Groups:   permit_type_definition [4]
  permit_typ…¹ `01st` `02nd` `03rd` `04th` `05th` `06th` `07th` `08th`
  &amp;lt;chr&amp;gt;         &amp;lt;int&amp;gt;  &amp;lt;int&amp;gt;  &amp;lt;int&amp;gt;  &amp;lt;int&amp;gt;  &amp;lt;int&amp;gt;  &amp;lt;int&amp;gt;  &amp;lt;int&amp;gt;  &amp;lt;int&amp;gt;
1 additions a…    196    763    778    338    223    347    199    252
2 demolitions      16     17     72      8      7     17     24     20
3 new constru…      9      9     26      8      3      4     13      2
4 new constru…     NA      3     48      3      4      8      7     13
# … with 1,579 more variables: `08th Ti` &amp;lt;int&amp;gt;, `09th` &amp;lt;int&amp;gt;,
#   `10th` &amp;lt;int&amp;gt;, `11th` &amp;lt;int&amp;gt;, `12th` &amp;lt;int&amp;gt;, `13th` &amp;lt;int&amp;gt;,
#   `13th Ti` &amp;lt;int&amp;gt;, `14th` &amp;lt;int&amp;gt;, `15th` &amp;lt;int&amp;gt;, `16th` &amp;lt;int&amp;gt;,
#   `17th` &amp;lt;int&amp;gt;, `18th` &amp;lt;int&amp;gt;, `19th` &amp;lt;int&amp;gt;, `20th` &amp;lt;int&amp;gt;,
#   `21st` &amp;lt;int&amp;gt;, `22nd` &amp;lt;int&amp;gt;, `23rd` &amp;lt;int&amp;gt;, `24th` &amp;lt;int&amp;gt;,
#   `25th` &amp;lt;int&amp;gt;, `25th North` &amp;lt;int&amp;gt;, `26th` &amp;lt;int&amp;gt;, `27th` &amp;lt;int&amp;gt;,
#   `28th` &amp;lt;int&amp;gt;, `29th` &amp;lt;int&amp;gt;, `2nd` &amp;lt;int&amp;gt;, `30th` &amp;lt;int&amp;gt;, …
# ℹ Use `colnames()` to see all variable names&lt;/code&gt;&lt;/pre&gt;
&lt;pre class="r"&gt;&lt;code&gt;# Printing a summary of the shape of the data frame
paste(&amp;quot;permits_per_street_wider has&amp;quot;, nrow(permits_per_street_wider), &amp;quot;rows and&amp;quot;,
  ncol(permits_per_street_wider), &amp;quot;columns.&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;[1] &amp;quot;permits_per_street_wider has 4 rows and 1588 columns.&amp;quot;&lt;/code&gt;&lt;/pre&gt;
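&lt;p&gt;In the wide output above, &lt;code&gt;NA&lt;/code&gt; appears where a permit type has no permits on a street, for example row 4 under &lt;code&gt;01st&lt;/code&gt;. If a count of zero is more useful than a missing value, &lt;code&gt;pivot_wider()&lt;/code&gt; accepts a &lt;code&gt;values_fill&lt;/code&gt; argument. Below is a minimal sketch on a hypothetical tibble.&lt;/p&gt;

```r
library(tibble)
library(tidyr)

# Hypothetical counts; "new construction" has no permits on "01st"
toy_counts = tibble(
  permit_type_definition = c("demolitions", "demolitions", "new construction"),
  street_name = c("01st", "02nd", "02nd"),
  n = c(16L, 17L, 3L)
)

# Default: combinations with no rows become NA
wide_na = pivot_wider(toy_counts, names_from = street_name, values_from = n)

# values_fill = 0L records "no permits" as a count of zero instead
wide_zero = pivot_wider(toy_counts, names_from = street_name, values_from = n,
                        values_fill = 0L)

print(wide_na)
print(wide_zero)
```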
&lt;h2 id="reshaping-a-data-frame-to-make-it-more-narrow-with-the-tidyr-function-pivot-longer"&gt;Reshaping a data frame to make it more narrow with the {tidyr} function pivot longer&lt;/h2&gt;
&lt;pre class="r"&gt;&lt;code&gt;# Printing a summary of the new construction data frame
tt$new_construction&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;# A tibble: 261 × 10
   cartodb_id the_geom the_geom…¹ county  year total…² sfpro…³ mfpro…⁴
        &amp;lt;dbl&amp;gt; &amp;lt;lgl&amp;gt;    &amp;lt;lgl&amp;gt;      &amp;lt;chr&amp;gt;  &amp;lt;dbl&amp;gt;   &amp;lt;dbl&amp;gt;   &amp;lt;dbl&amp;gt;   &amp;lt;dbl&amp;gt;
 1          1 NA       NA         Alame…  1990    3601    2166    1378
 2          2 NA       NA         Alame…  1991     226    -236     395
 3          3 NA       NA         Alame…  1992    2652    2018     563
 4          4 NA       NA         Alame…  1993    3049    2693     282
 5          5 NA       NA         Alame…  1994    2617    2753    -233
 6          6 NA       NA         Alame…  1995    3515    3001     445
 7          7 NA       NA         Alame…  1996    3179    3336    -229
 8          8 NA       NA         Alame…  1997    4591    4414     108
 9          9 NA       NA         Alame…  1998    6022    4484    1465
10         10 NA       NA         Alame…  1999    5601    4131    1392
# … with 251 more rows, 2 more variables: mhproduction &amp;lt;dbl&amp;gt;,
#   source &amp;lt;chr&amp;gt;, and abbreviated variable names
#   ¹​the_geom_webmercator, ²​totalproduction, ³​sfproduction,
#   ⁴​mfproduction
# ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names&lt;/code&gt;&lt;/pre&gt;
&lt;pre class="r"&gt;&lt;code&gt;# Printing a summary of the shape of the data frame
paste(&amp;quot;tt$new_construction has&amp;quot;, nrow(tt$new_construction), &amp;quot;rows and&amp;quot;,
  ncol(tt$new_construction), &amp;quot;columns.&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;[1] &amp;quot;tt$new_construction has 261 rows and 10 columns.&amp;quot;&lt;/code&gt;&lt;/pre&gt;
&lt;pre class="r"&gt;&lt;code&gt;# Creating a taller/more narrow subset of production type per county
production_per_county &amp;lt;- tt$new_construction %&amp;gt;%
  # Selecting variables/columns from tt$new_construction
  select(county, year, totalproduction, sfproduction, mfproduction, mhproduction) %&amp;gt;%
  # &amp;quot;Lengthening&amp;quot; the data frame by selecting columns to be pivoted to a longer format
  pivot_longer(cols = c(totalproduction, sfproduction, mfproduction, mhproduction)) %&amp;gt;%
  # Copying the &amp;quot;name&amp;quot; column to the more descriptive &amp;quot;production_type&amp;quot;, as the
  # pivoted columns all describe types of production, then removing the original
  # &amp;quot;name&amp;quot; column
  mutate(production_type = name, name = NULL) %&amp;gt;%
  # Changing &amp;quot;production_type&amp;quot; from a character to a factor variable, with more
  # descriptive factor levels
  mutate(production_type = fct_recode(production_type,
    &amp;quot;Total&amp;quot; = &amp;quot;totalproduction&amp;quot;, &amp;quot;Single family&amp;quot; = &amp;quot;sfproduction&amp;quot;,
    &amp;quot;Multi family&amp;quot; = &amp;quot;mfproduction&amp;quot;, &amp;quot;Mobile home&amp;quot; = &amp;quot;mhproduction&amp;quot;))

# Printing a summary of the production per county data frame
production_per_county&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;# A tibble: 1,044 × 4
   county          year value production_type
   &amp;lt;chr&amp;gt;          &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;fct&amp;gt;          
 1 Alameda County  1990  3601 Total          
 2 Alameda County  1990  2166 Single family  
 3 Alameda County  1990  1378 Multi family   
 4 Alameda County  1990    57 Mobile home    
 5 Alameda County  1991   226 Total          
 6 Alameda County  1991  -236 Single family  
 7 Alameda County  1991   395 Multi family   
 8 Alameda County  1991    67 Mobile home    
 9 Alameda County  1992  2652 Total          
10 Alameda County  1992  2018 Single family  
# … with 1,034 more rows
# ℹ Use `print(n = ...)` to see more rows&lt;/code&gt;&lt;/pre&gt;
&lt;pre class="r"&gt;&lt;code&gt;# Printing a summary of the shape of the data frame
paste(&amp;quot;production_per_county has&amp;quot;, nrow(production_per_county), &amp;quot;rows and&amp;quot;,
  ncol(production_per_county), &amp;quot;columns.&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;[1] &amp;quot;production_per_county has 1044 rows and 4 columns.&amp;quot;&lt;/code&gt;&lt;/pre&gt;
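&lt;p&gt;The separate &lt;code&gt;mutate(production_type = name, name = NULL)&lt;/code&gt; step can also be folded into &lt;code&gt;pivot_longer()&lt;/code&gt; itself. Its &lt;code&gt;names_to&lt;/code&gt; and &lt;code&gt;values_to&lt;/code&gt; arguments name the new columns, and the tidyselect helper &lt;code&gt;ends_with()&lt;/code&gt; can pick the columns to pivot. This is a sketch of an alternative, not the code used for the plots in this post. The toy tibble copies the first two Alameda County rows printed above, keeping only two production columns.&lt;/p&gt;

```r
library(tibble)
library(tidyr)

# First two Alameda County rows printed above, reduced to two production columns
toy_construction = tibble(
  county = c("Alameda County", "Alameda County"),
  year = c(1990, 1991),
  totalproduction = c(3601, 226),
  sfproduction = c(2166, -236)
)

# names_to and values_to name the new columns directly, so no separate
# mutate() is needed to rename "name"
toy_long = toy_construction %>%
  pivot_longer(cols = ends_with("production"),
               names_to = "production_type", values_to = "value")

print(toy_long)
```

&lt;p&gt;This version places &lt;code&gt;production_type&lt;/code&gt; before &lt;code&gt;value&lt;/code&gt;, and &lt;code&gt;fct_recode()&lt;/code&gt; would still be needed for the descriptive level names.&lt;/p&gt;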
&lt;h2 id="plotting-permit-type-counts-per-street-using-a-tidy-data-frame-of-value-counts"&gt;Plotting permit type counts per street using a tidy data frame of value counts&lt;/h2&gt;
&lt;pre class="r"&gt;&lt;code&gt;# Plotting the top 20 streets with the total number of each permit category
permits_per_street %&amp;gt;%
  slice_max(order_by = n, n = 20) %&amp;gt;%
  mutate(street_name = reorder_within(street_name, n, permit_type_definition)) %&amp;gt;%
  ggplot(aes(x = n, y = street_name, fill = permit_type_definition)) +
  geom_col(show.legend = FALSE) +
  scale_y_reordered() +
  theme_solarized_2() +
  facet_wrap(~permit_type_definition, ncol = 2, scales = &amp;quot;free&amp;quot;) +
  labs(title = &amp;quot;Count of construction permits by type per street&amp;quot;,
    x = &amp;quot;Tally&amp;quot;, y = &amp;quot;Street name&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src="file125c1089cf82_files/figure-html/fig1-1.png" width="864" /&gt;&lt;/p&gt;
&lt;h2 id="plotting-annual-construction-per-san-francisco-county-using-a-data-frame-created-with-pivot-longer"&gt;Plotting annual construction per San Francisco county using a data frame created with pivot longer&lt;/h2&gt;
&lt;pre class="r"&gt;&lt;code&gt;# Plotting the annual construction by type per San Francisco county
production_per_county %&amp;gt;%
  ggplot(aes(x = year, y = value,
    colour = fct_reorder2(production_type, year, value))) +
  geom_line() +
  theme_clean() +
  facet_wrap(~county, scales = &amp;quot;free&amp;quot;) +
  scale_colour_brewer(palette = &amp;quot;Dark2&amp;quot;) +
  scale_x_continuous(breaks =
      seq(min(production_per_county$year), max(production_per_county$year), 8)) +
  geom_vline(xintercept = 2008, linetype = 2, colour = &amp;quot;red&amp;quot;, size = 0.4) +
  labs(colour = &amp;quot;Production type&amp;quot;, x = &amp;quot;Year&amp;quot;, y = &amp;quot;Units&amp;quot;,
    title = &amp;quot;Annual construction by type per San Francisco county&amp;quot;,
    subtitle = &amp;quot;Red vertical line marks 2008&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;div class="figure"&gt;
&lt;img src="file125c1089cf82_files/figure-html/fig2-1.png" alt="In San Francisco county, new construction plateaued in 2008 before plummeting." width="864" /&gt;
&lt;p class="caption"&gt;
In San Francisco county, new construction plateaued in 2008 before plummeting.
&lt;/p&gt;
&lt;/div&gt;
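&lt;p&gt;In the plot above, &lt;code&gt;fct_reorder2()&lt;/code&gt; from {forcats} orders the legend so that it roughly follows the right-hand ends of the lines. By default, it sorts the levels in descending order of the &lt;code&gt;y&lt;/code&gt; value at the largest &lt;code&gt;x&lt;/code&gt;. Because the ordering is computed once for the whole data frame, all facets share the same legend order. Below is a small sketch using the 1990 and 1991 Alameda County values printed earlier.&lt;/p&gt;

```r
library(forcats)

# Alameda County values for 1990 and 1991, as printed earlier in this post
production_type = c("Total", "Total", "Single family", "Single family",
                    "Mobile home", "Mobile home")
year = c(1990, 1991, 1990, 1991, 1990, 1991)
value = c(3601, 226, 2166, -236, 57, 67)

# Levels are sorted by each type's value in the latest year, largest first
reordered = fct_reorder2(production_type, year, value)
print(levels(reordered))
```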
&lt;pre class="r distill-force-highlighting-css"&gt;&lt;code&gt;&lt;/code&gt;&lt;/pre&gt;</description>
      <distill:md5>d660e6adcec97c7684c9cb7c454d8abd</distill:md5>
      <guid>https://tidytuesday.netlify.app/posts/2022-07-05-sf-rents</guid>
      <pubDate>Tue, 05 Jul 2022 00:00:00 +0000</pubDate>
      <media:content url="https://tidytuesday.netlify.app/posts/2022-07-05-sf-rents/sf-rents_files/figure-html5/fig2-1.png" medium="image" type="image/png" width="1728" height="1152"/>
    </item>
    <item>
      <title>Text Mining Chocolate Bar Characteristics with {tidytext}</title>
      <dc:creator>Ronan Harrington</dc:creator>
      <link>https://tidytuesday.netlify.app/posts/2022-01-26-chocolate-bar-ratings</link>
      <description>


&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;In this post, memorable characteristics of chocolate bars are plotted. These characteristics can describe any aspect of a bar, e.g. &lt;a href="https://github.com/rfordatascience/tidytuesday/blob/master/data/2022/2022-01-18/readme.md"&gt;texture, flavour or overall opinion&lt;/a&gt;. This data set includes the country of cocoa bean origin for each chocolate bar, with “Blend” used for bars made with beans from multiple origins. To create these plots, the data set is filtered to select the six countries of origin with the most chocolate bar characteristics.&lt;/p&gt;
&lt;p&gt;The first plot lists the fifteen most used characteristics for bars from each country. The countries of origin cannot be distinguished using this plot, because most of the characteristics listed apply to chocolate bars from every country. For example, “sweet” is listed in every panel. &lt;a href="https://www.tidytextmining.com/tfidf.html"&gt;Term frequency–inverse document frequency (tf-idf)&lt;/a&gt; can find characteristics that are distinctive of, or important to, chocolate from each country. It ranks characteristics by more than the number of occurrences alone. In this case, the characteristics associated with each country are treated as a separate document. The second plot lists the fifteen characteristics with the highest tf-idf for the same countries. It shows characteristics that are often associated with one group of chocolate bars but rarely with the others. For example, bars made using a blend of beans are the most likely to list “poor after taste” as a characteristic, whereas “grape” is most strongly associated with Peruvian chocolate bars.&lt;/p&gt;
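&lt;p&gt;The tf-idf arithmetic can be computed by hand in base R. This sketch assumes the definitions documented for &lt;code&gt;tidytext::bind_tf_idf()&lt;/code&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;tf is a term's count divided by the total count of terms in its document;&lt;/li&gt;
&lt;li&gt;idf is the natural log of the number of documents divided by the number of documents containing the term.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The counts below are hypothetical. A characteristic such as “sweet” that appears for every country gets an idf of zero, and therefore a tf-idf of zero. This matches the zero &lt;code&gt;idf&lt;/code&gt; values printed for common characteristics such as “nutty” in the data wrangling section.&lt;/p&gt;

```r
# Hypothetical characteristic ("term") counts per country ("document")
counts = data.frame(
  country = c("Peru", "Peru", "Blend", "Blend", "Ecuador"),
  term = c("sweet", "grape", "sweet", "poor after taste", "sweet"),
  n = c(4, 2, 3, 1, 5)
)

# Term frequency: the term's count divided by all term counts in its document
counts$tf = counts$n / ave(counts$n, counts$country, FUN = sum)

# Inverse document frequency: natural log of the number of documents divided
# by the number of documents containing the term (each country-term pair
# appears once, so counting rows per term counts documents)
n_docs = length(unique(counts$country))
docs_with_term = ave(counts$n, counts$term, FUN = length)
counts$idf = log(n_docs / docs_with_term)

counts$tf_idf = counts$tf * counts$idf
print(counts)
```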
&lt;h2 id="setup"&gt;Setup&lt;/h2&gt;
&lt;p&gt;Loading the &lt;a href="https://www.r-project.org/"&gt;R&lt;/a&gt; libraries and &lt;a href="https://github.com/rfordatascience/tidytuesday/blob/master/data/2022/2022-01-18/readme.md"&gt;data set&lt;/a&gt;.&lt;/p&gt;
&lt;pre class="r"&gt;&lt;code&gt;# Loading libraries
library(tidytuesdayR)
library(tidyverse)
library(ggthemes)
library(tidytext)

# Loading data
tt &amp;lt;- tt_load(&amp;quot;2022-01-18&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;
    Downloading file 1 of 1: `chocolate.csv`&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id="data-wrangling"&gt;Data wrangling&lt;/h2&gt;
&lt;pre class="r"&gt;&lt;code&gt;# Counting how many times each characteristic is used
country_characteristics &amp;lt;- tt$chocolate %&amp;gt;%
  unnest_tokens(memorable_characteristic,
                most_memorable_characteristics, token = &amp;quot;regex&amp;quot;,
                pattern = &amp;quot;,&amp;quot;, to_lower = TRUE) %&amp;gt;%
  mutate(memorable_characteristic = str_squish(memorable_characteristic)) %&amp;gt;%
  count(country_of_bean_origin, memorable_characteristic, sort = TRUE)

# Counting the total number of characteristics used for each country of origin
total_country_characteristics &amp;lt;- country_characteristics %&amp;gt;%
  group_by(country_of_bean_origin) %&amp;gt;%
  summarise(total = sum(n))

# Joining these data frames
country_characteristics &amp;lt;- left_join(country_characteristics,
                                     total_country_characteristics,
                                     by = &amp;quot;country_of_bean_origin&amp;quot;)

# Finding the six countries of origin with the most characteristics
top_countries &amp;lt;- total_country_characteristics %&amp;gt;%
  slice_max(n = 6, order_by = total) %&amp;gt;%
  select(country_of_bean_origin)

# Filtering the data
country_characteristics &amp;lt;- country_characteristics %&amp;gt;%
  filter(country_of_bean_origin %in% top_countries$country_of_bean_origin) %&amp;gt;%
  select(country_of_bean_origin, memorable_characteristic, n, total)

# Adding tf-idf
country_characteristics &amp;lt;- country_characteristics %&amp;gt;%
  bind_tf_idf(memorable_characteristic,
              country_of_bean_origin, n)

# Printing a summary of the data frame
country_characteristics&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;# A tibble: 1,233 × 7
   country_of_bean_origin memorable_…¹     n total     tf   idf tf_idf
   &amp;lt;chr&amp;gt;                  &amp;lt;chr&amp;gt;        &amp;lt;int&amp;gt; &amp;lt;int&amp;gt;  &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt;  &amp;lt;dbl&amp;gt;
 1 Venezuela              nutty           84   722 0.116      0      0
 2 Ecuador                floral          72   600 0.12       0      0
 3 Dominican Republic     earthy          35   641 0.0546     0      0
 4 Venezuela              roasty          34   722 0.0471     0      0
 5 Venezuela              creamy          33   722 0.0457     0      0
 6 Peru                   cocoa           28   678 0.0413     0      0
 7 Blend                  sweet           27   443 0.0609     0      0
 8 Madagascar             sour            27   485 0.0557     0      0
 9 Blend                  cocoa           26   443 0.0587     0      0
10 Dominican Republic     cocoa           26   641 0.0406     0      0
# … with 1,223 more rows, and abbreviated variable name
#   ¹​memorable_characteristic
# ℹ Use `print(n = ...)` to see more rows&lt;/code&gt;&lt;/pre&gt;
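&lt;p&gt;The tokenizing step above works in three stages:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;it splits each &lt;code&gt;most_memorable_characteristics&lt;/code&gt; string on commas (&lt;code&gt;token = &amp;quot;regex&amp;quot;, pattern = &amp;quot;,&amp;quot;&lt;/code&gt;);&lt;/li&gt;
&lt;li&gt;it lower-cases the pieces;&lt;/li&gt;
&lt;li&gt;it removes stray whitespace with &lt;code&gt;str_squish()&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As an illustration only, the same idea can be sketched in base R with &lt;code&gt;strsplit()&lt;/code&gt; and &lt;code&gt;trimws()&lt;/code&gt;. This is not how {tidytext} implements it, and the input strings are hypothetical.&lt;/p&gt;

```r
# Hypothetical "most memorable characteristics" strings, one per bar
summaries = c("Sweet, nutty,  cocoa", "sweet,  sour")

# Lower-case, then split each summary on commas
tokens = unlist(strsplit(tolower(summaries), ","))

# Trim leading/trailing spaces and collapse repeated internal spaces,
# similar to stringr::str_squish()
tokens = gsub("\\s+", " ", trimws(tokens))

# Count how often each characteristic occurs
token_counts = table(tokens)
print(token_counts)
```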
&lt;h2 id="plotting-the-most-used-characteristics-in-the-chocolate-bar-summaries"&gt;Plotting the most used characteristics in the chocolate bar summaries&lt;/h2&gt;
&lt;pre class="r"&gt;&lt;code&gt;# Plotting the most used characteristics in the chocolate bar summaries
country_characteristics %&amp;gt;%
  group_by(country_of_bean_origin) %&amp;gt;%
  slice_max(n, n = 15, with_ties = FALSE) %&amp;gt;%
  ungroup() %&amp;gt;%
  mutate(country_of_bean_origin = as.factor(country_of_bean_origin),
         memorable_characteristic = reorder_within(memorable_characteristic,
                                                   n,
                                                   country_of_bean_origin)) %&amp;gt;%
  ggplot(aes(n, memorable_characteristic, fill = country_of_bean_origin)) +
  geom_col(show.legend = FALSE) +
  scale_y_reordered() +
  theme_solarized_2() +
  facet_wrap(~country_of_bean_origin, ncol = 2, scales = &amp;quot;free&amp;quot;) +
  labs(title = &amp;quot;Most used Characteristics in Chocolate Bar Summaries&amp;quot;,
       subtitle = &amp;quot;Summaries grouped by cocoa country of origin&amp;quot;,
       x = &amp;quot;Characteristic count&amp;quot;, y = NULL)&lt;/code&gt;&lt;/pre&gt;
&lt;div class="figure"&gt;
&lt;img src="file125c424aa6b0_files/figure-html/fig1-1.png" alt="Characteristics that are most used to describe chocolate bars made using different cocoa beans are plotted." width="864" /&gt;
&lt;p class="caption"&gt;
Characteristics that are most used to describe chocolate bars made using different cocoa beans are plotted.
&lt;/p&gt;
&lt;/div&gt;
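&lt;p&gt;&lt;code&gt;reorder_within()&lt;/code&gt; and &lt;code&gt;scale_y_reordered()&lt;/code&gt; from {tidytext} let each facet have its own bar order. The idea can be sketched in base R in two steps:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;the facet variable is pasted onto each label, so that the same characteristic in two countries becomes two separate factor levels;&lt;/li&gt;
&lt;li&gt;the suffix is stripped again when the axis labels are drawn.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This sketch assumes the {tidytext} default separator &lt;code&gt;&amp;quot;___&amp;quot;&lt;/code&gt; and is not the package's own code. The counts are hypothetical.&lt;/p&gt;

```r
# Hypothetical characteristic counts for two countries
characteristic = c("sweet", "cocoa", "sweet", "cocoa")
n = c(27, 26, 10, 28)
country = c("Blend", "Blend", "Peru", "Peru")

# Pasting the facet variable onto each label makes "sweet" in Blend and
# "sweet" in Peru distinct levels, ordered by n (smallest first)
within_labels = reorder(paste(characteristic, country, sep = "___"), n)
print(levels(within_labels))

# Stripping the suffix recovers the labels shown on the axis
display_labels = sub("___.+$", "", levels(within_labels))
print(display_labels)
```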
&lt;h2 id="plotting-the-most-important-characteristics-in-the-chocolate-bar-summaries"&gt;Plotting the most important characteristics in the chocolate bar summaries&lt;/h2&gt;
&lt;pre class="r"&gt;&lt;code&gt;# Plotting the most important characteristics in the chocolate bar summaries
country_characteristics %&amp;gt;%
  group_by(country_of_bean_origin) %&amp;gt;%
  slice_max(tf_idf, n = 15, with_ties = FALSE) %&amp;gt;%
  ungroup() %&amp;gt;%
  mutate(country_of_bean_origin = as.factor(country_of_bean_origin),
         memorable_characteristic = reorder_within(memorable_characteristic,
                                                   tf_idf,
                                                   country_of_bean_origin)) %&amp;gt;%
  ggplot(aes(tf_idf, memorable_characteristic, fill = country_of_bean_origin)) +
  geom_col(show.legend = FALSE) +
  scale_y_reordered() +
  theme_solarized_2() +
  facet_wrap(~country_of_bean_origin, ncol = 2, scales = &amp;quot;free&amp;quot;) +
  labs(title = &amp;quot;Important Characteristics in Chocolate Bar Summaries&amp;quot;,
       subtitle = &amp;quot;Reviews grouped by cocoa country of origin&amp;quot;,
       x = &amp;quot;Term frequency–inverse document frequency (tf-idf)&amp;quot;, y = NULL)&lt;/code&gt;&lt;/pre&gt;
&lt;div class="figure"&gt;
&lt;img src="file125c424aa6b0_files/figure-html/fig2-1.png" alt="Characteristics that are often used to describe chocolate bars made using cocoa from a given country, but not for other chocolate bars, are plotted." width="864" /&gt;
&lt;p class="caption"&gt;
Characteristics that are often used to describe chocolate bars made using cocoa from a given country, but not for other chocolate bars, are plotted.
&lt;/p&gt;
&lt;/div&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.tidytextmining.com/tfidf.html"&gt;Analyzing word and document frequency: tf-idf&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cmdlinetips.com/2020/02/faceting-and-reordering-with-ggplot2/"&gt;Faceting and Reordering with ggplot2&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;pre class="r distill-force-highlighting-css"&gt;&lt;code&gt;&lt;/code&gt;&lt;/pre&gt;</description>
      <distill:md5>7e305018853e3fada4dc26f99b9fb872</distill:md5>
      <guid>https://tidytuesday.netlify.app/posts/2022-01-26-chocolate-bar-ratings</guid>
      <pubDate>Wed, 26 Jan 2022 00:00:00 +0000</pubDate>
      <media:content url="https://tidytuesday.netlify.app/posts/2022-01-26-chocolate-bar-ratings/chocolate-bar-ratings_files/figure-html5/fig2-1.png" medium="image" type="image/png" width="1728" height="2304"/>
    </item>
    <item>
      <title>Plotting Bee Colony Observations and Distributions using {ggbeeswarm} and {geomtextpath}</title>
      <dc:creator>Ronan Harrington</dc:creator>
      <link>https://tidytuesday.netlify.app/posts/2022-01-23-bee-colony-losses</link>
      <description>


&lt;h2 id="setup"&gt;Setup&lt;/h2&gt;
&lt;p&gt;Loading the &lt;code&gt;R&lt;/code&gt; libraries and &lt;a href="https://github.com/rfordatascience/tidytuesday/blob/master/data/2022/2022-01-11/readme.md"&gt;data set&lt;/a&gt;.&lt;/p&gt;
&lt;pre class="r"&gt;&lt;code&gt;# Loading libraries
library(geomtextpath) # For adding text to ggplot2 curves
library(tidytuesdayR) # For loading data set
library(ggbeeswarm) # For creating a beeswarm plot
library(tidyverse) # For the ggplot2, dplyr, tidyr and forcats packages
library(gganimate) # For plot animation
library(ggthemes) # For more ggplot2 themes
library(viridis) # For colour-blind-friendly colour scales

# Loading data set
tt &amp;lt;- tt_load(&amp;quot;2022-01-11&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;
    Downloading file 1 of 2: `colony.csv`
    Downloading file 2 of 2: `stressor.csv`&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id="data-wrangling"&gt;Data wrangling&lt;/h2&gt;
&lt;p&gt;In this section, the Bee Colony data is wrangled into two tidy sets:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;tidied_colony_counts_overall&lt;/code&gt; contains quarterly colony counts for the USA&lt;/li&gt;
&lt;li&gt;&lt;code&gt;tidied_colony_counts_per_state&lt;/code&gt; contains quarterly colony counts for various states within the USA&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To create these sets, the original data is filtered to select the appropriate states, and the &lt;code&gt;tidy_colony_data()&lt;/code&gt; function defined below is applied. These sets are tidy because &lt;a href="https://tidyr.tidyverse.org/articles/tidy-data.html#tidy-data"&gt;each column is a variable, each row is an observation, and every cell has a single value&lt;/a&gt;. The types of observations in these data sets are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Total colonies&lt;/code&gt;: Bee colonies counted&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Lost&lt;/code&gt;: Bee colonies lost&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Added&lt;/code&gt;: Bee colonies added&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Renovated&lt;/code&gt;: Bee colonies renovated&lt;/li&gt;
&lt;/ul&gt;
&lt;pre class="r"&gt;&lt;code&gt;# Creating subsets of the original bee colony data
colony_counts_overall &amp;lt;- tt$colony %&amp;gt;%
  filter(state == &amp;quot;United States&amp;quot;)

colony_counts_per_state &amp;lt;- tt$colony %&amp;gt;%
  filter(state != &amp;quot;United States&amp;quot; &amp;amp; state != &amp;quot;Other states&amp;quot;)

# Defining a function to tidy bee colony count data, which takes
# &amp;quot;messy_colony_data&amp;quot; as an argument
tidy_colony_data &amp;lt;- function(messy_colony_data){
  # Writing the result of the following piped steps to &amp;quot;tidied_colony_data&amp;quot;
  tidied_colony_data &amp;lt;- messy_colony_data %&amp;gt;%
    # Selecting variables
    select(year, colony_n, colony_lost, colony_added, colony_reno) %&amp;gt;%
    # Dropping rows with missing values
    drop_na() %&amp;gt;%
    # Changing columns to rows
    pivot_longer(!year, names_to = &amp;quot;type&amp;quot;, values_to = &amp;quot;count&amp;quot;) %&amp;gt;%
    # Setting &amp;quot;type&amp;quot; as a factor variable
    mutate(type = factor(type)) %&amp;gt;%
    # Recoding the levels of the &amp;quot;type&amp;quot; factor
    mutate(type = fct_recode(type,
                             &amp;quot;Total colonies&amp;quot; = &amp;quot;colony_n&amp;quot;,
                             &amp;quot;Lost&amp;quot; = &amp;quot;colony_lost&amp;quot;,
                             &amp;quot;Added&amp;quot; = &amp;quot;colony_added&amp;quot;,
                             &amp;quot;Renovated&amp;quot; = &amp;quot;colony_reno&amp;quot;)) %&amp;gt;%
    # Reordering &amp;quot;type&amp;quot; factor levels
    mutate(type = fct_relevel(type,
                              &amp;quot;Total colonies&amp;quot;, &amp;quot;Lost&amp;quot;, &amp;quot;Added&amp;quot;, &amp;quot;Renovated&amp;quot;))
  # Returning &amp;quot;tidied_colony_data&amp;quot;
  return(tidied_colony_data)
}

# Using this function to tidy the subsets
tidied_colony_counts_overall &amp;lt;- tidy_colony_data(colony_counts_overall)

tidied_colony_counts_per_state &amp;lt;- tidy_colony_data(colony_counts_per_state)

# Printing a summary of the subsets before tidying...
colony_counts_overall&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;# A tibble: 26 × 10
    year months  state colon…¹ colon…² colon…³ colon…⁴ colon…⁵ colon…⁶
   &amp;lt;dbl&amp;gt; &amp;lt;chr&amp;gt;   &amp;lt;chr&amp;gt;   &amp;lt;dbl&amp;gt;   &amp;lt;dbl&amp;gt;   &amp;lt;dbl&amp;gt;   &amp;lt;dbl&amp;gt;   &amp;lt;dbl&amp;gt;   &amp;lt;dbl&amp;gt;
 1  2015 Januar… Unit… 2824610      NA  500020      18  546980  270530
 2  2015 April-… Unit… 2849500      NA  352860      12  661860  692850
 3  2015 July-S… Unit… 3132880      NA  457100      15  172990  303070
 4  2015 Octobe… Unit… 2874760      NA  412380      14  117150  158790
 5  2016 Januar… Unit… 2594590      NA  428800      17  378160  158050
 6  2016 April-… Unit… 2801470      NA  329820      12  736920  561160
 7  2016 July-S… Unit… 3181180      NA  397290      12  217320  282130
 8  2016 Octobe… Unit… 3032060      NA  502350      17  124660   60390
 9  2017 Januar… Unit… 2615590      NA  361850      14  586240  239580
10  2017 April-… Unit… 2886030      NA  225680       8  653470  806170
# … with 16 more rows, 1 more variable: colony_reno_pct &amp;lt;dbl&amp;gt;, and
#   abbreviated variable names ¹​colony_n, ²​colony_max, ³​colony_lost,
#   ⁴​colony_lost_pct, ⁵​colony_added, ⁶​colony_reno
# ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names&lt;/code&gt;&lt;/pre&gt;
&lt;pre class="r"&gt;&lt;code&gt;colony_counts_per_state&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;# A tibble: 1,196 × 10
    year months  state colon…¹ colon…² colon…³ colon…⁴ colon…⁵ colon…⁶
   &amp;lt;dbl&amp;gt; &amp;lt;chr&amp;gt;   &amp;lt;chr&amp;gt;   &amp;lt;dbl&amp;gt;   &amp;lt;dbl&amp;gt;   &amp;lt;dbl&amp;gt;   &amp;lt;dbl&amp;gt;   &amp;lt;dbl&amp;gt;   &amp;lt;dbl&amp;gt;
 1  2015 Januar… Alab…    7000    7000    1800      26    2800     250
 2  2015 Januar… Ariz…   35000   35000    4600      13    3400    2100
 3  2015 Januar… Arka…   13000   14000    1500      11    1200      90
 4  2015 Januar… Cali… 1440000 1690000  255000      15  250000  124000
 5  2015 Januar… Colo…    3500   12500    1500      12     200     140
 6  2015 Januar… Conn…    3900    3900     870      22     290      NA
 7  2015 Januar… Flor…  305000  315000   42000      13   54000   25000
 8  2015 Januar… Geor…  104000  105000   14500      14   47000    9500
 9  2015 Januar… Hawa…   10500   10500     380       4    3400     760
10  2015 Januar… Idaho   81000   88000    3700       4    2600    8000
# … with 1,186 more rows, 1 more variable: colony_reno_pct &amp;lt;dbl&amp;gt;, and
#   abbreviated variable names ¹​colony_n, ²​colony_max, ³​colony_lost,
#   ⁴​colony_lost_pct, ⁵​colony_added, ⁶​colony_reno
# ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names&lt;/code&gt;&lt;/pre&gt;
&lt;pre class="r"&gt;&lt;code&gt;# ...and after tidying
tidied_colony_counts_overall&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;# A tibble: 100 × 3
    year type             count
   &amp;lt;dbl&amp;gt; &amp;lt;fct&amp;gt;            &amp;lt;dbl&amp;gt;
 1  2015 Total colonies 2824610
 2  2015 Lost            500020
 3  2015 Added           546980
 4  2015 Renovated       270530
 5  2015 Total colonies 2849500
 6  2015 Lost            352860
 7  2015 Added           661860
 8  2015 Renovated       692850
 9  2015 Total colonies 3132880
10  2015 Lost            457100
# … with 90 more rows
# ℹ Use `print(n = ...)` to see more rows&lt;/code&gt;&lt;/pre&gt;
&lt;pre class="r"&gt;&lt;code&gt;tidied_colony_counts_per_state&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;# A tibble: 4,208 × 3
    year type           count
   &amp;lt;dbl&amp;gt; &amp;lt;fct&amp;gt;          &amp;lt;dbl&amp;gt;
 1  2015 Total colonies  7000
 2  2015 Lost            1800
 3  2015 Added           2800
 4  2015 Renovated        250
 5  2015 Total colonies 35000
 6  2015 Lost            4600
 7  2015 Added           3400
 8  2015 Renovated       2100
 9  2015 Total colonies 13000
10  2015 Lost            1500
# … with 4,198 more rows
# ℹ Use `print(n = ...)` to see more rows&lt;/code&gt;&lt;/pre&gt;
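&lt;p&gt;In &lt;code&gt;tidy_colony_data()&lt;/code&gt;, &lt;code&gt;drop_na()&lt;/code&gt; runs before &lt;code&gt;pivot_longer()&lt;/code&gt;. As a result, a row with any missing count is removed entirely, including its non-missing total, lost and added counts. An example is Connecticut's missing &lt;code&gt;colony_reno&lt;/code&gt; in row 6 of the per-state summary. The sketch below uses those two printed rows (Alabama and Connecticut, reduced to three count columns). It compares this with pivoting first and dropping only the missing values via &lt;code&gt;values_drop_na = TRUE&lt;/code&gt;.&lt;/p&gt;

```r
library(tibble)
library(tidyr)

# Two rows from the per-state summary above; the second has no renovation count
toy_colony = tibble(
  year = c(2015, 2015),
  colony_n = c(7000, 3900),
  colony_lost = c(1800, 870),
  colony_reno = c(250, NA)
)

# As in tidy_colony_data(): drop incomplete rows, then pivot
drop_then_pivot = toy_colony %>%
  drop_na() %>%
  pivot_longer(!year, names_to = "type", values_to = "count")

# Alternative: pivot first, dropping only the missing values
pivot_then_drop = toy_colony %>%
  pivot_longer(!year, names_to = "type", values_to = "count",
               values_drop_na = TRUE)

print(drop_then_pivot)
print(pivot_then_drop)
```

&lt;p&gt;Which behaviour is preferable depends on the analysis. Dropping whole rows means every observation type is based on the same set of rows.&lt;/p&gt;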
&lt;h2 id="plotting-bee-colony-observations-using-ggbeeswarm"&gt;Plotting Bee Colony observations using {ggbeeswarm}&lt;/h2&gt;
&lt;p&gt;The first graph plots a point for each type of observation using &lt;a href="https://github.com/eclarke/ggbeeswarm"&gt;geom_beeswarm()&lt;/a&gt;.&lt;/p&gt;
&lt;pre class="r"&gt;&lt;code&gt;# Plotting Bee Colony observations using geom_beeswarm() from {ggbeeswarm}
tidied_colony_counts_per_state %&amp;gt;%
  ggplot(aes(x = type, y = count)) +
  geom_beeswarm(cex = 4, colour = &amp;quot;yellow&amp;quot;) +
  scale_y_log10() +
  theme_solarized_2(light = FALSE) +
  facet_wrap(~type, scales = &amp;quot;free&amp;quot;) +
  theme(legend.position=&amp;quot;none&amp;quot;, axis.text.x = element_blank()) +
  labs(title = &amp;quot;Bee Colonies Counted, Lost, Added, Renovated&amp;quot;,
       subtitle = &amp;quot;Created using {ggbeeswarm}&amp;quot;,
       x = NULL, y = &amp;quot;Number of bee colonies (log10)&amp;quot;,
       fill = NULL)&lt;/code&gt;&lt;/pre&gt;
&lt;div class="figure"&gt;
&lt;img src="file125c2418918b_files/figure-html/fig1-1.png" alt="Scatter plots of bee colony observations. This plot has a point for each observation. Points are jittered to reduce overplotting." width="864" /&gt;
&lt;p class="caption"&gt;
(#fig:fig1)Scatter plots of bee colony observations. This plot has a point for each observation. Points are jittered to reduce overplotting.
&lt;/p&gt;
&lt;/div&gt;
&lt;h2 id="animating-bee-colony-observations-over-time"&gt;Animating Bee Colony observations over time&lt;/h2&gt;
&lt;p&gt;While the beeswarm plot is thematically appropriate, it pools observations from every year. This graph animates the same points over time, with the current year shown in the subtitle. It uses standard {ggplot2} &lt;a href="https://ggplot2.tidyverse.org/reference/geom_jitter.html"&gt;jittered points&lt;/a&gt;, along with a box plot to summarise the distribution of the points. These box plots are notched: each notch gives an approximate 95% confidence interval for the median. If the notches of two box plots do not overlap, this is strong evidence that their medians differ. Because the x-axis is log10-scaled, the box plots are computed on the log-transformed counts.&lt;/p&gt;
&lt;pre class="r"&gt;&lt;code&gt;# Defining an animation showing bee colony counts over time
p &amp;lt;- tidied_colony_counts_per_state %&amp;gt;%
  ggplot(aes(x = count, y = fct_reorder(type, count))) +
  geom_jitter(color = &amp;quot;yellow&amp;quot;, alpha = 0.8) +
  geom_boxplot(width = 0.2, alpha = 0.8, notch = TRUE, colour = &amp;quot;cyan&amp;quot;) +
  scale_x_log10() +
  theme_solarized_2(light = FALSE) +
  theme(legend.position = &amp;quot;none&amp;quot;, axis.ticks.y = element_blank(),
        axis.line.y = element_blank()) +
  transition_time(as.integer(year)) +
  labs(title = &amp;quot;Bee Colonies Counted, Lost, Added, Renovated, per year&amp;quot;,
       subtitle = &amp;quot;Year: {frame_time}&amp;quot;,
       x = &amp;quot;Number of bee colonies (log10)&amp;quot;, y = NULL)

# Rendering the animation as a .gif
animate(p, nframes = 180, start_pause = 20, end_pause = 20,
        renderer = magick_renderer())&lt;/code&gt;&lt;/pre&gt;
&lt;div class="figure"&gt;
&lt;img src="../file125c2418918b_files/figure-html/fig2-1.gif" alt="Animation showing bee colony counts from 2015 to 2021."  /&gt;
&lt;p class="caption"&gt;
Animation showing bee colony counts from 2015 to 2021.
&lt;/p&gt;
&lt;/div&gt;
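&lt;p&gt;The notch comparison described above can be sketched in a few lines of base R. This is a minimal sketch that assumes the commonly used notch rule, median ± 1.58 × IQR / √n, which we believe is also what {ggplot2} uses for &lt;code&gt;notch = TRUE&lt;/code&gt;. The helpers &lt;code&gt;notch_bounds()&lt;/code&gt; and &lt;code&gt;notch_gap()&lt;/code&gt; are hypothetical. The counts are made-up values, log10-transformed to match the axis of the animated plot.&lt;/p&gt;
&lt;pre class="r"&gt;&lt;code&gt;# Hypothetical helper: approximate notch bounds, median +/- 1.58 * IQR / sqrt(n)
notch_bounds = function(x) {
  q = quantile(x, probs = c(0.25, 0.5, 0.75), names = FALSE)
  half_width = 1.58 * (q[3] - q[1]) / sqrt(length(x))
  c(lower = q[2] - half_width, upper = q[2] + half_width)
}

# Hypothetical helper: gap between two notches
# (positive if they do not overlap, zero or negative if they do)
notch_gap = function(x, y) {
  bx = notch_bounds(x)
  by = notch_bounds(y)
  max(bx[['lower']], by[['lower']]) - min(bx[['upper']], by[['upper']])
}

# Made-up per-state counts for illustration, log10-transformed
# to match the log10 x-axis of the animated plot
added = log10(c(2800, 3400, 1200, 5600, 900, 4100, 2300, 1500))
lost = log10(c(1800, 4600, 1500, 700, 2600, 3900, 1100, 2000))

notch_bounds(added)
notch_gap(added, lost)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A positive value from &lt;code&gt;notch_gap()&lt;/code&gt; means the two notches do not overlap. Zero or a negative value means they do. Non-overlapping notches are a rough visual guide to a difference in medians rather than a formal test.&lt;/p&gt;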
&lt;h2 id="plotting-the-distribution-of-different-bee-colony-observation-types"&gt;Plotting the distribution of different Bee Colony observation types&lt;/h2&gt;
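&lt;p&gt;Based on their box plots in the previous graph, the &lt;code&gt;Added&lt;/code&gt; and &lt;code&gt;Renovated&lt;/code&gt; observations appear to have similar distributions. Density plots offer another way to visualise distributions. This graph plots the density of each observation type except &lt;code&gt;Total colonies&lt;/code&gt;, using the overall counts in &lt;code&gt;tidied_colony_counts_overall&lt;/code&gt;. Each curve is labelled in-line using &lt;code&gt;geom_textdensity()&lt;/code&gt; from {geomtextpath}.&lt;/p&gt;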
&lt;pre class="r"&gt;&lt;code&gt;# Creating a density plot for different observation types
tidied_colony_counts_overall %&amp;gt;%
  filter(type != &amp;quot;Total colonies&amp;quot;) %&amp;gt;%
  ggplot(aes(x = count, colour = type, label = type)) +
  geom_textdensity(size = 7, fontface = 2, hjust = 0.89, vjust = 0.3,
                   linewidth = 1.2) +
  theme_solarized_2(light = FALSE) +
  theme(legend.position = &amp;quot;none&amp;quot;) +
  labs(title = &amp;quot;Distribution of Bee Colony Counts&amp;quot;,
       subtitle = &amp;quot;Distributions of Bee Colonies Added, Renovated, Lost&amp;quot;,
       x = &amp;quot;Number of bee colonies&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;div class="figure"&gt;
&lt;img src="file125c2418918b_files/figure-html/fig3-1.png" alt="A density plot, giving the distribution of various observations. Of the three types of observation plotted, Added and Renovated are the most similar." width="864" /&gt;
&lt;p class="caption"&gt;
(#fig:fig3)A density plot, giving the distribution of various observations. Of the three types of observation plotted, Added and Renovated are the most similar.
&lt;/p&gt;
&lt;/div&gt;
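&lt;p&gt;As far as we understand, &lt;code&gt;geom_textdensity()&lt;/code&gt; draws the same kernel density estimate as &lt;code&gt;geom_density()&lt;/code&gt;. By default, that estimate uses a Gaussian kernel with the bandwidth chosen by &lt;code&gt;bw.nrd0()&lt;/code&gt;. The sketch below writes the estimate out by hand and compares it with &lt;code&gt;stats::density()&lt;/code&gt;. The function &lt;code&gt;kde()&lt;/code&gt; is a hypothetical helper, and the counts are made-up values for illustration.&lt;/p&gt;
&lt;pre class="r"&gt;&lt;code&gt;# Hypothetical helper: Gaussian kernel density estimate written out by hand,
# f(t) = 1 / (n * h) * sum over i of dnorm((t - x_i) / h)
kde = function(t, x, h) {
  sapply(t, function(ti) mean(dnorm((ti - x) / h)) / h)
}

# Made-up colony counts for illustration
x = c(2800, 3400, 1200, 5600, 900, 4100, 2300, 1500)
h = bw.nrd0(x)  # default bandwidth rule of stats::density()

# Compare the hand-written estimate with stats::density() at the median
d = density(x, bw = h, kernel = 'gaussian')
by_hand = kde(median(x), x, h)
from_density = approx(d$x, d$y, xout = median(x))$y

# A density estimate should integrate to (almost exactly) 1
lims = range(x) + c(-5, 5) * h
total = integrate(kde, lims[1], lims[2], x = x, h = h)$value&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The values &lt;code&gt;by_hand&lt;/code&gt; and &lt;code&gt;from_density&lt;/code&gt; should agree closely, since &lt;code&gt;density()&lt;/code&gt; uses a binned approximation on a grid. The value &lt;code&gt;total&lt;/code&gt; should be very close to 1. The bandwidth &lt;code&gt;h&lt;/code&gt; controls how smooth the curves in the density plot look.&lt;/p&gt;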
&lt;pre class="r distill-force-highlighting-css"&gt;&lt;code&gt;&lt;/code&gt;&lt;/pre&gt;</description>
      <distill:md5>15ab9f42454372087f1720713d00d326</distill:md5>
      <guid>https://tidytuesday.netlify.app/posts/2022-01-23-bee-colony-losses</guid>
      <pubDate>Sun, 23 Jan 2022 00:00:00 +0000</pubDate>
      <media:content url="https://tidytuesday.netlify.app/posts/2022-01-23-bee-colony-losses/bee-colony-losses_files/figure-html5/fig3-1.png" medium="image" type="image/png" width="1728" height="1152"/>
    </item>
    <item>
      <title>Text mining Star Trek dialogue and classifying characters using machine learning</title>
      <dc:creator>Ronan Harrington</dc:creator>
      <link>https://tidytuesday.netlify.app/posts/2021-08-18-star-trek-voice-commands</link>
      <description>Graphs, text mining and analysis using the #TidyTuesday data set for week 34 of 2021
  (17/8/2021): "Star Trek voice commands"</description>
      <guid>https://tidytuesday.netlify.app/posts/2021-08-18-star-trek-voice-commands</guid>
      <pubDate>Wed, 18 Aug 2021 00:00:00 +0000</pubDate>
      <media:content url="https://tidytuesday.netlify.app/posts/2021-08-18-star-trek-voice-commands/star-trek-voice-commands_files/figure-html5/fig4-1.png" medium="image" type="image/png" width="1728" height="960"/>
    </item>
    <item>
      <title>Adjusting variable distribution and exploring data using mass linear regression</title>
      <dc:creator>Ronan Harrington</dc:creator>
      <link>https://tidytuesday.netlify.app/posts/2021-08-15-bea-infrastructure-investment</link>
      <description>Graphs and analysis using the #TidyTuesday data set for week 33 of 2021
  (10/8/2021): "BEA Infrastructure Investment"</description>
      <guid>https://tidytuesday.netlify.app/posts/2021-08-15-bea-infrastructure-investment</guid>
      <pubDate>Sun, 15 Aug 2021 00:00:00 +0000</pubDate>
      <media:content url="https://tidytuesday.netlify.app/posts/2021-08-15-bea-infrastructure-investment/bea-infrastructure-investment_files/figure-html5/figure_1-1.png" medium="image" type="image/png" width="1536" height="960"/>
    </item>
    <item>
      <title>Predicting voluntary CEO departures using machine learning</title>
      <dc:creator>Ronan Harrington</dc:creator>
      <link>https://tidytuesday.netlify.app/posts/2021-04-27-ceo-departures</link>
      <description>Graphs and analysis using the #TidyTuesday data set for week 18 of 2021
  (27/4/2021): "CEO Departures"</description>
      <guid>https://tidytuesday.netlify.app/posts/2021-04-27-ceo-departures</guid>
      <pubDate>Tue, 27 Apr 2021 00:00:00 +0000</pubDate>
      <media:content url="https://tidytuesday.netlify.app/posts/2021-04-27-ceo-departures/ceo-departures_files/figure-html5/fig2-1.png" medium="image" type="image/png" width="1536" height="960"/>
    </item>
    <item>
      <title>Films with MPA ratings on Netflix</title>
      <dc:creator>Ronan Harrington</dc:creator>
      <link>https://tidytuesday.netlify.app/posts/2021-04-21-netflix-titles</link>
      <description>Graphs and analysis using the #TidyTuesday data set for week 17 of 2021
  (20/4/2021): "Netflix Titles"</description>
      <guid>https://tidytuesday.netlify.app/posts/2021-04-21-netflix-titles</guid>
      <pubDate>Wed, 21 Apr 2021 00:00:00 +0000</pubDate>
      <media:content url="https://tidytuesday.netlify.app/posts/2021-04-21-netflix-titles/netflix-titles_files/figure-html5/figure1-1.png" medium="image" type="image/png" width="1248" height="768"/>
    </item>
    <item>
      <title>Post offices in the USA from 1772 to 2000</title>
      <dc:creator>Ronan Harrington</dc:creator>
      <link>https://tidytuesday.netlify.app/posts/2021-04-16-us-post-offices</link>
      <description>Graphs and analysis using the #TidyTuesday data set for week 16 of 2021
  (13/4/2021): "US post offices"</description>
      <guid>https://tidytuesday.netlify.app/posts/2021-04-16-us-post-offices</guid>
      <pubDate>Fri, 16 Apr 2021 00:00:00 +0000</pubDate>
      <media:content url="https://tidytuesday.netlify.app/posts/2021-04-16-us-post-offices/us-post-offices_files/figure-html5/figure2-1.png" medium="image" type="image/png" width="1152" height="960"/>
    </item>
    <item>
      <title>Plotting deforestation and its causes</title>
      <dc:creator>Ronan Harrington</dc:creator>
      <link>https://tidytuesday.netlify.app/posts/2021-04-07-global-deforestation</link>
      <description>Graphs and analysis using the #TidyTuesday data set for week 15 of 2021
  (6/4/2021): "Global deforestation"</description>
      <guid>https://tidytuesday.netlify.app/posts/2021-04-07-global-deforestation</guid>
      <pubDate>Wed, 07 Apr 2021 00:00:00 +0000</pubDate>
      <media:content url="https://tidytuesday.netlify.app/posts/2021-04-07-global-deforestation/global-deforestation_files/figure-html5/figure1-1.png" medium="image" type="image/png" width="1536" height="1536"/>
    </item>
    <item>
      <title>Plotting foundations according to shade</title>
      <dc:creator>Ronan Harrington</dc:creator>
      <link>https://tidytuesday.netlify.app/posts/2021-04-06-makeup-shades</link>
      <description>Graphs and analysis using the #TidyTuesday data set for week 14 of 2021
  (30/3/2021): "Makeup Shades"</description>
      <guid>https://tidytuesday.netlify.app/posts/2021-04-06-makeup-shades</guid>
      <pubDate>Tue, 06 Apr 2021 00:00:00 +0000</pubDate>
      <media:content url="https://tidytuesday.netlify.app/posts/2021-04-06-makeup-shades/makeup-shades_files/figure-html5/figure1-1.png" medium="image" type="image/png" width="1536" height="1536"/>
    </item>
    <item>
      <title>UN Votes: Plotting votes on United Nations resolutions</title>
      <dc:creator>Ronan Harrington</dc:creator>
      <link>https://tidytuesday.netlify.app/posts/2021-03-30-un-votes</link>
      <description>Graphs and analysis using the #TidyTuesday data set for week 13 of 2021
  (23/3/2021): "UN Votes"</description>
      <guid>https://tidytuesday.netlify.app/posts/2021-03-30-un-votes</guid>
      <pubDate>Tue, 30 Mar 2021 00:00:00 +0000</pubDate>
      <media:content url="https://tidytuesday.netlify.app/posts/2021-03-30-un-votes/un-votes_files/figure-html5/figure2-1.png" medium="image" type="image/png" width="1536" height="960"/>
    </item>
    <item>
      <title>Video Games and Sliced</title>
      <dc:creator>Ronan Harrington</dc:creator>
      <link>https://tidytuesday.netlify.app/posts/2021-03-23-video-games-and-sliced</link>
      <description>Graphs and analysis using the #TidyTuesday data set for week 12 of 2021
  (16/3/2021): "Video Games and Sliced"</description>
      <guid>https://tidytuesday.netlify.app/posts/2021-03-23-video-games-and-sliced</guid>
      <pubDate>Tue, 23 Mar 2021 00:00:00 +0000</pubDate>
      <media:content url="https://tidytuesday.netlify.app/posts/2021-03-23-video-games-and-sliced/video-games-and-sliced_files/figure-html5/figure1-1.png" medium="image" type="image/png" width="1152" height="960"/>
    </item>
    <item>
      <title>Bechdel Test</title>
      <dc:creator>Ronan Harrington</dc:creator>
      <link>https://tidytuesday.netlify.app/posts/2021-03-21-bechdel-test</link>
      <description>Graphs using the #TidyTuesday data set for week 11 of 2021 (9/3/2021):
"Bechdel Test"</description>
      <guid>https://tidytuesday.netlify.app/posts/2021-03-21-bechdel-test</guid>
      <pubDate>Sun, 21 Mar 2021 00:00:00 +0000</pubDate>
      <media:content url="https://tidytuesday.netlify.app/posts/2021-03-21-bechdel-test/bechdel-test_files/figure-html5/figure1-1.gif" medium="image" type="image/gif"/>
    </item>
  </channel>
</rss>
