class: center, middle, inverse, title-slide

# Intro to ggplot2

### Data Visualization for Social Good
CorrelAid Switzerland
### February 2021

---

layout: true

<div class="my-footer">
  <span style="text-align:center">
    <span>
      <img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/by-sa.png" height=14 style="vertical-align: middle"/>
    </span>
    <a href="https://therbootcamp.github.io/">
      <span style="padding-left:82px">
        <font color="#7E7E7E">
          https://correlaid.org/correlaid-x/switzerland/
        </font>
      </span>
    </a>
    <a href="https://correlaid.org/correlaid-x/switzerland/">
      <font color="#7E7E7E">
        Data Visualization for Social Good | February 2021
      </font>
    </a>
  </span>
</div>

---

.pull-left3[

# Tidyverse

<ul>
  <li class="m1"><span>The tidyverse is...</span></li><br>
  <ul class="level">
    <li><span>A collection of user-friendly <high>packages</high> for analyzing <high>tidy data</high></span></li><br>
    <li><span>An <high>ecosystem</high> for analytics and data science with common design principles</span></li><br>
    <li><span>A <high>dialect</high> of the R language</span></li>
  </ul>
</ul>

]

.pull-right65[

<br><br>
<p align="center">
  <img src="image/tidyverse_ggplot.png" height = "520px">
</p>

]

---

# Modular graphics in <mono>ggplot2</mono>

.pull-left45[

<ul>
  <li class="m1"><span><highm>data</highm>: the data set</span></li>
  <li class="m2"><span><highm>mapping</highm>: the plot's structure</span></li>
  <ul class="level">
    <li><span>What do the axes represent?</span></li>
    <li><span>What do size, shapes, colors, etc.
represent?</span></li>
  </ul>
  <li class="m3"><span><highm>geoms</highm>: geometric shapes illustrating the data</span></li>
  <li class="m4"><span><highm>labs</highm>: plot annotation</span></li>
  <li class="m5"><span><highm>themes</highm>: aesthetic details</span></li>
  <li class="m6"><span><highm>facets</highm>: stratify the plot by a variable</span></li>
  <li class="m7"><span><highm>scales</highm>: scaling of dimensions</span></li>
</ul>

]

.pull-right45[

<img src="Plotting_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" />

]

---

# `ggplot()`

.pull-left45[

<ul>
  <li class="m1"><span>All plots start with <mono>ggplot()</mono></span></li>
  <li class="m2"><span>Two arguments</span></li>
  <ul class="level">
    <li><span><mono>data</mono> | The data set (<mono>tibble</mono>)</span></li>
    <li><span><mono>mapping</mono> | The plot structure, defined using <mono>aes()</mono></span></li>
  </ul>
</ul>

```r
# averages per year
basel_avg <- basel %>%
  group_by(year) %>%
  summarize(
    income_mean = mean(income_mean),
    income_median = mean(income_median))
```

]

.pull-right45[

```r
ggplot(data = basel_avg)
```

<img src="Plotting_files/figure-html/unnamed-chunk-5-1.png" style="display: block; margin: auto;" />

]

---

# `aes()`

.pull-left45[

<ul>
  <li class="m1"><span><mono>aes()</mono> helps define the structure of the <highm>mapping</highm> argument.</span></li>
  <li class="m2"><span>Key arguments:</span></li>
  <ul class="level">
    <li><span><mono>x, y</mono> | Defines axes</span></li>
    <li><span><mono>color, fill</mono> | Defines colors</span></li>
    <li><span><mono>alpha</mono> | Defines opacity</span></li>
    <li><span><mono>size</mono> | Defines sizes</span></li>
    <li><span><mono>shape</mono> | Defines shapes (e.g., circles or squares)</span></li>
  </ul>
</ul>

]

.pull-right45[

```r
ggplot(data = basel_avg,
       mapping = aes(x = year,
                     y = income_mean))
```

<img src="Plotting_files/figure-html/unnamed-chunk-6-1.png" style="display: block; margin: 
auto;" />

]

---

# <mono>+</mono>

.pull-left45[

<ul>
  <li class="m1"><span>The <mono>+</mono> operator "adds" <high>additional elements</high> to the plot.</span></li>
  <li class="m1"><span>Not to be confused with the pipe <mono>%>%</mono>.</span></li>
</ul>

<br>

```r
ggplot(data = basel_avg,
       mapping = aes(x = year,
                     y = income_mean)) +
  # Show as points
  geom_point()
```

]

.pull-right45[

<img src="Plotting_files/figure-html/unnamed-chunk-8-1.png" style="display: block; margin: auto;" />

]

---

# `geom_*()`

.pull-left45[

<ul>
  <li class="m1"><span><mono>geom_*()</mono> functions define which geometric objects are used to illustrate the data.</span></li>
  <li class="m2"><span>A few example <mono>geoms</mono>:</span></li>
  <ul class="level">
    <li><span><mono>geom_point()</mono> | for points</span></li>
    <li><span><mono>geom_line()</mono> | for lines</span></li>
    <li><span><mono>geom_smooth()</mono> | for smooth curves</span></li>
    <li><span><mono>geom_bar()</mono> | for bars</span></li>
    <li><span><mono>geom_boxplot()</mono> | for box plots</span></li>
    <li><span><mono>geom_violin()</mono> | for violin plots</span></li>
  </ul>
</ul>

]

.pull-right45[

<img src="Plotting_files/figure-html/unnamed-chunk-9-1.png" style="display: block; margin: auto;" />

]

---

# `geom_*()`

.pull-left45[

<ul>
  <li class="m1"><span><mono>geom_*()</mono> functions define which geometric objects are used to illustrate the data.</span></li>
</ul>

<br>

```r
ggplot(data = basel_avg,
       mapping = aes(x = year,
                     y = income_mean)) +
  # Show as lines
  geom_line()
```

]

.pull-right45[

<img src="Plotting_files/figure-html/unnamed-chunk-11-1.png" style="display: block; margin: auto;" />

]

---

# `geom_*()`

.pull-left45[

<ul>
  <li class="m1"><span><mono>geom_*()</mono> functions define which geometric objects are used to illustrate the data.</span></li>
</ul>

<br>

```r
ggplot(data = basel_avg,
       mapping = aes(x = year,
                     y = income_mean)) +
  # Show as smoothed curve
  geom_smooth()
```

]

.pull-right45[

<img 
src="Plotting_files/figure-html/unnamed-chunk-13-1.png" style="display: block; margin: auto;" />

]

---

# `geom_*()`

.pull-left45[

<ul>
  <li class="m1"><span><mono>geom_*()</mono> functions define which geometric objects are used to illustrate the data.</span></li>
</ul>

<br>

```r
ggplot(data = basel_avg,
       mapping = aes(x = year,
                     y = income_mean)) +
  # Show as points and lines
  geom_point() +
  geom_line()
```

]

.pull-right45[

<img src="Plotting_files/figure-html/unnamed-chunk-15-1.png" style="display: block; margin: auto;" />

]

---

# `geom_*()`

.pull-left45[

<ul>
  <li class="m1"><span>Most <mono>geom_*()</mono> functions allow specification of <highm>data</highm> and <highm>mapping</highm>.</span></li>
</ul>

<br>

```r
ggplot(data = basel_avg,
       mapping = aes(x = year,
                     y = income_mean)) +
  geom_point() +
  geom_line() +
  # Add points and lines for median
  geom_point(aes(y = income_median)) +
  geom_line(aes(y = income_median))
```

]

.pull-right45[

<img src="Plotting_files/figure-html/unnamed-chunk-17-1.png" style="display: block; margin: auto;" />

]

---

# Wrangling

.pull-left45[

<ul>
  <li class="m1"><span>Oftentimes, creating the desired plot requires appropriate data wrangling.</span></li>
  <li class="m2"><span><mono>ggplot2</mono> works best with <high>long data formats</high>.</span></li>
</ul>

<br>

```r
# pivot to long format
basel_avg_long <- basel_avg %>%
  pivot_longer(-year,
               names_to = "statistic",
               values_to = "income")
```

]

.pull-right45[

```r
basel_avg_long
```

```
# A tibble: 34 x 3
    year statistic     income
   <dbl> <chr>          <dbl>
 1  2001 income_mean   63027.
 2  2001 income_median 49516.
 3  2002 income_mean   63555.
 4  2002 income_median 50066.
 5  2003 income_mean   63083.
 6  2003 income_median 49717.
 7  2004 income_mean   62298.
 8  2004 income_median 49467.
 9  2005 income_mean   63133.
10  2005 income_median 49192.
# … with 24 more rows
```

]

---

# <mono>aes()</mono>

.pull-left45[

<ul>
  <li class="m1"><span><mono>aes()</mono> helps define the structure of the <highm>mapping</highm> argument.</span></li>
</ul>

<br>

```r
# use basel_avg_long
ggplot(data = basel_avg_long,
       mapping = aes(
         x = year,
         y = income,
         # add color dimension
         col = statistic)) +
  geom_point() +
  geom_line()
```

]

.pull-right45[

<img src="Plotting_files/figure-html/unnamed-chunk-21-1.png" style="display: block; margin: auto;" />

]

---

# <mono>aes()</mono>

.pull-left45[

<ul>
  <li class="m1"><span><mono>aes()</mono> helps define the structure of the <highm>mapping</highm> argument.</span></li>
</ul>

<br>

```r
# use basel_avg_long
ggplot(data = basel_avg_long,
       mapping = aes(
         x = year,
         y = income,
         # add shape dimension
         shape = statistic)) +
  geom_point() +
  geom_line()
```

]

.pull-right45[

<img src="Plotting_files/figure-html/unnamed-chunk-23-1.png" style="display: block; margin: auto;" />

]

---

# `facet_*()`

.pull-left45[

<ul>
  <li class="m1"><span>Faceting creates the <high>same plot for groups</high> defined by another variable.</span></li>
  <li class="m2"><span>Key functions:</span></li>
  <ul class="level">
    <li><span><mono>facet_wrap()</mono></span></li>
    <li><span><mono>facet_grid()</mono></span></li>
  </ul>
</ul>

<br>

```r
basel_long <- basel %>%
  pivot_longer(c(income_mean, income_median),
               names_to = 'statistic',
               values_to = 'income')
```

]

.pull-right45[

<img src="Plotting_files/figure-html/unnamed-chunk-25-1.png" style="display: block; margin: auto;" />

]

---

.pull-left45[

# `facet_*()`

<ul>
  <li class="m1"><span>Faceting creates the <high>same plot for groups</high> defined by another variable.</span></li>
</ul>

<br>

```r
# use basel_long
ggplot(data = basel_long,
       mapping = aes(
         x = year,
         y = income,
         col = statistic)) +
  geom_point() +
  geom_line() +
  # facet by quarter
  facet_wrap(~quarter)
```

]

.pull-right45[

<br><br><br>

<img src="Plotting_files/figure-html/unnamed-chunk-27-1.png" style="display: block; margin: auto;" />

]
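---

# `facet_*()`

The two key functions can be compared side by side. This is a minimal sketch, not the workshop's own code: it assumes a small made-up tibble `toy` standing in for `basel`, and the income values and the quarter labels `"A"` and `"B"` are invented for illustration.

```r
library(tibble)
library(tidyr)
library(ggplot2)

# Hypothetical stand-in for `basel`; all values are made up
toy <- tibble(
  year          = rep(2001:2003, times = 2),
  quarter       = rep(c("A", "B"), each = 3),
  income_mean   = c(60, 61, 62, 70, 71, 73),
  income_median = c(50, 50, 51, 58, 59, 60)
)

# Long format: one row per year, quarter and statistic
toy_long <- pivot_longer(toy,
                         c(income_mean, income_median),
                         names_to = "statistic",
                         values_to = "income")

# Base plot shared by both facet variants
p <- ggplot(data = toy_long,
            mapping = aes(x = year,
                          y = income,
                          col = statistic)) +
  geom_point() +
  geom_line()

# One panel per quarter, wrapped into rows as needed
p_wrap <- p + facet_wrap(~quarter)

# Rows defined by statistic, columns by quarter
p_grid <- p + facet_grid(statistic ~ quarter)
```

Printing `p_wrap` or `p_grid` draws the plot. `facet_wrap()` lays out the panels of a single variable, while `facet_grid()` crosses a row variable with a column variable.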
---

class: middle, center

<h1><a href="https://correlaidswitzerland.github.io/DataViz4Good/">Schedule</a></h1>