Week 4: Introduction to Tidy Data

This week we continue our exploration of R and introduce the concept of tidy data. Our emphasis will be on importing, manipulating, and summarizing data.

Lecture Topics

Concept: Tidy data.
- What is it?
- Contrast with messy data
Clear code, named variables
Dos: One piece of info per box
Don'ts: Commas, colours, mixed data, and other no-nos
Wide-format vs. long-format data
Intro to dplyr
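
To make the wide vs. long distinction concrete, here is a minimal sketch, assuming the tidyr package is installed. The table, its column names, and its values are made up for illustration and are not part of the course data. It uses pivot_longer() from current tidyr; older material, including the 2015 cheat sheet, does the same job with gather().

```r
library(tidyr)

# Hypothetical wide-format table: one row per site, one column per year.
# The year is hidden inside the column names, so the table is not tidy.
wide <- data.frame(
  site = c("A", "B"),
  count_2015 = c(12, 7),
  count_2016 = c(15, 9)
)

# Reshape to long format: one row per site-year observation,
# with the year and the count each in their own column.
long <- pivot_longer(
  wide,
  cols = c(count_2015, count_2016),
  names_to = "year",
  names_prefix = "count_",  # strip the prefix so only the year remains
  values_to = "count"
)

# Years arrive as text because they came from column names.
long$year <- as.integer(long$year)

print(long)
```

The long table has four rows (two sites by two years) and three columns, each holding one kind of information, which is the "one piece of info per box" rule applied to a whole table.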

In-class Activities

R Exercises:
- Importing and summarizing data
- Basic manipulation in the Tidyverse
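
As a rough preview of these exercises, the sketch below imports a small CSV file and summarizes it with dplyr. It assumes dplyr is installed. The file contents, the column names (species, site, mass_g), and the object names are hypothetical and stand in for the course data. It reads the file with base R's read.csv(); readr's read_csv() is the Tidyverse counterpart.

```r
library(dplyr)

# Write a small, made-up CSV to a temporary file so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "species,site,mass_g",
  "sparrow,A,24.5",
  "sparrow,B,26.0",
  "finch,A,18.2",
  "finch,B,17.8"
), csv_path)

# Import the data.
birds <- read.csv(csv_path, stringsAsFactors = FALSE)

# Summarize: drop missing masses, then count rows and average mass per species.
mass_summary <- birds %>%
  filter(!is.na(mass_g)) %>%
  group_by(species) %>%
  summarise(n = n(), mean_mass_g = mean(mass_g))

print(mass_summary)
```

The result has one row per species. Because the input is already tidy (one observation per row, one variable per column), the summary takes only a few lines.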

Readings

Please read up to page 11 of:
Wickham, Hadley (2014). “Tidy data”. In: Journal of Statistical Software 59.10, pp. 1–23. DOI: 10.18637/jss.v059.i10.

Recommended resources

See chapters 11 and 12 of:

Wickham, Hadley and Garrett Grolemund (2017). R for data science: visualize, model, transform, tidy, and import data. O’Reilly Media, p. 518. ISBN: 978-1491910399. http://r4ds.had.co.nz/index.html

Finally, see this cheat sheet as a reference: https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf

The newer version of the cheat sheet is here: https://github.com/rstudio/cheatsheets/raw/master/data-transformation.pdf

Also:

McGill, B. (2016). Ten commandments for good data management: https://dynamicecology.wordpress.com/2016/08/22/ten-commandments-for-good-data-management/

How to measure cloud cover

Details - Week 4

Activities

We will look at three tables with messy data (see lecture notes) and, in groups, will redraw them by hand to be tidy. Don’t worry about error correction - today is all about reshaping data.

After that, download the sample code and data. We will run through the code and explore how it works.

Code and Data

Week 4 project folder

Lecture Slides

Slides available via speakerdeck