This week we continue our exploration of R and introduce the concept of tidy data. Our emphasis will be on importing, manipulating, and summarizing data.
- Concept: Tidy data.
- What is it?
- Contrast with messy data
- Clear code, named variables
- Dos: One piece of info per box
- Do not dos: Commas, colours, mixed data, and other no-nos
- Wide-format vs. long-format data
- Intro to dplyr
- R Exercises:
- Importing and summarizing data
- Basic manipulation with dplyr
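To make the wide vs. long distinction in the outline above concrete, here is a minimal sketch in R. The `wide` table and its values are hypothetical (made up for illustration, not taken from the course data), and the code assumes tidyr 1.0.0 or later, where `pivot_longer()` is available; older material, including the cheat sheet listed below, uses `gather()` for the same job.

```r
library(tidyr)

# Hypothetical wide-format table: one row per site, one column per year.
# check.names = FALSE keeps the column names as "2015" and "2016".
wide <- data.frame(
  site   = c("A", "B"),
  `2015` = c(10, 7),
  `2016` = c(12, 9),
  check.names = FALSE
)

# Long (tidy) format: one row per site-year observation,
# with the former column names stored in a "year" variable.
long <- pivot_longer(
  wide,
  cols      = c(`2015`, `2016`),
  names_to  = "year",
  values_to = "count"
)

# The year arrives as text because it came from column names.
long$year <- as.integer(long$year)

print(long)
```

In the wide table the year is hidden in the column headers; in the long table it is a named variable, so each row holds exactly one observation.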
Wickham, Hadley (2014). “Tidy data”. In: Journal of Statistical Software 59.10, pp. 1–23. DOI:10.18637/jss.v059.i10.
Also, chapters 11 and 12 of:
Wickham, Hadley and Garrett Grolemund (2017). R for data science: visualize, model, transform, tidy, and import data. O’Reilly Media, p. 518. ISBN: 978-1491910399. http://r4ds.had.co.nz/index.html
Finally, see this cheat sheet as a reference: https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf
Details - Week 3
We will look at three tables with messy data (see lecture notes) and, in groups, will redraw them by hand to be tidy. Don’t worry about error correction - today is all about reshaping data.
After that, download the sample code and data. We will run through the code and explore how it works.
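As a rough preview of the import-and-summarize workflow, the sketch below reads a small table and computes per-group summaries with dplyr. It is not the course's sample code: the inline CSV text, the column names (`site`, `year`, `count`), and the object names are all assumptions chosen for illustration. With a real file you would pass its path to `read.csv()` instead of using `text =`.

```r
library(dplyr)

# Hypothetical data standing in for a downloaded CSV file.
csv_text <- "site,year,count
A,2015,10
A,2016,12
B,2015,7
B,2016,9"

# Import: read.csv() accepts a file path, or literal text via `text =`.
counts <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Summarize: one row per site, with the mean count and number of years.
site_summary <- counts %>%
  group_by(site) %>%
  summarise(
    mean_count = mean(count),
    n_years    = n()
  )

print(site_summary)
```

Because the input is already tidy (one observation per row, one variable per column), the grouping and summarizing steps need no reshaping first.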
Code and Data
Slides available via Speaker Deck