Week 6: Working With Messy Data

October 16 and 17, 2017

Most of the datasets that fisheries scientists work with have problems.

This week we focus on working with imperfect datasets and discuss how to identify and reproducibly correct common data-entry errors.

Lecture Topics

  • Identifying common problems:
    • Mixed data types
    • Impossible values
    • Hard-to-detect issues (blank cells, numbers containing “e”, such as scientific notation)
    • Data entry errors (typos, implausibly large or small numbers)
  • Data exploration as part of the verification workflow
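
As a rough illustration of the checks listed above, here is a minimal sketch in Python using only the standard library. It is not the in-class code: the function name `flag_entry`, the example column of fish lengths, and the plausible range of 0 to 2000 mm are all assumptions made for this example.

```python
import math


def flag_entry(raw, low=0.0, high=2000.0):
    """Return a list of problem labels for one raw cell of a numeric column.

    `low` and `high` are hypothetical plausibility bounds (here, fish
    length in mm); choose bounds that suit your own data.
    """
    text = raw.strip()
    if text == "":
        return ["blank"]  # empty or whitespace-only cell
    try:
        value = float(text)
    except ValueError:
        return ["not numeric"]  # mixed data type or typo, e.g. "12O"
    problems = []
    if "e" in text.lower():
        problems.append("scientific notation")  # e.g. "1e3"
    if math.isnan(value) or math.isinf(value):
        problems.append("not finite")
    elif value < low or value > high:
        problems.append("out of range")  # impossible or implausible value
    return problems


# Hypothetical raw column, read in as text so nothing is silently converted.
raw_lengths = ["250", "", "12O", "1e3", "-5", "25000", " 310 "]

# Keep only the rows that have at least one problem, keyed by row index.
report = {i: flag_entry(v) for i, v in enumerate(raw_lengths) if flag_entry(v)}
for row, problems in report.items():
    print(row, repr(raw_lengths[row]), problems)
```

Reading the column as text first is a deliberate choice in this sketch: it keeps blanks, stray letters, and scientific notation visible so they can be flagged and corrected in a script rather than by hand.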


The data wrangling cheat sheet is critical to today’s activity.

Today’s data are adapted from content created by Derek Ogle. His FishR website is fantastic, and I strongly recommend taking a look at it (and checking out his book).

In-class Activities

  • Clean up a dataset

Code and Data

Lecture slides

Slides available via Speaker Deck