Week 5: Working With Messy Data

Most of the datasets that fisheries scientists work with have problems.

This week we will focus on working with imperfect datasets and discuss how to identify and reproducibly correct common data entry errors.
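As a rough illustration of what "reproducibly correct" can mean, the sketch below leaves the raw records untouched and applies a written list of corrections in code, so every change is documented and can be re-run. It is a minimal Python example using only the standard library, not the workflow used in class. The records, field names (`fish_id`, `species`, `length_mm`), reasons, and the `apply_corrections` helper are all hypothetical.

```python
import copy

# Hypothetical raw records; in practice these would be read from the raw file.
raw_records = [
    {"fish_id": 1, "species": "Walleye", "length_mm": "452"},
    {"fish_id": 2, "species": "walleye ", "length_mm": "4480"},
    {"fish_id": 3, "species": "Walleye", "length_mm": ""},
]

# Each correction names the record, the field, the new value, and the reason.
# The reasons here are invented for illustration.
corrections = [
    (2, "species", "Walleye", "inconsistent case and trailing space"),
    (2, "length_mm", "448", "extra digit typed at entry"),
]

def apply_corrections(records, corrections):
    """Return a corrected copy of records; the input list is left unchanged."""
    cleaned = copy.deepcopy(records)
    by_id = {rec["fish_id"]: rec for rec in cleaned}
    for fish_id, field, new_value, _reason in corrections:
        by_id[fish_id][field] = new_value
    return cleaned

clean_records = apply_corrections(raw_records, corrections)
```

Because the corrections live in the script rather than in hand edits to the file, anyone can see what was changed and why, and rerunning the script on the raw data gives the same cleaned result.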

Lecture Topics

  • Identifying common problems:
    • Mixed data types
    • Impossible values
    • Hard-to-detect issues (blank cells, numbers containing “e” that may be read as scientific notation)
    • Data entry errors (typos, implausibly large or small numbers)
  • Data exploration as part of the verification workflow
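
The problems listed above can be screened for with a few simple checks. The following is a minimal Python sketch using only the standard library, not the method used in class. The sample values, the plausible range of 10 to 1000 mm, and the `classify_entry` helper are assumptions made up for illustration.

```python
import math

def classify_entry(raw, low, high):
    """Label one raw cell from a numeric column; low/high bound plausible values."""
    text = raw.strip()
    if text == "":
        return "blank"
    try:
        value = float(text)
    except ValueError:
        return "not numeric"
    if math.isnan(value) or math.isinf(value):
        return "not numeric"
    if "e" in text.lower():
        # Parses as a number, but probably was not meant as one (e.g. "1e3").
        return "scientific notation"
    if not (low <= value <= high):
        return "out of range"
    return "ok"

# Hypothetical fish lengths in mm, as they might arrive from a spreadsheet.
lengths_mm = ["152", " ", "1e3", "-40", "one fifty", "1520", "148.5"]

report = [(raw, classify_entry(raw, low=10, high=1000)) for raw in lengths_mm]
for raw, label in report:
    print(f"{raw!r:12} {label}")
```

Reading every cell as text first is a deliberate choice here: it keeps blanks and values such as "1e3" visible instead of letting them be silently converted on import.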

Resources

The Data Import and Transformation cheat sheets are critical to today’s activity. I have included both in the /resources subfolder of this week’s Project.

Today’s data are adapted from content created by Derek Ogle. His FishR website is fantastic, and I strongly recommend taking a look at it (and checking out his book).

In-class Activities

  • Clean up a dataset

Code and Data

Lecture slides

Slides are available via Speaker Deck.