Most of the datasets that fisheries scientists work with have problems.
This week we focus on working with imperfect datasets and discuss how to identify and reproducibly correct common data entry errors.
- Identifying common problems:
  - Mixed data types
  - Impossible values
  - Hard-to-detect issues (blanks, numbers with “e”)
  - Data entry errors (typos, big or small numbers)
- Data exploration as part of the verification workflow
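As a rough illustration of how the problems listed above might be flagged, here is a minimal sketch in Python with pandas. The course's own tooling is not shown on this page, so the language choice is an assumption. The catch records, the column names, the `flag_problems` helper, and the 1500 mm upper bound are all hypothetical too. It also assumes that "numbers with “e”" refers to values such as `1e3`, which are parsed as scientific notation.

```python
import io

import pandas as pd

# Hypothetical catch records, invented for illustration only.
raw_csv = io.StringIO(
    "fish_id,species,length_mm,weight_g\n"
    "1,walleye,412,650\n"
    "2,Walleye,4120,700\n"
    "3,walleye,-5,n/a\n"
    "4,wallye,398,\n"
    "5,walleye,1e3,610\n"
)

# Read every column as text so nothing is silently converted on import,
# and keep blank cells as empty strings instead of NaN.
df = pd.read_csv(raw_csv, dtype=str, keep_default_na=False)


def flag_problems(df, max_length_mm=1500):
    """Return one boolean column per check; True marks a suspect row.

    max_length_mm is an assumed plausibility bound, not a real limit.
    """
    length = pd.to_numeric(df["length_mm"], errors="coerce")
    weight = pd.to_numeric(df["weight_g"], errors="coerce")
    weight_text = df["weight_g"].str.strip()
    return pd.DataFrame({
        # Hard-to-detect issue: blank cells.
        "blank_weight": weight_text == "",
        # Mixed data types: text in a column that should be numeric.
        "non_numeric_weight": weight.isna() & (weight_text != ""),
        # Hard-to-detect issue: values written with an "e".
        "sci_notation_length": df["length_mm"].str.contains("e", case=False),
        # Impossible values: non-positive or implausibly large lengths.
        "impossible_length": (length <= 0) | (length > max_length_mm),
    })


checks = flag_problems(df)

# Data exploration: tabulating categories makes rare spellings stand out.
species_counts = df["species"].str.strip().str.lower().value_counts()

print(df[checks.any(axis=1)])
print(species_counts)
```

Tabulating the species column is the exploration step: a category that appears only once is often a typo rather than a real species.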
The data wrangling cheat sheet is critical to today’s activity.
- Clean up a dataset
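
One way to make a cleanup reproducible is to leave the raw data untouched and write every correction as code. The sketch below assumes Python with pandas, which is not necessarily the tooling used in the activity. The records, the `SPECIES_FIXES` typo map, the `clean` helper, and the 1500 mm bound are hypothetical.

```python
import pandas as pd

# Hypothetical raw records; the raw table is never edited by hand.
raw = pd.DataFrame({
    "species": ["walleye", "Walleye ", "wallye"],
    "length_mm": ["412", "4120", "398"],
})

# Assumed typo map: each correction is recorded here, in code.
SPECIES_FIXES = {"wallye": "walleye"}


def clean(raw, max_length_mm=1500):
    """Return a corrected copy of raw; the input is left unchanged."""
    df = raw.copy()
    # Normalize whitespace and case, then apply the documented typo fixes.
    df["species"] = (
        df["species"].str.strip().str.lower().replace(SPECIES_FIXES)
    )
    # Convert to numbers; text that cannot be parsed becomes missing.
    length = pd.to_numeric(df["length_mm"], errors="coerce")
    # Set implausible lengths to missing instead of guessing a value.
    df["length_mm"] = length.mask(length > max_length_mm)
    return df


cleaned = clean(raw)
print(cleaned)
```

Because the corrections live in a script, rerunning it on the raw file reproduces the cleaned dataset, and each change can be reviewed later.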
Code and Data
Slides available via Speaker Deck