Break data out of PDF prison
This walkthrough demonstrates how to:
- Scrape data from PDF tables using tabulizer
- Manage unwieldy header types and tidy scraped data output using dplyr, tidyr, and stringr
- Abstract steps into a scraper function
- Iterate across multiple tables and PDFs with purrr
- Reshape and bind output into a master tidy dataframe
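The pipeline in that list can be sketched in a few lines before any real PDF is involved. The following is a minimal sketch under stated assumptions. It is not the walkthrough's own scraper. The input `fake_tables` is a hypothetical stand-in for what tabulizer's table extraction typically returns: a list of character matrices with the header in the first row. The function `scrape_table` and the column names `state`, `year`, and `value` are illustrative only. The sketch wraps the tidying steps into one function, iterates over the tables with purrr, and binds the results into one long dataframe.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical stand-in for extracted PDF tables: a list of character
# matrices, one per table, with column headers in the first row.
fake_tables <- list(
  matrix(c("state", "2019", "2020",
           "CA",    "10",   "12",
           "NY",    "7",    "9"), ncol = 3, byrow = TRUE),
  matrix(c("state", "2019", "2020",
           "TX",    "5",    "6"), ncol = 3, byrow = TRUE)
)

# Hypothetical scraper: promote the first row to column names,
# reshape the year columns into long form, and convert values to numbers.
scrape_table <- function(m) {
  df <- as.data.frame(m[-1, , drop = FALSE], stringsAsFactors = FALSE)
  names(df) <- m[1, ]
  df %>%
    pivot_longer(-state, names_to = "year", values_to = "value") %>%
    mutate(value = as.numeric(value))
}

# Apply the scraper to every table and row-bind into one master dataframe.
master <- map_dfr(fake_tables, scrape_table)
print(master)
```

Keeping all per-table cleanup inside a single function means the iteration step stays a one-liner. Extending it to several PDFs would mostly mean mapping over a list of files first and then over each file's tables.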