I recently came across the codelab "Turn Dark Data into Structured Gold" (https://codelabs.developers.google.com/dark-data-agent-chat) and have been working on reproducing it.
I would like to flag two issues I encountered:
1. Incomplete dataset: The codelab references 400 PDF files across the recipes and suppliers folders, but the GitHub repository (GoogleCloudPlatform/next-26-keynotes) currently contains only 207 files: 100 in recipes and 107 in suppliers. Could you clarify whether the remaining files will be published, or whether there is an alternative source for the complete dataset?
2. Data Scan failing: Every time I run the Data Scan (data-scan-froyo) on the froyo_data_datalakers BigQuery dataset, it completes with status "Succeeded with errors" and scans 0 files and 0 bytes. No tables, filesets, or any other resources are created or updated. I was unable to identify the root cause from the logs.
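For anyone who wants to double-check the file counts in the first issue, here is a minimal sketch of how they can be reproduced against a local clone of the repository. It assumes the script is run from the directory that contains the recipes and suppliers folders; the `count_pdfs` helper and the `Path(".")` root are illustrative choices, not something taken from the codelab.

```python
from pathlib import Path
from typing import Dict, Iterable


def count_pdfs(root: Path, folders: Iterable[str]) -> Dict[str, int]:
    """Count .pdf files (case-insensitive extension) under each folder, recursively."""
    counts: Dict[str, int] = {}
    for name in folders:
        folder = root / name
        if not folder.is_dir():
            # A missing folder is reported as 0 rather than raising an error.
            counts[name] = 0
            continue
        counts[name] = sum(
            1
            for p in folder.rglob("*")
            if p.is_file() and p.suffix.lower() == ".pdf"
        )
    return counts


if __name__ == "__main__":
    # Assumption: the current directory is the one holding both folders
    # inside a local clone; adjust the path to match your checkout.
    root = Path(".")
    counts = count_pdfs(root, ["recipes", "suppliers"])
    for name, n in counts.items():
        print(f"{name}: {n}")
    print(f"total: {sum(counts.values())}")
```

Comparing the printed total with the 400 files the codelab mentions makes the gap easy to confirm independently.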