[dark-data-agent-chat] Incomplete dataset: only 207/400 PDFs available + Data Scan failing #2149

@andressalsoares

I recently came across the codelab "Turn Dark Data into Structured Gold" (https://codelabs.developers.google.com/dark-data-agent-chat) and have been working on reproducing it.

I would like to flag two issues I encountered:

1. Incomplete dataset: The codelab references 400 PDF files across the recipes and suppliers folders, but the GitHub repository (GoogleCloudPlatform/next-26-keynotes) currently contains only 207 files — 100 in recipes and 107 in suppliers. Could you clarify whether the remaining files are expected to be available, or if there is an alternative source for the complete dataset?

2. Data Scan failing: Every time I run the Data Scan (data-scan-froyo) on the froyo_data_datalakers BigQuery dataset, it completes with status "Succeeded with errors" and scans 0 files and 0 bytes. No tables, filesets, or any other resources are created or updated. I was unable to identify the root cause from the logs.
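
The file counts in item 1 can be checked with a short script. This is a minimal sketch, assuming a local clone of the repository in which the `recipes` and `suppliers` folders sit directly under the path passed in (their actual location inside the repository is not stated here). The function name `count_pdfs` is hypothetical and only for illustration.

```python
from collections import Counter
from pathlib import Path
import sys


def count_pdfs(root: Path, folders: tuple[str, ...] = ("recipes", "suppliers")) -> Counter:
    """Count .pdf files (case-insensitive, recursive) under each named folder."""
    counts: Counter = Counter()
    for folder in folders:
        base = root / folder
        if not base.is_dir():
            # Assumption: a missing folder is reported as 0 rather than raising.
            counts[folder] = 0
            continue
        counts[folder] = sum(
            1 for p in base.rglob("*") if p.is_file() and p.suffix.lower() == ".pdf"
        )
    return counts


if __name__ == "__main__":
    # Usage: python count_pdfs.py /path/to/folder/containing/recipes-and-suppliers
    root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    counts = count_pdfs(root)
    for folder, n in counts.items():
        print(f"{folder}: {n}")
    print(f"total: {sum(counts.values())}")
```

If the totals printed match 100 and 107, the shortfall is in the repository itself rather than in a partial clone or download.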
