Version: 0.1
Last Updated: 2025-10-11 13:28

| Source Name | URL | Description | File Type | Est. Raw Size | Fields Used | License / Notes |
|---|---|---|---|---|---|---|
| USDA Rural Development Data Gateway | https://data.rd.usda.gov/ | Rural dev funding programs | CSV / API | ~150 MB | state, county, program_name, funding_type, intent_category | Public Domain |
| USDA FSA Farm Loan Dataset | https://data.nal.usda.gov/dataset/farm-loan-programs | Loans by county | CSV | ~100 MB | state, county, program_name, industry | Public Domain |
| Grants.gov Open Data API | https://www.grants.gov/services/open-data.html | Federal grant metadata | JSON / API | ~50 MB | title, agency, deadline | Public API Key Req. |
| U.S. Census ACS 5-Year | https://www.census.gov/data.html | Median income by county | CSV | ~200 MB | state, county, median_income | CC-BY |
| USDA ERS Rural-Urban Continuum Codes | https://www.ers.usda.gov/data-products/rural-urban-continuum-codes/ | Rural/urban classification | CSV | ~20 MB | county_fips, rucc_code | Public Domain |
| BEA Industry Output Summary | https://apps.bea.gov/regional/downloadzip.cfm | Industry output by state | CSV | ~50 MB | state, industry, output_value | Public Domain |

Total Raw Size: ≈ 570 MB

Target Cleaned Size: ≤ 0.4 GB

- Keep latest 5 years (2019–2023).
- Aggregate to county (FIPS).
- ≤ 30 columns per file.
- Store as `.parquet` (Snappy).
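
The first three rules above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual ETL: the field names `year` and `amount` are hypothetical (only `county_fips` appears in the registry), and summing is just one possible way to aggregate to county level. Writing the result as Snappy-compressed Parquet needs a third-party library and is left out here.

```python
from collections import defaultdict

YEARS = range(2019, 2024)  # 2019-2023 inclusive, per the retention rule
MAX_COLUMNS = 30           # per-file column cap


def aggregate_to_county(rows, value_field="amount"):
    """Keep rows from the retained years and sum value_field per county FIPS.

    `year` and `amount` are assumed field names, not taken from any source file.
    """
    totals = defaultdict(float)
    for row in rows:
        if int(row["year"]) not in YEARS:
            continue  # outside the 5-year window
        # County FIPS codes are 5 digits; restore leading zeros lost in CSV parsing.
        fips = str(row["county_fips"]).zfill(5)
        totals[fips] += float(row[value_field])
    return [{"county_fips": f, value_field: v} for f, v in sorted(totals.items())]


def within_column_limit(records, limit=MAX_COLUMNS):
    """Return True when no record has more than `limit` columns."""
    return all(len(record) <= limit for record in records)
```

Padding the FIPS code before grouping keeps a code such as `1001` and `01001` in the same county bucket, which matters when joining across the sources listed in the table.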

- Download raw files → `/data/raw/` (local only).
- Record download timestamp & hash here.
- Run `scripts/etl_usda.py --update`.
- Validate & export → `/data-samples/`.
- Push docs & samples (≤ 50 MB each).

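
The "record download timestamp & hash" and "≤ 50 MB each" steps could be automated along the following lines. This is a sketch under assumptions: the registry does not name a hash algorithm, so SHA-256 is used here; the timestamp is written in UTC rather than CT; 50 MB is read as 50 MiB; and the helper names are hypothetical.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path

MAX_SAMPLE_BYTES = 50 * 1024 * 1024  # assumed reading of "50 MB" as 50 MiB


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large raw downloads are not loaded at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def registry_entry(path):
    """Build the timestamp/hash record for one downloaded file (UTC, assumed)."""
    path = Path(path)
    return {
        "file": path.name,
        "downloaded_at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "sha256": file_sha256(path),
        "size_bytes": path.stat().st_size,
    }


def within_push_limit(path, limit=MAX_SAMPLE_BYTES):
    """Return True when the file is small enough to push as a sample."""
    return Path(path).stat().st_size <= limit
```

Calling `registry_entry` on each file under `/data/raw/` right after download gives a record that can be pasted into this registry, and `within_push_limit` can gate the final push step.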
| Version | Timestamp (CT) | Author | Summary |
|---|---|---|---|
| 0.1 | 2025-10-11 13:45 CT | Victor Olatunji | Initial data source registry |