Socrata tweaks (and a couple of odds and ends)#2084
Conversation
ryanfchase
left a comment
Just the one concern needs to be addressed. I'm not familiar enough with the tests yet; perhaps someone else could also have a look.
// Create db connection
const newConn = await newDb.connect();

// Create views so tables can be queried as requests_<year>
This is 100% the right move to make, but it has implications when using Huggingface as the dataset source: I believe it will try to download all datasets into memory before the user gets a chance to filter them. This is what the network tab shows on your branch (regardless of which data source is specified in .env):
[Screenshot: 311 map + network tab]
(Note: the failed fetches for dataset years pre-2024 are likely a separate problem; I'll try to surface that issue elsewhere.)
I think ultimately we should be creating the views without actually loading data. We are only using ... AS SELECT * FROM requestsYYYY.parquet so that we can automatically retrieve the column names. Maybe we can change the query to do that? Or we could take an approach that is independent of Huggingface and simply store the relevant columns (for each year... sigh) in a local file (as a JavaScript object, or just a small file in the data folder).
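One possible direction, sketched below as a rough illustration (the view/table names and SQL shapes are assumptions, not the project's actual code): in DuckDB, CREATE VIEW only stores the query text rather than scanning the file, and column metadata can then be read with DESCRIBE (or the parquet_schema() table function), which only needs the Parquet footer rather than the full dataset.

```javascript
// Hypothetical sketch (not the project's actual code): build the SQL so the
// view is created lazily and column names are fetched via DESCRIBE,
// without loading the Parquet data itself.
const years = [2020, 2021, 2022, 2023, 2024];

// CREATE VIEW in DuckDB does not read the file; it only records the query.
const createViewSql = (year) =>
  `CREATE OR REPLACE VIEW requests_${year} AS ` +
  `SELECT * FROM 'requests${year}.parquet';`;

// DESCRIBE returns column names/types from metadata, not a full scan.
const describeSql = (year) => `DESCRIBE requests_${year};`;

for (const year of years) {
  console.log(createViewSql(year));
  console.log(describeSql(year));
}
```

These statements would then be passed to the existing connection (e.g. `newConn`) one at a time; only queries that actually SELECT from a view should trigger data reads.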
import moment from "moment";
import ddbh from "@utils/duckDbHelpers.js";

const dataResources = {
Looks like we can add the following JS object entries here:
2020: "rq3b-xjk8",
2021: "97z7-y5bt",
2022: "i5ke-k6by",
2023: "4a4x-mna2"
See my other comment for the links to verify: #2084 (comment)
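For illustration, the suggested entries would slot into `dataResources` like this (a sketch only: the `socrataUrl` helper and the `data.lacity.org` endpoint pattern are assumptions following the standard Socrata `/resource/<id>.json` convention, not verified project code):

```javascript
// Hypothetical sketch: dataResources with the suggested prior-year IDs merged in.
const dataResources = {
  2020: "rq3b-xjk8",
  2021: "97z7-y5bt",
  2022: "i5ke-k6by",
  2023: "4a4x-mna2",
};

// Standard Socrata SODA resource URL shape: /resource/<id>.json
const socrataUrl = (year) =>
  `https://data.lacity.org/resource/${dataResources[year]}.json`;

console.log(socrataUrl(2022));
// → https://data.lacity.org/resource/i5ke-k6by.json
```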

Allow previous years' data when running locally with Socrata as the data source.
Added a unit test for src/utils/DataService.js while I was at it.