| # Find data | ||
| The [data portal](https://data.allenneuraldynamics.org/assets) is a tool for finding and exploring data assets. Currently, you can search all assets that have V2 metadata and easily click links to go to the Code Ocean data asset, metadata, and QC report. | ||
| Each raw asset uploaded from a platform at AIND produces a group of derived assets, one per modality. You can find these assets easily by performing a query on our metadata database using your project name and other fields unique to your project. **All analyses at AIND should begin with a query that returns a group of data assets, filtered by passing quality control**. |
This organization is true for phys/behavior. Not for other modalities.
For some spim, I think it's just one derived asset which is fine because it's one modality
But for other spim, I think there are many different derived assets that have more to do with clustering results in time.
We maybe can just get rid of the first sentence and start with "you can find data assets by performing a query"
| ## Query DocDB | ||
| DocDB queries are dictionaries (key-value pairs) that return a set of data assets. Analysis pipelines are required to use a query as the first step in gathering data for analysis **and** to filter assets according to passing quality control criteria. We recommend using the MCP server to gain familiarity with the patterns used for creating queries. |
I actually think we need more information here.
It's a MongoDB query that uses a particular language/organization. These can be run in Python.
Helen can probably point us to some resources to direct people to, but I do think the last line about using the MCP to develop the queries is important.
oh, how is this meant to be different from the aind-data-access-api? I think I'm conflating the two - where/how would one do DocDB queries separate from the aind-data-access-api?
The query I anticipate for analysis workflows is using the aind-data-access-api, is that not true?
I cleaned it up, hopefully it makes more sense now
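To make the "queries are dictionaries" point in this thread concrete, here is a minimal sketch of building a MongoDB-style filter in Python. The field paths (`data_description.project_name`, `data_description.data_level`, `data_description.modalities.abbreviation`) and the helper name `build_asset_query` are assumptions for illustration, not confirmed V2 schema; check them against the metadata schema or the MCP server before relying on them.

```python
import json


def build_asset_query(project_name, modality=None, data_level="derived"):
    """Build a MongoDB-style filter dictionary for data assets.

    Hypothetical helper: the dotted field paths below are assumed,
    not taken from the documentation under review.
    """
    query = {
        "data_description.project_name": project_name,
        "data_description.data_level": data_level,
    }
    # Only constrain the modality when the caller asks for one.
    if modality is not None:
        query["data_description.modalities.abbreviation"] = modality
    return query


query = build_asset_query("My Project", modality="ecephys")
print(json.dumps(query, indent=2))
```

The resulting dictionary would then be handed to whichever DocDB client the workflow uses, for example as the filter argument of a record-retrieval call in aind-data-access-api (the exact call and its signature should be taken from that package's documentation).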
| DocDB queries are dictionaries (key-value pairs) that return a set of data assets. Analysis pipelines are required to use a query as the first step in gathering data for analysis **and** to filter assets according to passing quality control criteria. We recommend using the MCP server to gain familiarity with the patterns used for creating queries. | ||
| ### AI (MCP Server) |
I'd title this as "MCP Server (AI)"
| ### Fast queries through the cache | ||
| Metadata queries to the database can be very slow. The [`zombie-squirrel`](https://github.com/AllenNeuralDynamics/zombie-squirrel/) package exposes a cache of some fields in the V2 metadata, making them available with much lower latency. The metadata cache is updated at midnight; do not use it if you need immediate access to assets. |
I'd put the last sentence as a parenthetical note.
Is it possible to list the fields that it caches so people know what they can use this for?
The tables are listed in the readme, I linked there. I'll also update the readme so it has more information about what fields are cached.
| qc_df = qc(subject_id=subject_id) | ||
| if qc_df.empty or "status" not in qc_df.columns: | ||
| continue | ||
| for _, row in subject_assets.iterrows(): |
I feel like there's an easier way to just ask if all metrics are status==Pass?
I'm fine with this example, but it feels complicated in a way that might overwhelm people. But not a deal breaker for me.
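One possible shape for the simpler "are all metrics status == Pass?" check raised above is sketched here. The helper name `all_metrics_pass` is hypothetical, and the sketch assumes the QC table has one row per metric with a `status` column whose passing value is the string `"Pass"`, as the diff excerpt suggests.

```python
def all_metrics_pass(statuses, passing="Pass"):
    """Return True only if there is at least one status and all of them pass.

    `statuses` can be any iterable of status values, for example a plain
    list or the `status` column of a pandas DataFrame.
    """
    statuses = list(statuses)
    # An empty QC table is treated as "not passing" rather than vacuously true.
    return len(statuses) > 0 and all(s == passing for s in statuses)


print(all_metrics_pass(["Pass", "Pass"]))
print(all_metrics_pass(["Pass", "Fail"]))
print(all_metrics_pass([]))
```

With a DataFrame this could be called as `all_metrics_pass(qc_df["status"])`. The pandas expression `(qc_df["status"] == "Pass").all()` is equivalent for non-empty tables, but it returns `True` for an empty column, so the explicit emptiness check is still needed.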
saskiad
left a comment
two small comments - one is a typo, the other a small suggestion that you are free to ignore.
No description provided.