
Commit 24bbee0

Added clarification on acquisition/instrument dependencies
1 parent beca517 commit 24bbee0

2 files changed

Lines changed: 38 additions & 2 deletions

File tree

docs/source/acquire_upload/acquire_data.md

Lines changed: 14 additions & 1 deletion
@@ -24,7 +24,20 @@ Rigs are responsible for generating the [acquisition.json](https://aind-data-sch
If you can't generate your aind-data-schema formatted metadata on your rig, you can use what we call the “extractor/mapper” pattern. The code on the rig that extracts metadata from data files is the extractor; we prefer that you maintain this code in [aind-metadata-extractor](https://github.com/AllenNeuralDynamics/aind-metadata-extractor/), but you can also maintain it yourself. The code that transforms the extractor output into aind-data-schema is the mapper. Scientific Computing will help develop and maintain the mapper; you are responsible for your extractor. The key to the extractor/mapper pattern is the data contract that defines the extractor output. The data contract must be a pydantic model or a JSON schema file and must live in the [aind_metadata_extractor.models](https://github.com/AllenNeuralDynamics/aind-metadata-extractor/tree/main/src/aind_metadata_extractor/models) module.
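As a sketch, a data contract might look like the following pydantic model. The model name and fields here are invented for illustration; real contracts live in the `aind_metadata_extractor.models` module.

```python
from pydantic import BaseModel


class FipExtractorOutput(BaseModel):
    """Hypothetical data contract for a 'fip' extractor's output."""

    session_start_time: str  # ISO 8601 timestamp read off the rig
    light_source_power_mw: float
    active_channels: list[str]


# The extractor running on the rig builds a dict like this from raw
# data files, then validates it against the contract before writing it out.
extracted = {
    "session_start_time": "2024-05-01T13:00:00",
    "light_source_power_mw": 0.5,
    "active_channels": ["G", "R"],
}
contract = FipExtractorOutput.model_validate(extracted)
```

The mapper can then rely on this schema when transforming the extractor output into aind-data-schema models.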

On your rig you should output files that match the name of the corresponding mapper that will be run. So if your mapper is called `fip`, you should write a `fip.json` file that validates against the fip extractor schema. The [GatherMetadataJob](upload_data.md#gathermetadatajob) will automatically run your mapper.
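For instance (the file contents are invented for illustration), the rig-side code that writes the extractor output could look like:

```python
import json
from pathlib import Path

# Hypothetical extractor output; it must validate against the fip contract
extracted = {
    "session_start_time": "2024-05-01T13:00:00",
    "active_channels": ["G", "R"],
}

# The file name matches the mapper name: mapper "fip" -> fip.json
Path("fip.json").write_text(json.dumps(extracted, indent=2))
```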
#### Relationship between acquisition.json and instrument.json
The acquisition and instrument metadata files are tightly coupled. The [instrument.json](https://aind-data-schema.readthedocs.io/en/latest/instrument.html) describes the full set of devices in your instrument (each device has a `name` field). The [acquisition.json](https://aind-data-schema.readthedocs.io/en/latest/acquisition.html) describes what was active during a specific session.
**Device name matching requirement**: Every device name listed in `acquisition.json` must exist in either the instrument or procedures metadata:
- **DataStream.active_devices**: Each data stream lists the devices that were acquiring data. These names must match the `name` field of devices in `instrument.json` (or implanted devices in `procedures.json`).
- **StimulusEpoch.active_devices**: Similarly, stimulus epoch device names must match instrument or procedure device names.
- **Connections**: Any `source_device` or `target_device` in acquisition connections must reference devices defined in the instrument or procedures.
- **instrument_id**: The `acquisition.instrument_id` must match `instrument.instrument_id`.
Validation of this relationship occurs during the [GatherMetadataJob](upload_data.md#gathermetadatajob) when metadata is assembled for upload. See [Validation during upload](upload_data.md#validation-during-upload) for when validation runs, what happens when it fails, and how to fix issues.
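As an illustrative pre-check (not the GatherMetadataJob's actual implementation), you can compare the device names yourself before uploading:

```python
def missing_devices(acquisition: dict, known_device_names: set[str]) -> list[str]:
    """Return acquisition device names absent from the known set.

    `known_device_names` should hold the `name` of every device in
    instrument.json plus any implanted devices in procedures.json.
    """
    referenced: list[str] = []
    for stream in acquisition.get("data_streams", []):
        referenced += stream.get("active_devices", [])
    for epoch in acquisition.get("stimulus_epochs", []):
        referenced += epoch.get("active_devices", [])
    # Names are compared exactly; matching is case-sensitive
    return sorted({name for name in referenced if name not in known_device_names})


# Toy example with invented device names
acq = {
    "data_streams": [{"active_devices": ["Camera_1", "Laser_488"]}],
    "stimulus_epochs": [{"active_devices": ["Speaker_1"]}],
}
problems = missing_devices(acq, {"Camera_1", "Laser_488"})  # -> ["Speaker_1"]
```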

#### Multiple independent rigs

docs/source/acquire_upload/upload_data.md

Lines changed: 24 additions & 1 deletion
@@ -19,6 +19,29 @@ The main settings you should be concerned with are:
The settings for the GatherMetadataJob are typically set [inside of your upload script](https://github.com/AllenNeuralDynamics/aind-data-transfer-service/blob/d1f84020862c3de340020b6cb45bef0fd5105515/docs/examples/aind_data_schema_v2.py#L45-L50) or as part of the `job_type`.

### Validation during upload

The GatherMetadataJob validates the relationship between acquisition and instrument metadata when it assembles the full metadata object. This includes checking that:

- All `active_devices` in acquisition data streams and stimulus epochs exist in the instrument (or procedures, for implanted devices)
- All devices referenced in acquisition connections exist in the instrument or procedures
- The `acquisition.instrument_id` matches the `instrument.instrument_id`

**When validation runs**: Validation occurs during the metadata gathering step of the upload job. This runs as part of the aind-data-transfer-service workflow, typically when data is being prepared for transfer (whether from rig to VAST or VAST to S3, depending on your setup).

**If validation fails**:

- With `raise_if_invalid` enabled (strongly recommended): The GatherMetadataJob raises an exception. The upload job fails and no data is transferred. You will see the validation error in the job logs.
- With `raise_if_invalid` disabled: The job may continue and create metadata with a validation bypass, but errors are logged. This can result in a data asset with invalid metadata that may cause problems downstream.

**How to fix validation failures**:

1. **Active devices not found**: Ensure every device name in `acquisition.json` (in `data_streams[].active_devices` and `stimulus_epochs[].active_devices`) exactly matches a device `name` in `instrument.json`. Device names are case-sensitive. If you use implanted devices, those must be defined in `procedures.json`.
2. **instrument_id mismatch**: Set `acquisition.instrument_id` to match `instrument.instrument_id`. When merging multiple instruments, the acquisition should reference the merged instrument_id format (see [Merge rules](#merge-rules)).
3. **Connection device not found**: Ensure `source_device` and `target_device` in each connection match device names in the instrument or procedures.
You can test validation locally before upload using the `InstrumentAcquisitionCompatibility` class from `aind-data-schema`; see the [aind-data-schema validation docs](https://aind-data-schema.readthedocs.io/en/latest/validation.html) for details.
### Merge rules

### When can multiple files be merged?
@@ -33,7 +56,7 @@ Each file must follow the naming pattern `<metadata_type>*.json` where `*` is an
#### Constraints

1. **Unique fields must match**: Certain identifier fields that should be unique across the dataset (like `subject_id`) **must have identical values** in all files being merged. If these fields conflict, the merge will fail and your upload job will be rejected. An important exception is the `instrument_id` field. If two or more instrument JSON files are joined, the merged instrument JSON file will have an `instrument_id` that is the individual IDs joined with `_` in alphabetical order. Because `acquisition.instrument_id` must match the merged instrument, you must anticipate this format when generating acquisition metadata for multi-instrument sessions. For example, if you acquire across behavior instrument "FRG.10-A" and fiber photometry instrument "FIP-2", the merged instrument_id will be `FIP-2_FRG.10-A` (alphabetically sorted). Your acquisition files must use that value for `instrument_id`.
2. **No shared devices, with the exception of a single shared clock**: In general, two instruments can be merged **if and only if there are no shared devices** between them. Devices are identified by their `name` field. If the same device name appears in both instrument files, they should really be defined as a single instrument, not two separate ones.
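The merged-ID convention from constraint 1 can be sketched as follows (illustrative only; the actual merge is performed by the upload service):

```python
def merged_instrument_id(instrument_ids: list[str]) -> str:
    """Join the unique instrument IDs with '_' in alphabetical order."""
    return "_".join(sorted(set(instrument_ids)))


merged_instrument_id(["FRG.10-A", "FIP-2"])  # -> "FIP-2_FRG.10-A"
```

Acquisition metadata for a multi-instrument session should carry this computed value in `acquisition.instrument_id`.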
