
Commit 24bbee0

Added clarification on acquisition/instrument dependencies
1 parent beca517 commit 24bbee0

2 files changed

Lines changed: 38 additions & 2 deletions

File tree

docs/source/acquire_upload/acquire_data.md

Lines changed: 14 additions & 1 deletion
@@ -24,7 +24,20 @@ Rigs are responsible for generating the [acquisition.json](https://aind-data-sch
If you can't generate your aind-data-schema formatted metadata on your rig, you can use what we call the “extractor/mapper” pattern. The code on the rig that extracts metadata from data files is the extractor; we prefer that you maintain this code in [aind-metadata-extractor](https://github.com/AllenNeuralDynamics/aind-metadata-extractor/), but you can also maintain it yourself. The code that transforms the extractor output into aind-data-schema is the mapper. Scientific Computing will help develop and maintain the mapper; you are responsible for your extractor. The key to the extractor/mapper pattern is the data contract that defines the extractor output. The data contract must be a pydantic model or a JSON schema file and must live in the [aind_metadata_extractor.models](https://github.com/AllenNeuralDynamics/aind-metadata-extractor/tree/main/src/aind_metadata_extractor/models) module.
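As a sketch, a data contract might look like the following pydantic model. The model name and fields here are invented for illustration; real contracts live in the `aind_metadata_extractor.models` module.

```python
from pydantic import BaseModel


class FipExtractorOutput(BaseModel):
    """Hypothetical data contract for a 'fip' extractor's output."""

    session_start_time: str  # ISO 8601 timestamp read off the rig
    light_source_power_mw: float
    active_channels: list[str]


# The extractor running on the rig builds a dict like this from raw
# data files, then validates it against the contract before writing it out.
extracted = {
    "session_start_time": "2024-05-01T13:00:00",
    "light_source_power_mw": 0.5,
    "active_channels": ["G", "R"],
}
contract = FipExtractorOutput.model_validate(extracted)
```

The mapper can then rely on this schema when transforming the extractor output into aind-data-schema models.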

On your rig you should output files that match the name of the corresponding mapper that will be run. So if your mapper is called `fip`, you should write a `fip.json` file that validates against the fip extractor schema. The [GatherMetadataJob](upload_data.md#gathermetadatajob) will automatically run your mapper.
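For instance (the file contents are invented for illustration), the rig-side code that writes the extractor output could look like:

```python
import json
from pathlib import Path

# Hypothetical extractor output; it must validate against the fip contract
extracted = {
    "session_start_time": "2024-05-01T13:00:00",
    "active_channels": ["G", "R"],
}

# The file name matches the mapper name: mapper "fip" -> fip.json
Path("fip.json").write_text(json.dumps(extracted, indent=2))
```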
#### Relationship between acquisition.json and instrument.json
The acquisition and instrument metadata files are tightly coupled. The [instrument.json](https://aind-data-schema.readthedocs.io/en/latest/instrument.html) describes the full set of devices in your instrument (each device has a `name` field). The [acquisition.json](https://aind-data-schema.readthedocs.io/en/latest/acquisition.html) describes what was active during a specific session.
**Device name matching requirement**: Every device name listed in `acquisition.json` must exist in either the instrument or procedures metadata:
- **DataStream.active_devices**: Each data stream lists the devices that were acquiring data. These names must match the `name` field of devices in `instrument.json` (or implanted devices in `procedures.json`).
- **StimulusEpoch.active_devices**: Similarly, stimulus epoch device names must match instrument or procedure device names.
- **Connections**: Any `source_device` or `target_device` in acquisition connections must reference devices defined in the instrument or procedures.
- **instrument_id**: The `acquisition.instrument_id` must match `instrument.instrument_id`.
Validation of this relationship occurs during the [GatherMetadataJob](upload_data.md#gathermetadatajob) when metadata is assembled for upload. See [Validation during upload](upload_data.md#validation-during-upload) for when validation runs, what happens when it fails, and how to fix issues.
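As an illustrative pre-check (not the GatherMetadataJob's actual implementation), you can compare the device names yourself before uploading:

```python
def missing_devices(acquisition: dict, known_device_names: set[str]) -> list[str]:
    """Return acquisition device names absent from the known set.

    `known_device_names` should hold the `name` of every device in
    instrument.json plus any implanted devices in procedures.json.
    """
    referenced: list[str] = []
    for stream in acquisition.get("data_streams", []):
        referenced += stream.get("active_devices", [])
    for epoch in acquisition.get("stimulus_epochs", []):
        referenced += epoch.get("active_devices", [])
    # Names are compared exactly; matching is case-sensitive
    return sorted({name for name in referenced if name not in known_device_names})


# Toy example with invented device names
acq = {
    "data_streams": [{"active_devices": ["Camera_1", "Laser_488"]}],
    "stimulus_epochs": [{"active_devices": ["Speaker_1"]}],
}
problems = missing_devices(acq, {"Camera_1", "Laser_488"})  # -> ["Speaker_1"]
```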

#### Multiple independent rigs

docs/source/acquire_upload/upload_data.md

Lines changed: 24 additions & 1 deletion
@@ -19,6 +19,29 @@ The main settings you should be concerned with are:
The settings for the GatherMetadataJob are typically set [inside of your upload script](https://github.com/AllenNeuralDynamics/aind-data-transfer-service/blob/d1f84020862c3de340020b6cb45bef0fd5105515/docs/examples/aind_data_schema_v2.py#L45-L50) or as part of the `job_type`.

### Validation during upload

The GatherMetadataJob validates the relationship between acquisition and instrument metadata when it assembles the full metadata object. This includes checking that:

- All `active_devices` in acquisition data streams and stimulus epochs exist in the instrument (or procedures, for implanted devices)
- All devices referenced in acquisition connections exist in the instrument or procedures
- The `acquisition.instrument_id` matches the `instrument.instrument_id`

**When validation runs**: Validation occurs during the metadata gathering step of the upload job. This runs as part of the aind-data-transfer-service workflow, typically when data is being prepared for transfer (whether from rig to VAST or VAST to S3, depending on your setup).

**If validation fails**:

- With `raise_if_invalid` enabled (strongly recommended): The GatherMetadataJob raises an exception. The upload job fails and no data is transferred. You will see the validation error in the job logs.
- With `raise_if_invalid` disabled: The job may continue and create metadata with a validation bypass, but errors are logged. This can result in a data asset with invalid metadata that may cause problems downstream.

**How to fix validation failures**:

1. **Active devices not found**: Ensure every device name in `acquisition.json` (in `data_streams[].active_devices` and `stimulus_epochs[].active_devices`) exactly matches a device `name` in `instrument.json`. Device names are case-sensitive. If you use implanted devices, those must be defined in `procedures.json`.
2. **instrument_id mismatch**: Set `acquisition.instrument_id` to match `instrument.instrument_id`. When merging multiple instruments, the acquisition should reference the merged instrument_id format (see [Merge rules](#merge-rules)).
3. **Connection device not found**: Ensure `source_device` and `target_device` in each connection match device names in the instrument or procedures.
You can test validation locally before upload using the `InstrumentAcquisitionCompatibility` class from `aind-data-schema`; see the [aind-data-schema validation docs](https://aind-data-schema.readthedocs.io/en/latest/validation.html) for details.
### Merge rules

### When can multiple files be merged?
@@ -33,7 +56,7 @@ Each file must follow the naming pattern `<metadata_type>*.json` where `*` is an
#### Constraints

1. **Unique fields must match**: Certain identifier fields that should be unique across the dataset (like `subject_id`) **must have identical values** in all files being merged. If these fields conflict, the merge will fail and your upload job will be rejected. An important exception is the `instrument_id` field. If two or more instrument JSON files are joined, the merged instrument JSON file will have an `instrument_id` that is the individual IDs joined with `_` in alphabetical order. Because `acquisition.instrument_id` must match the merged instrument, you must anticipate this format when generating acquisition metadata for multi-instrument sessions. For example, if you acquire across behavior instrument "FRG.10-A" and fiber photometry instrument "FIP-2", the merged instrument_id will be `FIP-2_FRG.10-A` (alphabetically sorted). Your acquisition files must use that value for `instrument_id`.
2. **No shared devices, with the exception of a single shared clock**: In general, two instruments can be merged **if and only if there are no shared devices** between them. Devices are identified by their `name` field. If the same device name appears in both instrument files, they should really be defined as a single instrument, not two separate ones.
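The merged-ID convention from constraint 1 can be sketched as follows (illustrative only; the actual merge is performed by the upload service):

```python
def merged_instrument_id(instrument_ids: list[str]) -> str:
    """Join the unique instrument IDs with '_' in alphabetical order."""
    return "_".join(sorted(set(instrument_ids)))


merged_instrument_id(["FRG.10-A", "FIP-2"])  # -> "FIP-2_FRG.10-A"
```

Acquisition metadata for a multi-instrument session should carry this computed value in `acquisition.instrument_id`.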
