From beca517221f8770e181e0630872dff82759ee6cf Mon Sep 17 00:00:00 2001 From: Doug Ollerenshaw Date: Tue, 17 Feb 2026 11:28:34 -0800 Subject: [PATCH 1/3] Fleshed out language surrounding instrument naming --- .../prepare_before_acquisition.md | 21 +++++++++++++++++-- 1 file changed, 19 insertions(+), 2 deletions(-) diff --git a/docs/source/acquire_upload/prepare_before_acquisition.md b/docs/source/acquire_upload/prepare_before_acquisition.md index c5635e3..239faaa 100644 --- a/docs/source/acquire_upload/prepare_before_acquisition.md +++ b/docs/source/acquire_upload/prepare_before_acquisition.md @@ -299,11 +299,28 @@ Subject metadata is populated by lab animal services (LAS) without your involvem ## Instrument -[Instrument](https://aind-data-schema.readthedocs.io/en/latest/instrument.html) metadata should be prepared in advance of data acquisition. +[Instrument](https://aind-data-schema.readthedocs.io/en/latest/instrument.html) metadata should be prepared in advance of data acquisition. Instrument metadata should describe the full set of devices that are combined into the physical instrument used to collect an associated dataset, regardless of whether those devices are active in a given session. This makes it possible to specify an instrument.json as a stable collection of devices (just like a physical instrument in the lab) even if some of those devices are used only in a subset of experimental sessions. The actual list of active devices and their configurable settings should be specified separately in the acquisition.json. + +Examples of physical instruments that should have corresponding instrument.json files are: +* Individual behavioral training boxes +* Behavior recording devices that can be combined transiently with physiology rigs +* Physiology rigs (e.g. ephys, fiber photometry, two-photon imaging, etc.) + +In cases where devices for multiple modalities are combined in a persistent manner (e.g. 
behavior equipment that is built into a physiology rig), that combination of devices should generally be tracked as a single instrument with a single corresponding instrument.json file. + +When collections of devices are combined transiently (e.g. a behavior recording system that is combined with a physiology rig in a swappable manner), those collections of devices can be described by separate instrument.json files that are combined into a single instrument at the time of upload. See the "Multiple Instruments" section below for details. + ### ID -The `instrument_id` for AIND should be the SIPE ID for an instrument. If an instrument is not tracked by SIPE, any string will be accepted. +The `instrument_id` for AIND should be the SIPE ID for an instrument. SIPE instrument IDs are generally tracked relative to the computer(s) included in that instrument. A JSON-formatted list of SIPE instruments and associated computers can be found [here](http://mpe-computers/v2.0). + +Examples of instruments tracked by SIPE: +* a foraging behavior box, "FRG.10-A" +* a mesoscope system, "MESO.1" +* a combined ephys/behavior system, "ND_Ephys.1" + +If an instrument is not tracked by SIPE, any string will be accepted. 
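The instrument/acquisition split described above can be sketched with minimal JSON-shaped dicts. This is an illustration only: the device names are made up, and the field names (`components`, `data_streams`, `active_devices`) are simplified stand-ins for the real aind-data-schema models.

```python
# Sketch of the split described above: instrument.json is the stable, full set
# of devices; acquisition.json lists only what was active in one session.
# Device names are hypothetical; field names are simplified.
instrument = {
    "instrument_id": "FRG.10-A",  # example SIPE ID from the text above
    "components": [
        {"name": "Behavior camera"},
        {"name": "Lick spout"},
        {"name": "Reward pump"},
    ],
}

acquisition = {
    "instrument_id": "FRG.10-A",  # must match the instrument's ID
    "data_streams": [
        # This session used only two of the instrument's three devices.
        {"active_devices": ["Behavior camera", "Lick spout"]}
    ],
}

instrument_devices = {d["name"] for d in instrument["components"]}
active = {n for s in acquisition["data_streams"] for n in s["active_devices"]}
# The active devices are a subset of the instrument's full device list.
print(active <= instrument_devices)
```

The instrument.json stays fixed across sessions; only the acquisition.json changes to reflect which devices were active.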
### Other details From 24bbee03d2ac0fe0e1cda5e207c6dc4c136e019b Mon Sep 17 00:00:00 2001 From: Doug Ollerenshaw Date: Tue, 17 Feb 2026 11:55:10 -0800 Subject: [PATCH 2/3] Added clarification on acqusition/instrument dependencies --- docs/source/acquire_upload/acquire_data.md | 15 ++++++++++++- docs/source/acquire_upload/upload_data.md | 25 +++++++++++++++++++++- 2 files changed, 38 insertions(+), 2 deletions(-) diff --git a/docs/source/acquire_upload/acquire_data.md b/docs/source/acquire_upload/acquire_data.md index 269c4b0..93c87da 100644 --- a/docs/source/acquire_upload/acquire_data.md +++ b/docs/source/acquire_upload/acquire_data.md @@ -24,7 +24,20 @@ Rigs are responsible for generating the [acquisition.json](https://aind-data-sch If you can't generate your aind-data-schema formatted metadata on your rig, you can use what we call the “extractor/mapper” pattern. We refer to the code on the rig that extracts metadata from data files as the extractor. We prefer for you to maintain this code in [aind-metadata-extractor](https://github.com/AllenNeuralDynamics/aind-metadata-extractor/) but you can also maintain it yourself. The code that takes the extractor output and transforms it to aind-data-schema is called the mapper. Scientific computing will help develop the mapper as well as maintain it, you are responsible for your extractor. The key to the extractor/mapper pattern is the data contract that defines the extractor output. The data contract must be a pydantic model or JSON schema file and must live in the [aind_metadata_extractor.models](https://github.com/AllenNeuralDynamics/aind-metadata-extractor/tree/main/src/aind_metadata_extractor/models) module. -On your rig you should output files that match the name of the corresponding mapper that will be run. So if your mapper is called fip you should write a `fip.json` file that validates against the fip extractor schema. The [GatherMetadataJob](upload_data.md#gathermetadatajob) will automatically run your mapper. 
+On your rig you should output files that match the name of the corresponding mapper that will be run. So if your mapper is called fip you should write a `fip.json` file that validates against the fip extractor schema. The [GatherMetadataJob](upload_data.md#gathermetadatajob) will automatically run your mapper. + +#### Relationship between acquisition.json and instrument.json + +The acquisition and instrument metadata files are tightly coupled. The [instrument.json](https://aind-data-schema.readthedocs.io/en/latest/instrument.html) describes the full set of devices in your instrument (each device has a `name` field). The [acquisition.json](https://aind-data-schema.readthedocs.io/en/latest/acquisition.html) describes what was active during a specific session. + +**Device name matching requirement**: Every device name listed in `acquisition.json` must exist in either the instrument or procedures metadata: + +- **DataStream.active_devices**: Each data stream lists the devices that were acquiring data. These names must match the `name` field of devices in `instrument.json` (or implanted devices in `procedures.json`). +- **StimulusEpoch.active_devices**: Similarly, stimulus epoch device names must match instrument or procedure device names. +- **Connections**: Any `source_device` or `target_device` in acquisition connections must reference devices defined in the instrument or procedures. +- **instrument_id**: The `acquisition.instrument_id` must match `instrument.instrument_id`. + +Validation of this relationship occurs during the [GatherMetadataJob](upload_data.md#gathermetadatajob) when metadata is assembled for upload. See [Validation during upload](upload_data.md#validation-during-upload) for when validation runs, what happens when it fails, and how to fix issues. 
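The device-name matching rules above can be illustrated with a simplified stand-in for the real check (the actual validation lives in aind-data-schema and runs in the GatherMetadataJob; field and device names here are made up):

```python
# Simplified stand-in for the device-name matching check described above.
# Every active device must match an instrument device name or an implanted
# device from procedures. Names and fields below are illustrative only.
def find_unknown_devices(acquisition: dict, instrument: dict,
                         implanted: set) -> list:
    """Return active_devices that match neither an instrument device
    nor an implanted device."""
    allowed = {d["name"] for d in instrument.get("components", [])} | implanted
    unknown = []
    for section in ("data_streams", "stimulus_epochs"):
        for entry in acquisition.get(section, []):
            unknown += [n for n in entry.get("active_devices", [])
                        if n not in allowed]
    return unknown

instrument = {
    "instrument_id": "ND_Ephys.1",
    "components": [{"name": "Neuropixels probe A"}, {"name": "Face camera"}],
}
acquisition = {
    "instrument_id": "ND_Ephys.1",
    "data_streams": [{"active_devices": ["Neuropixels probe A", "Laser 1"]}],
}

# "Laser 1" is in neither the instrument nor the implanted-device set,
# so it would fail validation.
print(find_unknown_devices(acquisition, instrument, implanted=set()))
```

Note that matching is exact and case-sensitive, so even a capitalization difference between acquisition.json and instrument.json will surface as an unknown device.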
#### Multiple independent rigs diff --git a/docs/source/acquire_upload/upload_data.md b/docs/source/acquire_upload/upload_data.md index a1396e3..1ed35e5 100644 --- a/docs/source/acquire_upload/upload_data.md +++ b/docs/source/acquire_upload/upload_data.md @@ -19,6 +19,29 @@ The main settings you should be concerned with are: The settings for the GatherMetadataJob are typically set [inside of your upload script](https://github.com/AllenNeuralDynamics/aind-data-transfer-service/blob/d1f84020862c3de340020b6cb45bef0fd5105515/docs/examples/aind_data_schema_v2.py#L45-L50) or as part of the `job_type`. +### Validation during upload + +The GatherMetadataJob validates the relationship between acquisition and instrument metadata when it assembles the full metadata object. This includes checking that: + +- All `active_devices` in acquisition data streams and stimulus epochs exist in the instrument (or procedures, for implanted devices) +- All devices referenced in acquisition connections exist in the instrument or procedures +- The `acquisition.instrument_id` matches the `instrument.instrument_id` + +**When validation runs**: Validation occurs during the metadata gathering step of the upload job. This runs as part of the aind-data-transfer-service workflow, typically when data is being prepared for transfer (whether from rig to VAST or VAST to S3, depending on your setup). + +**If validation fails**: + +- With `raise_if_invalid` enabled (strongly recommended): The GatherMetadataJob raises an exception. The upload job fails and no data is transferred. You will see the validation error in the job logs. +- With `raise_if_invalid` disabled: The job may continue and create metadata with a validation bypass, but errors are logged. This can result in a data asset with invalid metadata that may cause problems downstream. + +**How to fix validation failures**: + +1. 
**Active devices not found**: Ensure every device name in `acquisition.json` (in `data_streams[].active_devices` and `stimulus_epochs[].active_devices`) exactly matches a device `name` in `instrument.json`. Device names are case-sensitive. If you use implanted devices, those must be defined in `procedures.json`. +2. **instrument_id mismatch**: Set `acquisition.instrument_id` to match `instrument.instrument_id`. When merging multiple instruments, the acquisition should reference the merged instrument_id format (see [Merge rules](#merge-rules)). +3. **Connection device not found**: Ensure `source_device` and `target_device` in each connection match device names in the instrument or procedures. + +You can test validation locally before upload using the `InstrumentAcquisitionCompatibility` class from `aind-data-schema`; see the [aind-data-schema validation docs](https://aind-data-schema.readthedocs.io/en/latest/validation.html) for details. + ### Merge rules ### When can multiple files be merged? @@ -33,7 +56,7 @@ Each file must follow the naming pattern `*.json` where `*` is an #### Contraints -1. **Unique fields must match**: Certain identifier fields that should be unique across the dataset (like `subject_id`) **must have identical values** in all files being merged. If these fields conflict, the merge will fail and your upload job will be rejected. An important exception is the `instrument_id` field. If two or more instrument JSON files are joined, the merged instrument JSON file will have an `instrument_id` that is the string combination of the IDs of the unique instruments, +1. **Unique fields must match**: Certain identifier fields that should be unique across the dataset (like `subject_id`) **must have identical values** in all files being merged. If these fields conflict, the merge will fail and your upload job will be rejected. An important exception is the `instrument_id` field. 
If two or more instrument JSON files are joined, the merged instrument JSON file will have an `instrument_id` that is the individual IDs joined with `_` in alphabetical order. Because `acquisition.instrument_id` must match the merged instrument, you must anticipate this format when generating acquisition metadata for multi-instrument sessions. For example, if you acquire across behavior instrument "FRG.10-A" and fiber photometry instrument "FIP-2", the merged instrument_id will be `FIP-2_FRG.10-A` (alphabetically sorted). Your acquisition files must use that value for `instrument_id`. 2. **No shared devices, with the exception of a single shared clock**: In general, two instruments can be merged **if and only if there are no shared devices** between them. Devices are identified by their `name` field. If the same device name appears in both instrument files, they should really be defined as a single instrument, not two separate ones. From 17d0792713c50a7e364f95a673a2e33807f121f5 Mon Sep 17 00:00:00 2001 From: Doug Ollerenshaw Date: Tue, 17 Feb 2026 12:10:33 -0800 Subject: [PATCH 3/3] Revert "Added clarification on acqusition/instrument dependencies" This reverts commit 24bbee03d2ac0fe0e1cda5e207c6dc4c136e019b. --- docs/source/acquire_upload/acquire_data.md | 15 +------------ docs/source/acquire_upload/upload_data.md | 25 +--------------------- 2 files changed, 2 insertions(+), 38 deletions(-) diff --git a/docs/source/acquire_upload/acquire_data.md b/docs/source/acquire_upload/acquire_data.md index 93c87da..269c4b0 100644 --- a/docs/source/acquire_upload/acquire_data.md +++ b/docs/source/acquire_upload/acquire_data.md @@ -24,20 +24,7 @@ Rigs are responsible for generating the [acquisition.json](https://aind-data-sch If you can't generate your aind-data-schema formatted metadata on your rig, you can use what we call the “extractor/mapper” pattern. We refer to the code on the rig that extracts metadata from data files as the extractor. 
We prefer for you to maintain this code in [aind-metadata-extractor](https://github.com/AllenNeuralDynamics/aind-metadata-extractor/) but you can also maintain it yourself. The code that takes the extractor output and transforms it to aind-data-schema is called the mapper. Scientific computing will help develop the mapper as well as maintain it, you are responsible for your extractor. The key to the extractor/mapper pattern is the data contract that defines the extractor output. The data contract must be a pydantic model or JSON schema file and must live in the [aind_metadata_extractor.models](https://github.com/AllenNeuralDynamics/aind-metadata-extractor/tree/main/src/aind_metadata_extractor/models) module. -On your rig you should output files that match the name of the corresponding mapper that will be run. So if your mapper is called fip you should write a `fip.json` file that validates against the fip extractor schema. The [GatherMetadataJob](upload_data.md#gathermetadatajob) will automatically run your mapper. - -#### Relationship between acquisition.json and instrument.json - -The acquisition and instrument metadata files are tightly coupled. The [instrument.json](https://aind-data-schema.readthedocs.io/en/latest/instrument.html) describes the full set of devices in your instrument (each device has a `name` field). The [acquisition.json](https://aind-data-schema.readthedocs.io/en/latest/acquisition.html) describes what was active during a specific session. - -**Device name matching requirement**: Every device name listed in `acquisition.json` must exist in either the instrument or procedures metadata: - -- **DataStream.active_devices**: Each data stream lists the devices that were acquiring data. These names must match the `name` field of devices in `instrument.json` (or implanted devices in `procedures.json`). -- **StimulusEpoch.active_devices**: Similarly, stimulus epoch device names must match instrument or procedure device names. 
-- **Connections**: Any `source_device` or `target_device` in acquisition connections must reference devices defined in the instrument or procedures. -- **instrument_id**: The `acquisition.instrument_id` must match `instrument.instrument_id`. - -Validation of this relationship occurs during the [GatherMetadataJob](upload_data.md#gathermetadatajob) when metadata is assembled for upload. See [Validation during upload](upload_data.md#validation-during-upload) for when validation runs, what happens when it fails, and how to fix issues. +On your rig you should output files that match the name of the corresponding mapper that will be run. So if your mapper is called fip you should write a `fip.json` file that validates against the fip extractor schema. The [GatherMetadataJob](upload_data.md#gathermetadatajob) will automatically run your mapper. #### Multiple independent rigs diff --git a/docs/source/acquire_upload/upload_data.md b/docs/source/acquire_upload/upload_data.md index 1ed35e5..a1396e3 100644 --- a/docs/source/acquire_upload/upload_data.md +++ b/docs/source/acquire_upload/upload_data.md @@ -19,29 +19,6 @@ The main settings you should be concerned with are: The settings for the GatherMetadataJob are typically set [inside of your upload script](https://github.com/AllenNeuralDynamics/aind-data-transfer-service/blob/d1f84020862c3de340020b6cb45bef0fd5105515/docs/examples/aind_data_schema_v2.py#L45-L50) or as part of the `job_type`. -### Validation during upload - -The GatherMetadataJob validates the relationship between acquisition and instrument metadata when it assembles the full metadata object. 
This includes checking that: - -- All `active_devices` in acquisition data streams and stimulus epochs exist in the instrument (or procedures, for implanted devices) -- All devices referenced in acquisition connections exist in the instrument or procedures -- The `acquisition.instrument_id` matches the `instrument.instrument_id` - -**When validation runs**: Validation occurs during the metadata gathering step of the upload job. This runs as part of the aind-data-transfer-service workflow, typically when data is being prepared for transfer (whether from rig to VAST or VAST to S3, depending on your setup). - -**If validation fails**: - -- With `raise_if_invalid` enabled (strongly recommended): The GatherMetadataJob raises an exception. The upload job fails and no data is transferred. You will see the validation error in the job logs. -- With `raise_if_invalid` disabled: The job may continue and create metadata with a validation bypass, but errors are logged. This can result in a data asset with invalid metadata that may cause problems downstream. - -**How to fix validation failures**: - -1. **Active devices not found**: Ensure every device name in `acquisition.json` (in `data_streams[].active_devices` and `stimulus_epochs[].active_devices`) exactly matches a device `name` in `instrument.json`. Device names are case-sensitive. If you use implanted devices, those must be defined in `procedures.json`. -2. **instrument_id mismatch**: Set `acquisition.instrument_id` to match `instrument.instrument_id`. When merging multiple instruments, the acquisition should reference the merged instrument_id format (see [Merge rules](#merge-rules)). -3. **Connection device not found**: Ensure `source_device` and `target_device` in each connection match device names in the instrument or procedures. 
- -You can test validation locally before upload using the `InstrumentAcquisitionCompatibility` class from `aind-data-schema`; see the [aind-data-schema validation docs](https://aind-data-schema.readthedocs.io/en/latest/validation.html) for details. - ### Merge rules ### When can multiple files be merged? @@ -56,7 +33,7 @@ Each file must follow the naming pattern `*.json` where `*` is an #### Contraints -1. **Unique fields must match**: Certain identifier fields that should be unique across the dataset (like `subject_id`) **must have identical values** in all files being merged. If these fields conflict, the merge will fail and your upload job will be rejected. An important exception is the `instrument_id` field. If two or more instrument JSON files are joined, the merged instrument JSON file will have an `instrument_id` that is the individual IDs joined with `_` in alphabetical order. Because `acquisition.instrument_id` must match the merged instrument, you must anticipate this format when generating acquisition metadata for multi-instrument sessions. For example, if you acquire across behavior instrument "FRG.10-A" and fiber photometry instrument "FIP-2", the merged instrument_id will be `FIP-2_FRG.10-A` (alphabetically sorted). Your acquisition files must use that value for `instrument_id`. +1. **Unique fields must match**: Certain identifier fields that should be unique across the dataset (like `subject_id`) **must have identical values** in all files being merged. If these fields conflict, the merge will fail and your upload job will be rejected. An important exception is the `instrument_id` field. If two or more instrument JSON files are joined, the merged instrument JSON file will have an `instrument_id` that is the string combination of the IDs of the unique instruments, 2. **No shared devices, with the exception of a single shared clock**: In general, two instruments can be merged **if and only if there are no shared devices** between them. 
Devices are identified by their `name` field. If the same device name appears in both instrument files, they should really be defined as a single instrument, not two separate ones.
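Patch 2 of this series (reverted above) spelled out the "string combination" in constraint 1 as the individual IDs joined with `_` in alphabetical order. A quick sketch of that rule, using the example IDs from the patch:

```python
# Sketch of the merged instrument_id rule described in patch 2: the unique
# instrument IDs joined with "_" in alphabetical order.
def merged_instrument_id(instrument_ids):
    return "_".join(sorted(instrument_ids))

# Behavior instrument "FRG.10-A" merged with fiber photometry
# instrument "FIP-2" (examples from the patch text):
print(merged_instrument_id(["FRG.10-A", "FIP-2"]))  # FIP-2_FRG.10-A
```

Acquisition metadata for a multi-instrument session must use this merged value as its `instrument_id`.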