## docs/source/acquire_upload/acquire_data.md
If you can't generate your aind-data-schema formatted metadata on your rig, you can use what we call the "extractor/mapper" pattern. The code on the rig that extracts metadata from data files is the extractor. We prefer that you maintain this code in [aind-metadata-extractor](https://github.com/AllenNeuralDynamics/aind-metadata-extractor/), but you can also maintain it yourself. The code that transforms the extractor output into aind-data-schema is the mapper. Scientific Computing will help develop and maintain the mapper; you are responsible for your extractor. The key to the extractor/mapper pattern is the data contract that defines the extractor output. The data contract must be a pydantic model or JSON schema file and must live in the [aind_metadata_extractor.models](https://github.com/AllenNeuralDynamics/aind-metadata-extractor/tree/main/src/aind_metadata_extractor/models) module.
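As an illustration, a data contract might look like the following pydantic model. The model name and fields here are hypothetical, not an actual contract in aind-metadata-extractor:

```python
from pydantic import BaseModel


class FipExtractorModel(BaseModel):
    """Hypothetical data contract for a fiber photometry extractor."""

    subject_id: str
    session_start_time: str  # ISO 8601 timestamp
    led_power_mw: float


# The JSON schema derived from the model is the contract the mapper
# is written against.
schema = FipExtractorModel.model_json_schema()
print(sorted(schema["required"]))
```

The mapper only ever sees data that validates against this schema, which is what lets it be developed and maintained independently of the rig code.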
On your rig you should output files that match the name of the corresponding mapper that will be run. So if your mapper is called `fip`, you should write a `fip.json` file that validates against the fip extractor schema. The [GatherMetadataJob](upload_data.md#gathermetadatajob) will automatically run your mapper.
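On the rig side, the output step can then be as simple as serializing the extracted values to a file named after the mapper. The payload fields here are hypothetical:

```python
import json
from pathlib import Path

# Hypothetical extractor output; the real payload must validate
# against the corresponding model in aind_metadata_extractor.models.
extracted = {
    "subject_id": "123456",
    "session_start_time": "2024-01-01T12:00:00",
}

# The file name matches the mapper name ("fip"), so GatherMetadataJob
# knows which mapper to run against it.
out_path = Path("fip.json")
out_path.write_text(json.dumps(extracted, indent=2))

# Sanity check: the file round-trips as valid JSON.
assert json.loads(out_path.read_text())["subject_id"] == "123456"
```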
#### Relationship between acquisition.json and instrument.json
The acquisition and instrument metadata files are tightly coupled. The [instrument.json](https://aind-data-schema.readthedocs.io/en/latest/instrument.html) describes the full set of devices in your instrument (each device has a `name` field). The [acquisition.json](https://aind-data-schema.readthedocs.io/en/latest/acquisition.html) describes what was active during a specific session.
**Device name matching requirement**: Every device name listed in `acquisition.json` must exist in either the instrument or procedures metadata:
- **DataStream.active_devices**: Each data stream lists the devices that were acquiring data. These names must match the `name` field of devices in `instrument.json` (or implanted devices in `procedures.json`).
- **StimulusEpoch.active_devices**: Similarly, stimulus epoch device names must match instrument or procedure device names.
- **Connections**: Any `source_device` or `target_device` in acquisition connections must reference devices defined in the instrument or procedures.
- **instrument_id**: The `acquisition.instrument_id` must match `instrument.instrument_id`.
Validation of this relationship occurs during the [GatherMetadataJob](upload_data.md#gathermetadatajob) when metadata is assembled for upload. See [Validation during upload](upload_data.md#validation-during-upload) for when validation runs, what happens when it fails, and how to fix issues.
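As a rough illustration of the name-matching rule for data streams, assuming plain JSON dictionaries and the aind-data-schema field names quoted above (`components`, `data_streams`, `active_devices` — verify these against your schema version):

```python
def check_active_devices(instrument: dict, acquisition: dict) -> list[str]:
    """Return acquisition device names missing from the instrument.

    Checks DataStream.active_devices against device `name` entries.
    Implanted devices from procedures.json would need to be added to
    `known` as well.
    """
    known = {d["name"] for d in instrument.get("components", [])}
    missing = []
    for stream in acquisition.get("data_streams", []):
        for name in stream.get("active_devices", []):
            if name not in known:
                missing.append(name)
    return missing


instrument = {"components": [{"name": "Camera_1"}, {"name": "Laser_1"}]}
acquisition = {"data_streams": [{"active_devices": ["Camera_1", "Laser_2"]}]}
print(check_active_devices(instrument, acquisition))  # ['Laser_2']
```

Running a check like this on the rig, before upload, surfaces mismatches much earlier than the upload job's own validation.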
## docs/source/acquire_upload/upload_data.md
The settings for the GatherMetadataJob are typically set [inside of your upload script](https://github.com/AllenNeuralDynamics/aind-data-transfer-service/blob/d1f84020862c3de340020b6cb45bef0fd5105515/docs/examples/aind_data_schema_v2.py#L45-L50) or as part of the `job_type`.
### Validation during upload
The GatherMetadataJob validates the relationship between acquisition and instrument metadata when it assembles the full metadata object. This includes checking that:
- All `active_devices` in acquisition data streams and stimulus epochs exist in the instrument (or procedures, for implanted devices)
- All devices referenced in acquisition connections exist in the instrument or procedures
- The `acquisition.instrument_id` matches the `instrument.instrument_id`
**When validation runs**: Validation occurs during the metadata gathering step of the upload job. This runs as part of the aind-data-transfer-service workflow, typically when data is being prepared for transfer (whether from rig to VAST or VAST to S3, depending on your setup).
**If validation fails**:
- With `raise_if_invalid` enabled (strongly recommended): The GatherMetadataJob raises an exception. The upload job fails and no data is transferred. You will see the validation error in the job logs.
- With `raise_if_invalid` disabled: The job may continue and create metadata with a validation bypass, but errors are logged. This can result in a data asset with invalid metadata that may cause problems downstream.
**How to fix validation failures**:
1. **Active devices not found**: Ensure every device name in `acquisition.json` (in `data_streams[].active_devices` and `stimulus_epochs[].active_devices`) exactly matches a device `name` in `instrument.json`. Device names are case-sensitive. If you use implanted devices, those must be defined in `procedures.json`.
2. **instrument_id mismatch**: Set `acquisition.instrument_id` to match `instrument.instrument_id`. When merging multiple instruments, the acquisition should reference the merged instrument_id format (see [Merge rules](#merge-rules)).
3. **Connection device not found**: Ensure `source_device` and `target_device` in each connection match device names in the instrument or procedures.
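All three fixes come down to exact, case-sensitive string matching on device names:

```python
instrument_names = {"Camera_1", "Laser_1"}

# Case and punctuation must match exactly; near-misses do not count.
print("camera_1" in instrument_names)  # False
print("Camera_1" in instrument_names)  # True
```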
You can test validation locally before upload using the `InstrumentAcquisitionCompatibility` class from `aind-data-schema`; see the [aind-data-schema validation docs](https://aind-data-schema.readthedocs.io/en/latest/validation.html) for details.
### Merge rules
### When can multiple files be merged?
#### Constraints
1. **Unique fields must match**: Certain identifier fields that should be unique across the dataset (like `subject_id`) **must have identical values** in all files being merged. If these fields conflict, the merge will fail and your upload job will be rejected. An important exception is the `instrument_id` field. If two or more instrument JSON files are joined, the merged instrument JSON file will have an `instrument_id` that is the individual IDs joined with `_` in alphabetical order. Because `acquisition.instrument_id` must match the merged instrument, you must anticipate this format when generating acquisition metadata for multi-instrument sessions. For example, if you acquire across behavior instrument "FRG.10-A" and fiber photometry instrument "FIP-2", the merged instrument_id will be `FIP-2_FRG.10-A` (alphabetically sorted). Your acquisition files must use that value for `instrument_id`.
2. **No shared devices, with the exception of a single shared clock**: In general, two instruments can be merged **if and only if there are no shared devices** between them. Devices are identified by their `name` field. If the same device name appears in both instrument files, they should really be defined as a single instrument, not two separate ones.
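Both constraints can be checked ahead of time with a short script: the merged `instrument_id` is just the individual IDs joined with `_` in alphabetical order, and mergeability reduces to a set intersection on device names. The `components` field name is assumed from aind-data-schema (verify against your schema version), and a shared clock would need to be special-cased:

```python
def merged_instrument_id(instrument_ids: list[str]) -> str:
    """Individual instrument IDs joined with "_" in alphabetical order,
    matching the merge rule described above."""
    return "_".join(sorted(instrument_ids))


def shared_devices(instrument_a: dict, instrument_b: dict) -> set[str]:
    """Device names appearing in both instruments: merge blockers,
    except for a single shared clock."""
    names_a = {d["name"] for d in instrument_a.get("components", [])}
    names_b = {d["name"] for d in instrument_b.get("components", [])}
    return names_a & names_b


print(merged_instrument_id(["FRG.10-A", "FIP-2"]))  # FIP-2_FRG.10-A

behavior = {"components": [{"name": "Clock_0"}, {"name": "Camera_1"}]}
fip = {"components": [{"name": "Clock_0"}, {"name": "Laser_1"}]}
print(shared_devices(behavior, fip))  # {'Clock_0'}
```

Computing the merged ID on the rig lets you write `acquisition.instrument_id` with the exact value the merge will later produce.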