docs/source/acquire_upload/processing.md
Scientific computing is currently re-organizing pipelines to be per-modality, ra…
### Data
See [Data organization/Derived data conventions](../philosophy/data_organization.md#derived-data-conventions) for file organization conventions in derived assets.
### Metadata
#### data_description.json
All processing pipelines that create derived assets should upgrade the [data_description](https://aind-data-schema.readthedocs.io/en/latest/data_description.html) to a derived data description (changing the name and data_level).
##### How to upgrade a data_description
Use the [`DataDescription.from_data_description()`](https://github.com/AllenNeuralDynamics/aind-data-schema/blob/e172cb06a63b722eaeaaf8933d0a17cbedf3feea/src/aind_data_schema/core/data_description.py#L334) function to create derived data_description objects. Pass the process name as a parameter (often just `"processed"`). If source data assets beyond the one passed into the function were used, also pass the optional `source_data` parameter with the names of those assets.
```python
from pathlib import Path

from aind_data_schema.core.data_description import DataDescription

# Read the input asset's existing data_description (paths are illustrative)
input_dd = DataDescription.model_validate_json(
    Path("input_asset/data_description.json").read_text()
)

# Upgrade to a derived data_description; the process name is often just "processed".
# If multiple source assets were used, also pass the optional source_data parameter.
derived_dd = DataDescription.from_data_description(input_dd, process_name="processed")

# Write the upgraded file into the derived asset
Path("derived_asset/data_description.json").write_text(derived_dd.model_dump_json(indent=3))
```

Processing pipelines need to track each [DataProcess](https://aind-data-schema.readthedocs.io/en/latest/processing.html#dataprocess) that was run to create the derived data asset.
If processing was performed as part of a Nextflow pipeline, that should be tracked in the `Processing.pipelines` field using a [Code](https://aind-data-schema.readthedocs.io/en/latest/components/identifiers.html#code) object pointing to the GitHub repository with the Nextflow configuration. Use the `DataProcess.pipeline_name` field to indicate that processes were run as part of a pipeline.
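
As an illustrative sketch only, the relevant parts of a serialized `processing.json` might then look like the fragment below. The repository URL, pipeline name, and exact field names are hypothetical; check them against the linked aind-data-schema documentation for your schema version.

```json
{
  "data_processes": [
    {
      "name": "Spike sorting",
      "pipeline_name": "example-ephys-pipeline",
      "code": {
        "url": "https://github.com/example-org/example-ephys-pipeline",
        "version": "1.2.0"
      }
    }
  ],
  "pipelines": [
    {
      "url": "https://github.com/example-org/example-ephys-pipeline",
      "version": "1.2.0"
    }
  ]
}
```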
#### Other metadata
Metadata `.json` files that are not modified should be copied to the derived asset unchanged.
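
A minimal sketch of this copy step, assuming illustrative paths and a hypothetical example set of files that the pipeline regenerates itself (and therefore must not copy):

```python
import shutil
from pathlib import Path

# Illustrative paths; a real pipeline gets these from its job configuration
src = Path("input_asset")
dst = Path("derived_asset")

# Example set of files the pipeline regenerates rather than copies
REGENERATED = {"data_description.json", "processing.json", "quality_control.json"}

# Demo input asset so this sketch is self-contained
src.mkdir(exist_ok=True)
(src / "subject.json").write_text("{}")
(src / "data_description.json").write_text("{}")

dst.mkdir(parents=True, exist_ok=True)
for f in src.glob("*.json"):
    if f.name not in REGENERATED:
        shutil.copy2(f, dst / f.name)  # copy unchanged, preserving timestamps
```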