Commit a3f9db2

docs: adding information about how uploads work (#35)
* docs: adding information about how uploads work
* docs: requests from Helen

1 parent 5082a20

1 file changed: 15 additions & 2 deletions

File: docs/source/acquire_upload/upload.md
# Upload

Uploading data is done using the [aind-data-transfer-service](http://aind-data-transfer-service/) ([docs](https://aind-data-transfer-service.readthedocs.io/en/latest/index.html)), which handles running containerized tasks for data copying, compression, metadata gathering, and final upload to S3 and Code Ocean.
## Job types and upload scripts

In general, most users should interact with the transfer service by requesting data upload via [watchdog](https://github.com/AllenNeuralDynamics/aind-watchdog-service) (contact SIPE for setup) or through the aind-data-transfer-service REST API and upload scripts. Users control which tasks are run on their data through job types and the parameters they include in their upload scripts.
For example, this [upload script](https://github.com/AllenNeuralDynamics/aind-data-transfer-service/blob/d1f84020862c3de340020b6cb45bef0fd5105515/docs/examples/aind_data_schema_v2.py) demonstrates how to set up the upload parameters for a standard ecephys data asset using the `"default"` job_type. You can view [all available job_type options](https://aind-data-transfer-service.corp.alleninstitute.org/job_params). Please reach out to the Data & Infrastructure team in Scientific Computing to develop custom job types for your data assets.
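As a rough sketch of what a REST-based upload script can look like, the snippet below assembles a job payload and shows where a POST to the service would go. Note: the endpoint path and payload field names here are illustrative assumptions, not the service's actual schema; follow the linked example script and job_params page for the real parameters.

```python
import json

# Internal service URL from this page; reachable only on the corp network.
SERVICE_URL = "http://aind-data-transfer-service"


def build_upload_request(job_type: str, source_dir: str, subject_id: str) -> dict:
    """Assemble a hypothetical upload-job payload (field names are assumptions)."""
    return {
        "job_type": job_type,      # e.g. "default" for a standard ecephys asset
        "source": source_dir,      # path to the raw data to upload
        "subject_id": subject_id,  # used downstream during metadata gathering
    }


payload = build_upload_request("default", "/allen/scratch/ecephys_example", "123456")
print(json.dumps(payload, indent=2))

# Submitting would then be a single POST (hypothetical endpoint path):
#   import requests
#   requests.post(f"{SERVICE_URL}/api/submit_jobs", json=payload)
```

The point is only that an upload script is a small, declarative payload: the job_type selects which containerized tasks run, and the remaining fields parameterize them.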

## GatherMetadataJob

The [GatherMetadataJob](https://github.com/AllenNeuralDynamics/aind-metadata-mapper/tree/release-v1.0.0#usage) is the primary tool used to assemble and validate metadata during upload of data assets. The job handles construction of the `data_description`, `subject`, and `procedures` files, as well as merging and validating `instrument` and `acquisition` metadata. It also runs a full validation step on all available metadata files to ensure cross-compatibility.
The main settings you should be concerned with are:

- `instrument_settings.instrument_id`: this field triggers the job to pull an `instrument.json` file from the metadata-service (where you previously uploaded it).
- `data_description_settings.tags/group/restrictions/data_summary`: each of these fields is meta-metadata about your project and should be filled out accurately, if possible. Please see the [DataDescription](https://aind-data-schema.readthedocs.io/en/latest/data_description.html#datadescription) documentation for details about each field.

The settings for the GatherMetadataJob are typically set [inside of your upload script](https://github.com/AllenNeuralDynamics/aind-data-transfer-service/blob/d1f84020862c3de340020b6cb45bef0fd5105515/docs/examples/aind_data_schema_v2.py#L45-L50) or as part of the `job_type`.
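The settings above can be sketched as a small block inside an upload script. Only the field names mentioned on this page (`instrument_settings.instrument_id` and the `data_description_settings` fields) come from the docs; the surrounding dict structure and all values are illustrative assumptions.

```python
# Hypothetical GatherMetadataJob settings block for an upload script.
# Field names follow this page; structure and values are assumptions only.
metadata_settings = {
    "instrument_settings": {
        # Triggers the job to pull instrument.json from the metadata-service,
        # where the instrument was previously registered.
        "instrument_id": "ephys-rig-01",  # hypothetical id
    },
    "data_description_settings": {
        "tags": ["ecephys", "pilot"],      # free-form tags for the project
        "group": "ephys",                  # owning group for the asset
        "restrictions": None,              # usage restrictions, if any
        "data_summary": "Example ecephys recording upload.",
    },
}

print(sorted(metadata_settings["data_description_settings"]))
```

See the linked example script for where such a block actually sits relative to the rest of the upload parameters.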
### Merge rules
