20 commits
3005d1a
NF_MAAgilent1ch: pass workflow version to qmd
cyouh95 Feb 11, 2025
31a118e
NF_MAAgilent1ch: pass ensembl version to qmd
cyouh95 Feb 11, 2025
ffaa0ae
NF_MAAgilent1ch: use annotation file passed to qmd
cyouh95 Feb 11, 2025
09384c5
NF_MAAgilent1ch: remove visualization_PCA_table_GLmicroarray.csv output
cyouh95 Feb 11, 2025
1641c3c
NF_MAAgilent1ch: update qmd structure
cyouh95 Feb 11, 2025
6a72622
NF_MAAgilent1ch: #104 add --skipDE option
cyouh95 Feb 13, 2025
5a82d67
NF_MAAgilent1ch: support custom annotations
cyouh95 Feb 25, 2025
cbe2515
NF_MAAgilent1ch: update protocol
cyouh95 Feb 25, 2025
5a7d4a8
NF_MAAgilent1ch: update pipeline documentation
cyouh95 Feb 27, 2025
310f1dd
NF_MAAgilent1ch: update images
cyouh95 Feb 27, 2025
3c4dce9
NF_MAAgilent1ch: update pipeline version from GL-DPPD-7112 to GL-DPPD…
cyouh95 Feb 27, 2025
0464af5
NF_MAAgilent1ch: update accepted ISA field name for label
cyouh95 Mar 4, 2025
0ead0df
NF_MAAgilent1ch: update workflow version from 1.0.4 to 1.0.5
cyouh95 Mar 4, 2025
c9b2739
NF_MAAgilent1ch: update CHANGELOG.md
cyouh95 Mar 4, 2025
c86e746
NF_MAAgilent1ch: update custom functions in pipeline doc
cyouh95 Mar 25, 2025
ede2db1
NF_MAAgilent1ch: update nextflow version from 23.10.1 to 24.10.5
cyouh95 Mar 25, 2025
330d014
Refactor Agilent-1-channel Workflow:
jihanyehia Apr 1, 2026
a644612
Update 3rd party software licenses and add missing purrr license
jihanyehia Apr 2, 2026
e11f6e0
Update 3rd party software licenses for glue and stringr
jihanyehia Apr 6, 2026
cae2a97
Add conditional execution of UPDATE_ISA_TABLES that is not expected t…
jihanyehia Apr 8, 2026
1,563 changes: 1,563 additions & 0 deletions Microarray/Agilent_1-channel/Pipeline_GL-DPPD-7112_Versions/GL-DPPD-7112-A.md

Large diffs are not rendered by default.

3 changes: 2 additions & 1 deletion Microarray/Agilent_1-channel/README.md
@@ -1,7 +1,7 @@
# GeneLab bioinformatics processing pipeline for Agilent 1-channel microarray data


> **The document [`GL-DPPD-7112.md`](Pipeline_GL-DPPD-7112_Versions/GL-DPPD-7112.md) holds an overview and example commands for how GeneLab processes Agilent 1-channel microarray datasets. See the [Repository Links](#repository-links) descriptions below for more information. Processed data output files and processing code is provided for each GLDS dataset along with the processed data in the [Open Science Data Repository (OSDR)](https://osdr.nasa.gov/bio/repo/).**
> **The document [`GL-DPPD-7112-A.md`](Pipeline_GL-DPPD-7112_Versions/GL-DPPD-7112-A.md) holds an overview and example commands for how GeneLab processes Agilent 1-channel microarray datasets. See the [Repository Links](#repository-links) descriptions below for more information. Processed data output files and processing code is provided for each GLDS dataset along with the processed data in the [Open Science Data Repository (OSDR)](https://osdr.nasa.gov/bio/repo/).**

---

@@ -20,6 +20,7 @@

- Contains instructions for installing and running the GeneLab MAAgilent1ch workflow


---
**Developed by:**
Jonathan Oribello
@@ -5,6 +5,22 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [1.0.5](https://github.com/nasa/GeneLab_Data_Processing/tree/NF_MAAgilent1ch_1.0.5/Microarray/Agilent_1-channel/Workflow_Documentation/NF_MAAgilent1ch) - 2025-05-xx

### Added

- Support for custom annotations, see [specification](examples/annotations/README.md)
- Add option to skip differential expression analysis (`--skipDE`) ([#104](https://github.com/nasa/GeneLab_Data_Processing/issues/104))
- Add a retry wrapper for functions that utilize internet resources, syncing with [NF_MAAffymetrix_1.0.3](https://github.com/nasa/GeneLab_Data_Processing/tree/NF_MAAffymetrix_1.0.3/Microarray/Affymetrix/Workflow_Documentation/NF_MAAffymetrix)
- Workflow can now be run using an ISA archive by supplying the `isaArchivePath` parameter (as either a local path or a public web URI), syncing with [NF_MAAffymetrix_1.0.2](https://github.com/asaravia-butler/GeneLab_Data_Processing/tree/NF_MAAffymetrix_1.0.2/Microarray/Affymetrix/Workflow_Documentation/NF_MAAffymetrix)
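
The retry wrapper named above lives in the workflow's own code and is not shown in this diff. As a rough illustration of the general pattern only, the shell sketch below retries a command a fixed number of times with a pause between attempts. The function name `retry` and its argument order are assumptions made for this example.

```bash
# Hypothetical helper (not part of the workflow): run a command up to
# max_attempts times in total, sleeping delay_seconds between failed attempts.
retry() {
  local max_attempts="$1" delay_seconds="$2"
  shift 2
  local attempt=1
  # Loop until the wrapped command succeeds.
  until "$@"; do
    if [ "${attempt}" -ge "${max_attempts}" ]; then
      echo "command failed after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "attempt ${attempt} failed; retrying in ${delay_seconds}s" >&2
    attempt=$((attempt + 1))
    sleep "${delay_seconds}"
  done
}

# Example usage: try a download up to 3 times, pausing 5 seconds between tries.
# retry 3 5 wget https://github.com/nasa/GeneLab_Data_Processing/releases/download/NF_MAAgilent1ch_1.0.5/NF_MAAgilent1ch_1.0.5.zip
```

A fixed delay keeps the sketch short; a real wrapper might back off progressively or retry only on specific errors.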

### Changed

- Publish directory behavior reworked to use the OSD accession as part of the default name. The parameter that sets the published files directory is now `resultsDir` instead of `outputDir`. Syncing with [NF_MAAffymetrix_1.0.2](https://github.com/asaravia-butler/GeneLab_Data_Processing/tree/NF_MAAffymetrix_1.0.2/Microarray/Affymetrix/Workflow_Documentation/NF_MAAffymetrix)
- Small bug fixes in `Agile1CMP.qmd`
- Update the custom `fetch_organism_specific_annotation_table()` function, used when loading organism-specific annotation metadata, to convert figshare ndownloader URLs to direct API endpoints, as ndownloader URLs require redirect handling that is not supported in all programmatic download contexts
- Simplify group sample retrieval during differential expression group-wise statistics computation to use a more concise `filter/pull/sort` chain instead of `group_by/summarize/filter/pull`, addressing the deprecation warning in dplyr >= 1.1.0 where returning more than 1 row per `summarise()` group is deprecated

## [1.0.4](https://github.com/nasa/GeneLab_Data_Processing/tree/NF_MAAgilent1ch_1.0.4/Microarray/Agilent_1-channel/Workflow_Documentation/NF_MAAgilent1ch) - 2024-10-02

### Added
@@ -4,7 +4,7 @@

### Implementation Tools <!-- omit in toc -->

The current GeneLab Agilent 1 Channel Microarray consensus processing pipeline (NF_MAAgilent1ch), [GL-DPPD-7112](../../Pipeline_GL-DPPD-7112_Versions/GL-DPPD-7112.md), is implemented as a [Nextflow](https://nextflow.io/) DSL2 workflow and utilizes [Singularity](https://docs.sylabs.io/guides/3.10/user-guide/introduction.html) to run all tools in containers. This workflow (NF_MAAgilent1ch) is run using the command line interface (CLI) of any unix-based system. While knowledge of creating workflows in Nextflow is not required to run the workflow as is, [the Nextflow documentation](https://nextflow.io/docs/latest/index.html) is a useful resource for users who want to modify and/or extend this workflow.
The current GeneLab Agilent 1 Channel Microarray consensus processing pipeline (NF_MAAgilent1ch), [GL-DPPD-7112-A](../../Pipeline_GL-DPPD-7112_Versions/GL-DPPD-7112-A.md), is implemented as a [Nextflow](https://nextflow.io/) DSL2 workflow and utilizes [Singularity](https://docs.sylabs.io/guides/3.10/user-guide/introduction.html) to run all tools in containers. This workflow (NF_MAAgilent1ch) is run using the command line interface (CLI) of any unix-based system. While knowledge of creating workflows in Nextflow is not required to run the workflow as is, [the Nextflow documentation](https://nextflow.io/docs/latest/index.html) is a useful resource for users who want to modify and/or extend this workflow.

### Workflow & Subworkflows <!-- omit in toc -->

@@ -14,7 +14,7 @@ The current GeneLab Agilent 1 Channel Microarray consensus processing pipeline (

---
The NF_MAAgilent1ch workflow is composed of three subworkflows as shown in the image above.
Below is a description of each subworkflow and the additional output files generated that are not already indicated in the [GL-DPPD-7112 pipeline document](../../Pipeline_GL-DPPD-7112_Versions/GL-DPPD-7112.md):
Below is a description of each subworkflow and the additional output files generated that are not already indicated in the [GL-DPPD-7112-A pipeline document](../../Pipeline_GL-DPPD-7112_Versions/GL-DPPD-7112-A.md):

1. **Analysis Staging Subworkflow**

@@ -25,7 +25,7 @@ Below is a description of each subworkflow and the additional output files gener
2. **Agilent 1 Channel Microarray Processing Subworkflow**

- Description:
- This subworkflow uses the staged raw data and metadata parameters from the Analysis Staging Subworkflow to generate processed data using the [GL-DPPD-7112 pipeline](../../Pipeline_GL-DPPD-7112_Versions/GL-DPPD-7112.md).
- This subworkflow uses the staged raw data and metadata parameters from the Analysis Staging Subworkflow to generate processed data using the [GL-DPPD-7112-A pipeline](../../Pipeline_GL-DPPD-7112_Versions/GL-DPPD-7112-A.md).

1. **V&V Pipeline Subworkflow**

@@ -53,6 +53,7 @@ Below is a description of each subworkflow and the additional output files gener
- [3. Run the Workflow](#3-run-the-workflow)
- [3a. Approach 1: Run the workflow on a GeneLab Agilent 1 Channel Microarray dataset](#3a-approach-1-run-the-workflow-on-a-genelab-agilent-1-channel-microarray-dataset)
- [3b. Approach 2: Run the workflow on a non-GLDS dataset using a user-created runsheet](#3b-approach-2-run-the-workflow-on-a-non-glds-dataset-using-a-user-created-runsheet)
- [3c. Approach 3: Run the workflow using an ISA Archive](#3c-approach-3-run-the-workflow-using-an-isa-archive)
- [4. Additional Output Files](#4-additional-output-files)

<br>
@@ -93,9 +94,9 @@ We recommend installing Singularity on a system wide level as per the associated
All files required for utilizing the NF_MAAgilent1ch GeneLab workflow for processing Agilent 1 Channel Microarray data are in the [workflow_code](workflow_code) directory. To get a copy of the latest NF_MAAgilent1ch version onto your system, download the code as a zip file from the release page and unzip it by running the following commands:

```bash
wget https://github.com/nasa/GeneLab_Data_Processing/releases/download/NF_MAAgilent1ch_1.0.4/NF_MAAgilent1ch_1.0.4.zip
wget https://github.com/nasa/GeneLab_Data_Processing/releases/download/NF_MAAgilent1ch_1.0.5/NF_MAAgilent1ch_1.0.5.zip

unzip NF_MAAgilent1ch_1.0.4.zip
unzip NF_MAAgilent1ch_1.0.5.zip
```

<br>
@@ -104,15 +105,15 @@ unzip NF_MAAgilent1ch_1.0.4.zip

### 3. Run the Workflow

While in the location containing the `NF_MAAgilent1ch_1.0.4` directory that was downloaded in [step 2](#2-download-the-workflow-files), you are now able to run the workflow. Below are three examples of how to run the NF_MAAgilent1ch workflow:
While in the location containing the `NF_MAAgilent1ch_1.0.5` directory that was downloaded in [step 2](#2-download-the-workflow-files), you are now able to run the workflow. Below are three examples of how to run the NF_MAAgilent1ch workflow:
> Note: Nextflow commands use both single hyphen arguments (e.g. -help) that denote general nextflow arguments and double hyphen arguments (e.g. --ensemblVersion) that denote workflow specific parameters. Take care to use the proper number of hyphens for each argument.

<br>

#### 3a. Approach 1: Run the workflow on a GeneLab Agilent 1 Channel Microarray dataset

```bash
nextflow run NF_MAAgilent1ch_1.0.4/main.nf \
nextflow run NF_MAAgilent1ch_1.0.5/main.nf \
-profile singularity \
--osdAccession OSD-548 \
--gldsAccession GLDS-548
@@ -125,16 +126,30 @@ nextflow run NF_MAAgilent1ch_1.0.4/main.nf \
> Note: Specifications for creating a runsheet manually are described [here](examples/runsheet/README.md).

```bash
nextflow run NF_MAAgilent1ch_1.0.4/main.nf \
nextflow run NF_MAAgilent1ch_1.0.5/main.nf \
-profile singularity \
--runsheetPath </path/to/runsheet>
```

<br>

#### 3c. Approach 3: Run the workflow using an ISA Archive

> Note: Specifications for the ISA Tab Archive format can be found [here](https://isa-specs.readthedocs.io/en/latest/isatab.html).

```bash
nextflow run NF_MAAgilent1ch_1.0.5/main.nf \
-profile singularity \
--osdAccession OSD-548 \
--gldsAccession GLDS-548 \
--isaArchivePath </path/to/isaArchive>
```

<br>

**Required Parameters For All Approaches:**

* `NF_MAAgilent1ch_1.0.4/main.nf` - Instructs Nextflow to run the NF_MAAgilent1ch workflow
* `NF_MAAgilent1ch_1.0.5/main.nf` - Instructs Nextflow to run the NF_MAAgilent1ch workflow

* `-profile` - Specifies the configuration profile(s) to load, `singularity` instructs Nextflow to setup and use singularity for all software called in the workflow

@@ -155,18 +170,30 @@ nextflow run NF_MAAgilent1ch_1.0.4/main.nf \

<br>

**Additional Required Parameters For [Approach 3](#3c-approach-3-run-the-workflow-using-an-isa-archive):**

* `--osdAccession OSD-###` – specifies the OSD ID to process through the NF_MAAgilent1ch workflow (replace ### with the OSD number)

* `--gldsAccession GLDS-###` – specifies the GLDS ID to process through the NF_MAAgilent1ch workflow (replace ### with the GLDS number)

* `--isaArchivePath` - specifies a local or URL path to an `*ISA.zip` (Default: `*ISA.zip` is automatically fetched from the GeneLab Repository for the GLDS dataset being processed)
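
Because `--isaArchivePath` accepts either a local path or a URL, a quick check before launching can catch a mistyped local path early. The helper below is a hypothetical sketch: the name `classify_isa_path` is not part of the workflow, and how the workflow itself tells the two cases apart is not shown here.

```bash
# Hypothetical pre-flight check (not part of the workflow): report whether an
# --isaArchivePath value looks like a URL, an existing local file, or neither.
classify_isa_path() {
  local isa_path="$1"
  case "${isa_path}" in
    http://*|https://*)
      # Treated as a web location; reachability is not tested here.
      echo "url"
      ;;
    *)
      if [ -f "${isa_path}" ]; then
        echo "local"
      else
        echo "missing"
      fi
      ;;
  esac
}

# Example usage:
# classify_isa_path </path/to/isaArchive>
```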

<br>

**Optional Parameters:**

* `--skipVV` - skip the automated V&V processes (Default: the automated V&V processes are active)

* `--outputDir` - specifies the directory to save the raw and processed data files (Default: files are saved in the launch directory)
* `--skipDE` - skip the differential expression analysis (Default: the differential expression analysis is performed)

* `--resultsDir` - specifies the output directory for all files produced by the workflow (Default: `<OSD-NNN_GLDS-NNN>` if both OSD and GLDS accessions are specified; otherwise, the workflow launch directory)
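
As a sketch of how the optional parameters combine with Approach 1, the snippet below assembles a command that skips differential expression and writes to a user-chosen results directory. The directory name `my_results` is an assumed placeholder, and the snippet only prints the command so it can be reviewed before running.

```bash
# Assumed values for illustration; adjust to your dataset and paths.
workflow_dir="NF_MAAgilent1ch_1.0.5"
results_dir="my_results"   # placeholder name, not a workflow default

# Build the command as an array so each argument stays a single word.
cmd=(
  nextflow run "${workflow_dir}/main.nf"
  -profile singularity
  --osdAccession OSD-548
  --gldsAccession GLDS-548
  --skipDE
  --resultsDir "${results_dir}"
)

# Print the assembled command for review; to execute it, run: "${cmd[@]}"
printf '%s\n' "${cmd[*]}"
```

Note the single hyphen on `-profile` (a Nextflow option) and the double hyphens on the workflow parameters.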

<br>

All parameters listed above and additional optional arguments for the NF_MAAgilent1ch workflow, including debug related options that may not be immediately useful for most users, can be viewed by running the following command:

```bash
nextflow run NF_MAAgilent1ch_1.0.4/main.nf --help
nextflow run NF_MAAgilent1ch_1.0.5/main.nf --help
```

See `nextflow run -h` and [Nextflow's CLI run command documentation](https://nextflow.io/docs/latest/cli.html#run) for more options and details common to all nextflow workflows.
@@ -180,13 +207,14 @@ See `nextflow run -h` and [Nextflow's CLI run command documentation](https://nex
All R code steps and output are rendered within a Quarto document yielding the following:

- Output:
- NF_MAAgilent1ch_1.0.4.html (html report containing executed code and output including QA plots)
- NF_MAAgilent1ch_1.0.5.html (html report containing executed code and output including QA plots)


The outputs from the Analysis Staging and V&V Pipeline Subworkflows are described below:
> Note: The outputs from the Agilent 1 Channel Microarray Processing Subworkflow are documented in the [GL-DPPD-7112.md](../../../Pipeline_GL-DPPD-7112_Versions/GL-DPPD-7112.md) processing protocol.
> Note: The outputs from the Agilent 1 Channel Microarray Processing Subworkflow are documented in the [GL-DPPD-7112-A.md](../../../Pipeline_GL-DPPD-7112_Versions/GL-DPPD-7112-A.md) processing protocol.

**Analysis Staging Subworkflow**
> Note: only applicable for [Approach 1](#3a-approach-1-run-the-workflow-on-a-genelab-agilent-1-channel-microarray-dataset) and [Approach 3](#3c-approach-3-run-the-workflow-using-an-isa-archive)

- Output:
- \*_microarray_v1_runsheet.csv (table containing metadata required for processing, including the raw reads files location)
@@ -0,0 +1,29 @@
# Custom Annotations Specification

## Description

* If using custom gene annotations when processing Agilent 1-channel datasets through GeneLab's Agilent 1-channel processing pipeline, a csv config file must be provided as specified below.
* See [config.csv](config.csv) for the latest config file used at GeneLab.


## Example

- [config.csv](config.csv)


## Required columns

| Column Name | Type | Description | Example |
|:------------|:-----|:------------|:--------|
| array_design | string | A bioMart attribute identifier denoting the microarray probe/probeset attribute used for annotation mapping. | AGILENT SurePrint G3 GE 8x60k v3 |
| annot_type | string | Used to determine how the custom annotations are parsed before merging to the data. Currently, only the below are supported: <ul><li>`agilent`: Annotations file is expected to be in the AA (All Annotations) format by [Agilent](https://earray.chem.agilent.com/earray/)</li><li>`custom`: Annotations file is merged as is, expected to have the following columns: `ProbesetID`, `ENTREZID`, `SYMBOL`, `GENENAME`, `ENSEMBL`, `REFSEQ`, `GOSLIM_IDS`, `STRING_id`, `count_gene_mappings`, `gene_mapping_source`</li></ul> | agilent |
| annot_filename | string | Name of the custom annotations file. This is the AllAnnotations file downloaded from Agilent's eArray web portal. | 072363_D_AA_20240521.txt |

## Optional columns
If the file was downloaded from a website, provide the download link and the download date in additional columns after the required columns, for traceability.

| Column Name | Type | Description | Example |
|:------------|:-----|:------------|:--------|
| download_link | string | The URL used to retrieve the annotation file. | https://earray.chem.agilent.com/earray/array/displayViewArrayDesign.do?eArrayAction=view&arraydesignid=ADID40392 |
| download_date | date string | The date the file was retrieved in YYYY-MM-DD format. | 2024-11-15 |
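
A config file can be sanity-checked against the required columns above before a run. The function below is a hypothetical helper, not part of the workflow. It only checks that the header row names the three required columns and does not validate the values in the rows.

```bash
# Hypothetical helper (not part of the workflow): confirm that the header row of
# a config CSV contains every required column name.
check_config_header() {
  local config_file="$1"
  local header col missing=0
  # Read the first line and drop a trailing carriage return, if present.
  header="$(head -n 1 "${config_file}" | tr -d '\r')"
  for col in array_design annot_type annot_filename; do
    # Split the header on commas and look for an exact field match.
    if ! printf '%s\n' "${header}" | tr ',' '\n' | grep -x "${col}" > /dev/null; then
      echo "missing required column: ${col}" >&2
      missing=1
    fi
  done
  return "${missing}"
}

# Example usage:
# check_config_header config.csv && echo "header OK"
```

The exact-match test keeps a column such as `annot_type_old` from being mistaken for `annot_type`.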
@@ -0,0 +1,2 @@
array_design,annot_type,annot_filename,download_link,download_date
AGILENT SurePrint G3 GE 8x60k v3,agilent,072363_D_AA_20240521.txt,https://earray.chem.agilent.com/earray/array/displayViewArrayDesign.do?eArrayAction=view&arraydesignid=ADID40392,2024-11-15