Commit 4f66432

updated Metagenomics Workflow links
- updated Metagenomics READMEs
- added Metagenomics workflow submodule for low biomass pipelines
- updated low biomass pipeline docs table-of-contents
1 parent 65ae99c commit 4f66432

6 files changed

Lines changed: 101 additions & 76 deletions

.gitmodules

Lines changed: 4 additions & 0 deletions
@@ -1,3 +1,7 @@
 [submodule "Amplicon/Illumina/Workflow_Documentation/NF_AmpIllumina"]
 path = Amplicon/Illumina/Workflow_Documentation/NF_AmpIllumina
 url = https://github.com/nasa/GeneLab_AmpliconSeq_Workflow
+[submodule "NF_MGIllumina"]
+path = Metagenomics/Low_Biomass/Workflow_Documentation/NF_MGIllumina
+url = https://github.com/nasa/GeneLab_Metagenomics_Workflow
+branch = DEV
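
The added `.gitmodules` entry records that the `NF_MGIllumina` submodule tracks the `DEV` branch. As a minimal sketch, the recorded settings can be read back with `git config -f`. The scratch directory, the `get_submodule_field` helper, and the variable names below are illustrative assumptions and not part of the commit. The commented commands at the end show how a clone would typically fetch the submodule.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Recreate the added entry in a scratch file (illustrative only;
# a real checkout of the repository already contains .gitmodules).
workdir="$(mktemp -d)"
cat > "${workdir}/.gitmodules" <<'EOF'
[submodule "NF_MGIllumina"]
    path = Metagenomics/Low_Biomass/Workflow_Documentation/NF_MGIllumina
    url = https://github.com/nasa/GeneLab_Metagenomics_Workflow
    branch = DEV
EOF

# Hypothetical helper: read one field of a submodule entry from a .gitmodules file.
get_submodule_field() {
  local file="$1" name="$2" field="$3"
  git config -f "${file}" --get "submodule.${name}.${field}"
}

sub_branch="$(get_submodule_field "${workdir}/.gitmodules" NF_MGIllumina branch)"
sub_path="$(get_submodule_field "${workdir}/.gitmodules" NF_MGIllumina path)"
echo "NF_MGIllumina tracks branch '${sub_branch}' at ${sub_path}"

# In a real clone, the submodule would typically be fetched with:
#   git submodule update --init Metagenomics/Low_Biomass/Workflow_Documentation/NF_MGIllumina
# and moved to the tip of the tracked branch with:
#   git submodule update --remote Metagenomics/Low_Biomass/Workflow_Documentation/NF_MGIllumina
```

Because `branch = DEV` is recorded, `git submodule update --remote` follows `DEV` rather than the remote's default branch; a plain `git submodule update --init` still checks out the commit pinned by the superproject.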

Metagenomics/Low_Biomass/Nanopore/GL-DPPD-7116.md renamed to Metagenomics/Low_Biomass/Pipeline_GL-DPPD-7116_Versions/GL-DPPD-7116.md

Lines changed: 31 additions & 32 deletions
@@ -1,4 +1,4 @@
-# Bioinformatics pipeline for Low biomass long-read metagenomics data
+# Bioinformatics pipeline for Low biomass long-read metagenomics data <!-- omit in toc -->

 > **This document holds an overview and some example commands of how GeneLab processes low-biomass, long-read metagenomics datasets. Exact processing commands for specific datasets that have been released are provided with their processed data in the [Open Science Data Repository (OSDR)](https://osdr.nasa.gov/bio/repo/).**

@@ -20,11 +20,11 @@ Barbara Novak (GeneLab Data Processing Lead)

 ---

-# Table of contents
+# Table of contents <!-- omit in toc -->

-- [**Software used**](#software-used)
-- [**General processing overview with example commands**](#general-processing-overview-with-example-commands)
-- [**Pre-processing**](#pre-processing)
+- [Software used](#software-used)
+- [General processing overview with example commands](#general-processing-overview-with-example-commands)
+- [Pre-processing](#pre-processing)
 - [1. Basecalling](#1-basecalling)
 - [2. Demultiplexing](#2-demultiplexing)
 - [2a. Split Fastq](#2a-split-fastq)
@@ -34,7 +34,7 @@ Barbara Novak (GeneLab Data Processing Lead)
 - [3b. Compile Raw Data QC](#3b-compile-raw-data-qc)
 - [4. Quality Filtering](#4-quality-filtering)
 - [4a. Filter Raw Data](#4a-filter-raw-data)
-- [4a. Filtered Data QC](#4b-filtered-data-qc)
+- [4b. Filtered Data QC](#4b-filtered-data-qc)
 - [4c. Compile Filtered Data QC](#4c-compile-filtered-data-qc)
 - [5. Trimming](#5-trimming)
 - [5a. Trim Filtered Data](#5a-trim-filtered-data)
@@ -57,10 +57,10 @@ Barbara Novak (GeneLab Data Processing Lead)
 - [8b. Remove Host Reads](#8b-remove-host-reads)
 - [8c. Compile Host Read Removal QC](#8c-compile-host-read-removal-qc)
 - [9. R Environment Setup](#9-r-environment-setup)
-- [9a. Load Libraries](#9a-load-libraries)
+- [9a. Load libraries](#9a-load-libraries)
 - [9b. Define Custom Functions](#9b-define-custom-functions)
 - [9c. Set global variables](#9c-set-global-variables)
-- [**Read-based processing**](#read-based-processing)
+- [Read-based Processing](#read-based-processing)
 - [10. Taxonomic Profiling Using Kaiju](#10-taxonomic-profiling-using-kaiju)
 - [10a. Build Kaiju Database](#10a-build-kaiju-database)
 - [10b. Kaiju Taxonomic Classification](#10b-kaiju-taxonomic-classification)
@@ -82,7 +82,7 @@ Barbara Novak (GeneLab Data Processing Lead)
 - [11f. Filter Kraken2 Species Count Table](#11f-filter-kraken2-species-count-table)
 - [11g. Kraken2 Taxonomy Barplots](#11g-kraken2-taxonomy-barplots)
 - [11h. Kraken2 Feature Decontamination](#11h-kraken2-feature-decontamination)
-- [**Assembly-based processing**](#assembly-based-processing)
+- [Assembly-based Processing](#assembly-based-processing)
 - [12. Sample Assembly](#12-sample-assembly)
 - [13. Polish Assembly](#13-polish-assembly)
 - [14. Rename Contigs and Summarize Assemblies](#14-rename-contigs-and-summarize-assemblies)
@@ -113,7 +113,7 @@ Barbara Novak (GeneLab Data Processing Lead)
 - [22. Generate Normalized, Gene- and Contig-level Coverage Summary Tables of KO-annotations and Taxonomy Across Samples](#22-generate-normalized-gene--and-contig-level-coverage-summary-tables-of-ko-annotations-and-taxonomy-across-samples)
 - [22a. Generate Gene-level Coverage Summary Tables](#22a-generate-gene-level-coverage-summary-tables)
 - [22b. Generate Contig-level Coverage Summary Tables](#22b-generate-contig-level-coverage-summary-tables)
-- [23. **M**etagenome-**A**ssembled **G**enome (MAG) recovery](#23-metagenome-assembled-genome-mag-recovery)
+- [23. **M**etagenome-**A**ssembled **G**enome (MAG) Recovery](#23-metagenome-assembled-genome-mag-recovery)
 - [23a. Bin Contigs](#23a-bin-contigs)
 - [23b. Bin Quality Assessment](#23b-bin-quality-assessment)
 - [23c. Filter MAGs](#23c-filter-mags)
@@ -141,19 +141,19 @@ Barbara Novak (GeneLab Data Processing Lead)

 |Program|Version|Relevant Links|
 |:------|:-----:|------:|
-|bbduk| 38.86 |[https://jgi.doe.gov/data-and-tools/software-tools/bbtools/bb-tools-user-guide/](https://jgi.doe.gov/data-and-tools/software-tools/bbtools/bb-tools-user-guide/)|
+|bbduk| 38.86 |[https://bbmap.org](https://bbmap.org)|
 |bit| 1.8.53 |[https://github.com/AstrobioMike/bioinf_tools#bioinformatics-tools-bit](https://github.com/AstrobioMike/bioinf_tools#bioinformatics-tools-bit)|
 |CAT| 5.2.3 |[https://github.com/dutilh/CAT#cat-and-bat](https://github.com/dutilh/CAT#cat-and-bat)|
 |CheckM| 1.1.3 |[https://github.com/Ecogenomics/CheckM](https://github.com/Ecogenomics/CheckM)|
 |Dorado| 1.1.1| [https://github.com/nanoporetech/dorado](https://github.com/nanoporetech/dorado)|
-|filtlong| 0.2.1 |[https://github.com/rrwick/Filtlong](https://github.com/rrwick/Filtlong)|
+|Filtlong| 0.2.1 |[https://github.com/rrwick/Filtlong](https://github.com/rrwick/Filtlong)|
 |Flye| 2.9.5 | [https://github.com/mikolmogorov/Flye](https://github.com/mikolmogorov/Flye) |
 |GTDB-Tk| 2.4.0 |[https://github.com/Ecogenomics/GTDBTk](https://github.com/Ecogenomics/GTDBTk)|
 |Kaiju| 1.10.1 | [https://bioinformatics-centre.github.io/kaiju/](https://bioinformatics-centre.github.io/kaiju/) |
 |KEGG-Decoder| 1.2.2 |[https://github.com/bjtully/BioData/tree/master/KEGGDecoder#kegg-decoder](https://github.com/bjtully/BioData/tree/master/KEGGDecoder#kegg-decoder)
 |KOFamScan| 1.3.0 |[https://github.com/takaram/kofam_scan](https://github.com/takaram/kofam_scan)|
 |Kraken2| 2.1.6 | [https://github.com/DerrickWood/kraken2](https://github.com/DerrickWood/kraken2) |
-|KrakenTools | 1.2 | [https://ccb.jhu.edu/software/krakentools/](https://ccb.jhu.edu/software/krakentools/) |
+|KrakenTools| 1.2 | [https://ccb.jhu.edu/software/krakentools/](https://ccb.jhu.edu/software/krakentools/) |
 |Krona| 2.8.1 | [https://github.com/marbl/Krona/wiki](https://github.com/marbl/Krona/wiki)|
 |MetaBAT| 2.15 |[https://bitbucket.org/berkeleylab/metabat/src/master/](https://bitbucket.org/berkeleylab/metabat/src/master/)|
 |Minimap2| 2.28 | [https://github.com/lh3/minimap2](https://github.com/lh3/minimap2) |
@@ -655,7 +655,7 @@ multiqc --zip-data-dir \

 > A major issue with low biomass data is the high potential for contamination due to the low amount of DNA extracted from the samples. Because negative control/blank samples should by theory be contaminant free, any sequence detected in the negative control is a potential contaminant. To filter out contaminants found in negative control samples that may have been due to cross contamination in the lab, we use a read mapping approach. First negative/blank control sample reads are assembled then the filtered, trimmed, and human-removed reads from each low-biomass sample are mapped to the assembled contigs from the negative/blank control samples. Reads mapping to the assembled contigs are categorized as contaminants and are therefore filtered out and thus excluded from downstream analyses.

-### 7a. Assemble Contaminants
+#### 7a. Assemble Contaminants

 ```bash
 flye --meta \
@@ -1020,7 +1020,7 @@ library(tidyverse)

 #### 9b. Define Custom Functions

-#### get_last_assignment()
+#### get_last_assignment() <!-- omit in toc -->
 <details>
 <summary>retrieves the last taxonomy assignment from a taxonomy string</summary>

@@ -1055,7 +1055,7 @@ library(tidyverse)

 </details>

-#### mutate_taxonomy()
+#### mutate_taxonomy() <!-- omit in toc -->
 <details>
 <summary>mutate taxonomy column to contain the lowest taxonomy assignment</summary>

@@ -1089,7 +1089,7 @@ library(tidyverse)

 </details>

-#### process_kaiju_table()
+#### process_kaiju_table() <!-- omit in toc -->
 <details>
 <summary>reformat kaiju output table</summary>

@@ -1138,7 +1138,7 @@ library(tidyverse)

 </details>

-#### merge_kraken_reports()
+#### merge_kraken_reports() <!-- omit in toc -->
 <details>
 <summary>merge and process multiple kraken outputs to one species table</summary>

@@ -1182,7 +1182,7 @@ library(tidyverse)

 </details>

-#### get_abundant_features()
+#### get_abundant_features() <!-- omit in toc -->
 <details>
 <summary>Find abundant features based on the sum of feature values</summary>

@@ -1215,7 +1215,7 @@ library(tidyverse)

 </details>

-#### count_to_rel_abundance()
+#### count_to_rel_abundance() <!-- omit in toc -->
 <details>
 <summary>Convert species count matrix to relative abundance matrix</summary>

@@ -1248,8 +1248,7 @@ library(tidyverse)

 </details>

-
-#### filter_rare()
+#### filter_rare() <!-- omit in toc -->
 <details>
 <summary>filter out rare and non_microbial taxonomy assignments based on relative abundance</summary>

@@ -1291,7 +1290,7 @@ library(tidyverse)

 </details>

-#### group_low_abund_taxa()
+#### group_low_abund_taxa() <!-- omit in toc -->
 <details>
 <summary>Group rare taxa or return a table with only rare taxa</summary>

@@ -1351,7 +1350,7 @@ library(tidyverse)

 </details>

-#### make_plot()
+#### make_plot() <!-- omit in toc -->
 <details>
 <summary>create bar plot of relative abundance</summary>

@@ -1396,7 +1395,7 @@ library(tidyverse)

 </details>

-#### make_barplot()
+#### make_barplot() <!-- omit in toc -->
 <details>
 <summary>Creates barplots from a feature table file</summary>

@@ -1476,7 +1475,7 @@ library(tidyverse)

 </details>

-#### make_heatmap()
+#### make_heatmap() <!-- omit in toc -->
 <details>
 <summary>Creates heatmaps from a feature table file</summary>

@@ -1575,7 +1574,7 @@ library(tidyverse)

 </details>

-#### run_decontam()
+#### run_decontam() <!-- omit in toc -->
 <details>
 <summary>Feature table decontamination with decontam</summary>

@@ -1648,7 +1647,7 @@ library(tidyverse)

 </details>

-#### feature_decontam()
+#### feature_decontam() <!-- omit in toc -->
 <details>
 <summary>decontaminate a feature table using the Decontam R package to statistically identify contaminating features in a feature table</summary>

@@ -1741,7 +1740,7 @@ library(tidyverse)

 </details>

-#### process_taxonomy()
+#### process_taxonomy() <!-- omit in toc -->
 <details>
 <summary>process a taxonomy assignment table</summary>

@@ -1778,7 +1777,7 @@ library(tidyverse)

 </details>

-#### fix_names()
+#### fix_names() <!-- omit in toc -->
 <details>
 <summary>clean taxonomy names</summary>

@@ -1811,7 +1810,7 @@ library(tidyverse)

 </details>

-#### read_taxonomy_table()
+#### read_taxonomy_table() <!-- omit in toc -->
 <details>
 <summary>Read Assembly-based coverage annotation table</summary>

@@ -1852,7 +1851,7 @@ library(tidyverse)

 </details>

-#### get_samples()
+#### get_samples() <!-- omit in toc -->
 <details>
 <summary>retrieve sample names for which assemblies were generated</summary>
