Metagenomics/Low_Biomass/Pipeline_GL-DPPD-7116_Versions/GL-DPPD-7116.md
Lines changed: 31 additions & 32 deletions
@@ -1,4 +1,4 @@
1
-
# Bioinformatics pipeline for Low biomass long-read metagenomics data
1
+
# Bioinformatics pipeline for Low biomass long-read metagenomics data<!-- omit in toc -->
2
2
3
3
> **This document holds an overview and some example commands of how GeneLab processes low-biomass, long-read metagenomics datasets. Exact processing commands for specific datasets that have been released are provided with their processed data in the [Open Science Data Repository (OSDR)](https://osdr.nasa.gov/bio/repo/).**
4
4
@@ -20,11 +20,11 @@ Barbara Novak (GeneLab Data Processing Lead)
20
20
21
21
---
22
22
23
-
# Table of contents
23
+
# Table of contents<!-- omit in toc -->
24
24
25
-
- [**Software used**](#software-used)
26
-
- [**General processing overview with example commands**](#general-processing-overview-with-example-commands)
27
-
- [**Pre-processing**](#pre-processing)
25
+
- [Software used](#software-used)
26
+
- [General processing overview with example commands](#general-processing-overview-with-example-commands)
27
+
- [Pre-processing](#pre-processing)
28
28
- [1. Basecalling](#1-basecalling)
29
29
- [2. Demultiplexing](#2-demultiplexing)
30
30
- [2a. Split Fastq](#2a-split-fastq)
@@ -34,7 +34,7 @@ Barbara Novak (GeneLab Data Processing Lead)
34
34
- [3b. Compile Raw Data QC](#3b-compile-raw-data-qc)
35
35
- [4. Quality Filtering](#4-quality-filtering)
36
36
- [4a. Filter Raw Data](#4a-filter-raw-data)
37
-
- [4a. Filtered Data QC](#4b-filtered-data-qc)
37
+
- [4b. Filtered Data QC](#4b-filtered-data-qc)
38
38
- [4c. Compile Filtered Data QC](#4c-compile-filtered-data-qc)
39
39
- [5. Trimming](#5-trimming)
40
40
- [5a. Trim Filtered Data](#5a-trim-filtered-data)
@@ -57,10 +57,10 @@ Barbara Novak (GeneLab Data Processing Lead)
- [14. Rename Contigs and Summarize Assemblies](#14-rename-contigs-and-summarize-assemblies)
@@ -113,7 +113,7 @@ Barbara Novak (GeneLab Data Processing Lead)
113
113
- [22. Generate Normalized, Gene- and Contig-level Coverage Summary Tables of KO-annotations and Taxonomy Across Samples](#22-generate-normalized-gene--and-contig-level-coverage-summary-tables-of-ko-annotations-and-taxonomy-across-samples)
> A major issue with low-biomass data is the high potential for contamination, because so little DNA is extracted from the samples. Because negative control/blank samples should in theory be contaminant-free, any sequence detected in a negative control is a potential contaminant. To filter out contaminants found in negative control samples, which may have arisen from cross-contamination in the lab, we use a read-mapping approach. First, the negative/blank control sample reads are assembled; then the filtered, trimmed, and human-removed reads from each low-biomass sample are mapped to the assembled contigs from the negative/blank control samples. Reads that map to these contigs are classified as contaminants and excluded from downstream analyses.
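
The final exclusion step described above can be illustrated with a minimal sketch. This is not the pipeline's actual command: the function name `drop_contaminant_reads` and the input file layout are hypothetical. It assumes the IDs of reads that mapped to the negative-control contigs have already been written to a text file, one ID per line, and that the sample FASTQ uses four lines per record.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the GeneLab pipeline): print every FASTQ
# record whose read ID is NOT listed in a contaminant ID file.
# Assumptions: 4-line FASTQ records; one read ID per line, without the "@".
# Usage: drop_contaminant_reads contaminant_ids.txt reads.fastq > clean.fastq
drop_contaminant_reads() {
    local ids_file="$1" fastq="$2"
    awk -v ids="$ids_file" '
        BEGIN {
            # Load the contaminant read IDs into a lookup table.
            while ((getline line < ids) > 0) bad[line] = 1
            close(ids)
        }
        FNR % 4 == 1 {
            # Header line: take the first field and strip the leading "@".
            id = substr($1, 2)
            keep = !(id in bad)
        }
        keep { print }   # emit all four lines of each retained record
    ' "$fastq"
}
```

One possible way to produce the ID list, again as an assumption rather than the documented method, is to take the names of mapped reads from an alignment file, for example with `samtools view -F 4` followed by `cut -f1 | sort -u`.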
657
657
658
-
### 7a. Assemble Contaminants
658
+
#### 7a. Assemble Contaminants
659
659
660
660
```bash
661
661
flye --meta \
@@ -1020,7 +1020,7 @@ library(tidyverse)
1020
1020
1021
1021
#### 9b. Define Custom Functions
1022
1022
1023
-
#### get_last_assignment()
1023
+
#### get_last_assignment()<!-- omit in toc -->
1024
1024
<details>
1025
1025
<summary>retrieves the last taxonomy assignment from a taxonomy string</summary>
1026
1026
@@ -1055,7 +1055,7 @@ library(tidyverse)
1055
1055
1056
1056
</details>
1057
1057
1058
-
#### mutate_taxonomy()
1058
+
#### mutate_taxonomy()<!-- omit in toc -->
1059
1059
<details>
1060
1060
<summary>mutate taxonomy column to contain the lowest taxonomy assignment</summary>
1061
1061
@@ -1089,7 +1089,7 @@ library(tidyverse)
1089
1089
1090
1090
</details>
1091
1091
1092
-
#### process_kaiju_table()
1092
+
#### process_kaiju_table()<!-- omit in toc -->
1093
1093
<details>
1094
1094
<summary>reformat kaiju output table</summary>
1095
1095
@@ -1138,7 +1138,7 @@ library(tidyverse)
1138
1138
1139
1139
</details>
1140
1140
1141
-
#### merge_kraken_reports()
1141
+
#### merge_kraken_reports()<!-- omit in toc -->
1142
1142
<details>
1143
1143
<summary>merge and process multiple kraken outputs to one species table</summary>
1144
1144
@@ -1182,7 +1182,7 @@ library(tidyverse)
1182
1182
1183
1183
</details>
1184
1184
1185
-
#### get_abundant_features()
1185
+
#### get_abundant_features()<!-- omit in toc -->
1186
1186
<details>
1187
1187
<summary>Find abundant features based on the sum of feature values</summary>
1188
1188
@@ -1215,7 +1215,7 @@ library(tidyverse)
1215
1215
1216
1216
</details>
1217
1217
1218
-
#### count_to_rel_abundance()
1218
+
#### count_to_rel_abundance()<!-- omit in toc -->
1219
1219
<details>
1220
1220
<summary>Convert species count matrix to relative abundance matrix</summary>
1221
1221
@@ -1248,8 +1248,7 @@ library(tidyverse)
1248
1248
1249
1249
</details>
1250
1250
1251
-
1252
-
#### filter_rare()
1251
+
#### filter_rare() <!-- omit in toc -->
1253
1252
<details>
1254
1253
<summary>filter out rare and non_microbial taxonomy assignments based on relative abundance</summary>
1255
1254
@@ -1291,7 +1290,7 @@ library(tidyverse)
1291
1290
1292
1291
</details>
1293
1292
1294
-
#### group_low_abund_taxa()
1293
+
#### group_low_abund_taxa()<!-- omit in toc -->
1295
1294
<details>
1296
1295
<summary>Group rare taxa or return a table with only rare taxa</summary>
1297
1296
@@ -1351,7 +1350,7 @@ library(tidyverse)
1351
1350
1352
1351
</details>
1353
1352
1354
-
#### make_plot()
1353
+
#### make_plot()<!-- omit in toc -->
1355
1354
<details>
1356
1355
<summary>create bar plot of relative abundance</summary>
1357
1356
@@ -1396,7 +1395,7 @@ library(tidyverse)
1396
1395
1397
1396
</details>
1398
1397
1399
-
#### make_barplot()
1398
+
#### make_barplot()<!-- omit in toc -->
1400
1399
<details>
1401
1400
<summary>Creates barplots from a feature table file</summary>
1402
1401
@@ -1476,7 +1475,7 @@ library(tidyverse)
1476
1475
1477
1476
</details>
1478
1477
1479
-
#### make_heatmap()
1478
+
#### make_heatmap()<!-- omit in toc -->
1480
1479
<details>
1481
1480
<summary>Creates heatmaps from a feature table file</summary>
1482
1481
@@ -1575,7 +1574,7 @@ library(tidyverse)
1575
1574
1576
1575
</details>
1577
1576
1578
-
#### run_decontam()
1577
+
#### run_decontam()<!-- omit in toc -->
1579
1578
<details>
1580
1579
<summary>Feature table decontamination with decontam</summary>
1581
1580
@@ -1648,7 +1647,7 @@ library(tidyverse)
1648
1647
1649
1648
</details>
1650
1649
1651
-
#### feature_decontam()
1650
+
#### feature_decontam()<!-- omit in toc -->
1652
1651
<details>
1653
1652
<summary>decontaminate a feature table using the Decontam R package to statistically identify contaminating features in a feature table</summary>
1654
1653
@@ -1741,7 +1740,7 @@ library(tidyverse)
1741
1740
1742
1741
</details>
1743
1742
1744
-
#### process_taxonomy()
1743
+
#### process_taxonomy()<!-- omit in toc -->
1745
1744
<details>
1746
1745
<summary>process a taxonomy assignment table</summary>