This repository contains the code associated with the manuscript A Multiple Sclerosis Protective Coding Variant Reveals an Essential Role for HDAC7 in Regulatory T cells. This README file describes the organisation of the scripts related to the analyses presented in the manuscript. This repository has also been deposited on figshare.com (doi:10.6084/m9.figshare.21539098), along with various intermediate results file that might be useful to users who wish to reuse the data and analyses in this project. The github repository only contains the scripts, but the figshare.com repository contains the intermediate results file as well. For ease of use, the full directory and file tree structure deposited on figshare.com is presented at the end of this document.
- Throughout this project, We use the
SLURMjob scheduling system to run some of the "heavy" analyses on the Yale University Ruddle cluster. - The code and additional resources on figshare.com are provided as-is. We have validated that this runs without errors on the Ruddle cluster and on local MacOS systems, but do not guarantee that this will be the case in different computing environments.
- The "light" analysis portion of this code is written with relative paths that have root project directory structure within this repository (detailed below) but the paths in the scripts run on Ruddle will need to be adjusted to meet you user names and path structures.
- All analyses run in R were run with
R3.5(unless stated otherwise in the script header). We do not guarantee the scripts will run with more recent versions of R
The repository is split into 2 main parts, each of which serve as project root for the scripts contained in them.
- human_datasets_analyses: contains all human transcriptmic data analysed, generated in this manuscript or reanalyses of publicly available datasets.
- msHDAC7KI_analyses: contains all analyses related to the in vivo EAE model using HDAC7 R150KI mice.
This directory is the project root for all scripts in the subdirectories.
bulk.rna.align.sh: alignment and quantitations from the original Fastq files. This script is designed to be run on the Ruddle cluster with theSLURMscheduler distributing one job per sample. To do so, the scripts iterates through a sample table specifying the name of the fastq files associated with each sample. The sample table is expected to contain at least the following columns: Sample_Number, Sample_Name, RNA_FastQ1, RNA_FastQ2.bulk_wrappers.Randgseaplot2_PPAmod.Rcontain custom functions for differential gene expression analyses in R. They are sourced by the script in the analysis folder.
Co-expression analyses on human Treg cells, highlighting the main module containing HDAC7, related to Figure 1. Using this module, we run enrichment analyses against public databases using enrichr, and against a previously published single cell RNAseq dataset of human Treg cells in skin and blood.
Analyses related to the knock-down of HDAC7 in human Treg cells presented in Figure 2 and supplementary figure 2. We run differential expression analyses with DESeq2 and then run GSEA on custom FOXO geneset generated by reanalysing publicly available mouse Treg datasets KO for FOXO1/3.
Analyses related to the knock-down of HDAC7 in human Treg cells presented in supplementary figure 6. We run differential expression analyses with DESeq2 and then run GSEA on hallmarks genesets, and geneset from the HDAC7 R150H KI mouse Treg infiltrating the brain during EAE (presented below).
Single cell RNAseq of brain infiltrating T cells at the peak of EAE in HDAC7 R150H KI mice vs WT controls, presented in Figure 5, supplementary figure 10, and supplementary figure 11. This directory is the project root for all analyses in the subdirectories.
Contains the script to run cellranger on the Fastq files and generated a count matrix for all droplets. This is a simple adaptation of the manual instructions available on the 10x genomics website, to be run on the Ruddle cluster.
It is split into 2 portions:
-
heavy analyses run on the Ruddle cluster are located in the subfolders ranging from A to D, and can be called in order by
1_run_preprocessing_pipeline.sh. The scripts called assume the directory structure on Ruddle and hence paths need to be adjusted for these to run properly. The pipeline is structured as follows:- gather results from the cell ranger output and create a seurat object (A_01)
- remove bad quality cells and potential doublets (A_02)
- Run PCA and integrate across emulsions using
harmonyto generate corrected PC loadings (B_01) and plot PC loadings (B_02) - run Louvain clustering and UMAP with a parameter search (C_01) and plot the results (C03)
- Based on plots curate clusters and run the
prestofind markers function to define cluster markers (C04) - Select T cell clusters and rerun PCA and
harmony(B_03) and plot PC loadings (B_04). - Rerun Louvain clustering and UMAP with a parameter search (C_05) and plot the results (C06)
- Based on plots curate clusters and run the
prestofind markers function to define cluster markers (C07) - Run DGE using a poisson model with the
speedglmimplementation (D). This step is heavy to run and we use therslurmpackage to parallelize the computation.
-
2_MASTER_clustering_typing.Rgathers the results from the preprocessing pipeline and plots the results of dimensionality reductions, clustering, cell typing, marker genes expression. -
3_MASTER_DGE.Rgather the results from the poisson model run in D. And plot the results.
βββ README.md
βββ human_datasets_analyses
β βββ huHDAC7_KD
β β βββ analysis
β β β βββ DGE.R # DESeq2 differential gene expression analysis
β β β βββ GSEA.R # GSEA on DESeq2 outputs, using the FOXO KO genesets
β β βββ data
β β β βββ GEO_table_STM_huKD.csv # sample table
β β β βββ foxo1.3.TregKO.txt # GEO2R results table for FOXO1/3 DKO mouse Treg
β β β βββ foxo1.TregKO.txt # GEO2R results table for FOXO1 KO Treg
β β β βββ lost-retrieved-from-affy.tsv # probe id to gene correspondance
β β βββ results
β β βββ DGE
β β β βββ huHDAC7_KD_STM.csv # DESeq2 results table
β β βββ figs
β β β βββ Foxos_genesets.pdf
β β β βββ HDAC7exp.pdf
β β β βββ Heatmap_STM.pdf
β β β βββ MAplot.pdf
β β β βββ batchedCorrectedPCA.pdf
β β β βββ gseaplots.pdf
β β β βββ gseaplots2.pdf
β β βββ raw_counts.h5 # raw count matrix uploaded to GEO
β β βββ rna # outputs from running bulk.rna.align.sh on the fastq
β β βββ quantitation
β β βββ rsem
β β βββ TS12.genes.results
β β βββ TS13.genes.results
β β βββ TS2.genes.results
β β βββ TS3.genes.results
β β βββ TS7.genes.results
β β βββ TS8.genes.results
β βββ huHDAC7_coexpression
β β βββ analysis
β β β βββ TeichmannTreg_reanalysis.R # scRNAseq of huTreg reanalysis, scoring HDAC7 module
β β β βββ wgcna_and_enrichr.R # coexpression networks in Tregs and enrichment on HDAC7 module
β β βββ results
β β βββ HDAC7moduleTreg.txt # module 1 gene name list containing HDAC7
β β βββ HDAC7moduleTreg_wID.txt # module 1 gene table containing HDAC7 with gene name and IDs
β β βββ TeichTreg
β β β βββ Teichman.Treg.skinblood.seurat.rds # exported seurat object after reanalysis
β β βββ enrichr # results from enrichr online portal analysis
β β β βββ GO_Cellular_Component_2018_table.txt
β β β βββ Transcription_Factor_PPIs_table.txt
β β βββ figs
β β β βββ GO_CC.pdf
β β β βββ HDAC7moduleScore_STM.pdf
β β β βββ HDAC7moduleScore_cluster_STM.pdf
β β β βββ Module_colors.pdf
β β β βββ PPIenrichment.pdf
β β β βββ WGCNAdendrogramTregsLarge.pdf
β β β βββ clusterUMAP_STM.pdf
β β β βββ cluster_tissue_plot.pdf
β β β βββ dotplot.pdf
β β β βββ tissueUMAP_STM.pdf
β β βββ treg.rsem.filtered.counts.txt
β βββ huHDAC7_variant
β β βββ analysis
β β β βββ DGE.R # DESeq2 differential gene expression analysis
β β β βββ GSEA.R # GSEA on hallmarks and msHDAC7 R150H KI Treg set
β β βββ data
β β β βββ GEO_table_STM_variant.csv # sample table
β β β βββ Homo_sapiens.GRCh38.88.primary_assembly.gene_symbols.txt # gene annotations
β β β βββ h.all.v6.2.symbols.gmt # hallmark genesets
β β βββ results
β β βββ DGE # DESeq2 results tables
β β β βββ degDESeqMut.txt
β β β βββ degDESeqVariant.txt
β β β βββ degDESeqWT.txt
β β βββ figs
β β β βββ HDAC7exp.pdf
β β β βββ batchedCorrectedPCA.pdf
β β β βββ diagonal_plot_2.pdf
β β β βββ gseaplots.pdf
β β β βββ gseaplots2.pdf
β β β βββ heatmap.pdf
β β β βββ huVar_vs_msVar.pdf
β β βββ raw_counts.h5 # raw count matrix uploaded to GEO
β β βββ rna # outputs from running bulk.rna.align.sh on the fastq
β β βββ quantitation
β β βββ rsem
β β βββ P13.genes.results
β β βββ P14.genes.results
β β βββ P15.genes.results
β β βββ P22.genes.results
β β βββ P23.genes.results
β β βββ P24.genes.results
β β βββ P5.genes.results
β β βββ P6.genes.results
β β βββ PP1.genes.results
β β βββ PP2R.genes.results
β β βββ PP3R.genes.results
β β βββ PP4R.genes.results
β β βββ PP5R.genes.results
β β βββ PP6R.genes.results
β β βββ PP7.genes.results
β β βββ PP8.genes.results
β β βββ PP9.genes.results
β βββ src
β βββ bulk.rna.align.sh # alignment and quantitation script. Run on Yale Ruddle cluster with SLURM scheduler
β βββ bulk_wrappers.R # custom function wrappers for DGE and GSEA
β βββ gseaplot2_PPAmod.R # custom GSEA plot function
βββ msHDAC7KI_analyses
βββ analysis
β βββ 1_run_preprocessing_pipeline.sh # preprocessing pipeline. Run on Yale Ruddle cluster with SLURM scheduler
β βββ 2_MASTER_clusturing_typing.R # Plots results for dim reduction, clustering, cluster markers
β βββ 3_MASTER_DGE.R # runs poisson model DGE on clusters of interest: Th17 and Treg clusters
β βββ A_QC # Pipeline first step: create object and QC emulsions
β β βββ A01_create_master_seurat.R
β β βββ A02_filter_cells.R
β βββ B_reductions # Pipeline second step: PCA, harmony.
β β βββ B01_PCA_harmony.R
β β βββ B02_plot_PC_loadings.R
β β βββ B03_PCA_harmony_Tcells.R
β β βββ B04_plot_PC_loadings_Tcells.R
β βββ C_clustering_typing # Pipeline 3rd step: clustering, UMAPs, cluster markers
β β βββ C01_clustering.R
β β βββ C02_clustering_oriPCA.R
β β βββ C03_clustering_plots.R
β β βββ C04_cluster_markers.R
β β βββ C05_clustering_Tcells.R
β β βββ C06_clustering_plots_Tcells.R
β β βββ C07_cluster_markers_Tcells.R
β βββ D_DGE # Pipeline 4th step: differential gene expression
β β βββ D_speedglm_poisson.R
β βββ GEO_matrix_generation.R # Generate raw matrix for GEO upload
βββ data
β βββ gene_annotations_positions.rds # gene id to gene name correspondance. With link to Seurat names (eg .1 suffixes)
β βββ sample_meta.csv # sample table
βββ figures
β βββ master
β βββ DGE # figures for DGE results on each clusters. pdf for paper figures.
β β βββ all 0_0ma_plot_ashr_fdr.png
β β βββ all 0_0p_val_histogram.png
β β βββ all 0_0volcano_plot.png
β β βββ all 0_0volcano_plot_ashr_fdr.png
β β βββ all 0_1ma_plot_ashr_fdr.png
β β βββ all 0_1p_val_histogram.png
β β βββ all 0_1volcano_plot.png
β β βββ all 0_1volcano_plot_ashr_fdr.png
β β βββ all 4_0ma_plot_ashr_fdr.png
β β βββ all 4_0p_val_histogram.png
β β βββ all 4_0volcano_plot.png
β β βββ all 4_0volcano_plot_ashr_fdr.png
β β βββ all 6_0ma_plot_ashr_fdr.png
β β βββ all 6_0p_val_histogram.png
β β βββ all 6_0volcano_plot.png
β β βββ all 6_0volcano_plot_ashr_fdr.png
β β βββ all 6_1ma_plot_ashr_fdr.png
β β βββ all 6_1p_val_histogram.png
β β βββ all 6_1volcano_plot.png
β β βββ all 6_1volcano_plot_ashr_fdr.png
β β βββ paper_MAplot_Teff.pdf
β β βββ paper_MAplot_Treg.pdf
β β βββ paper_MAplot_Tresting.pdf
β β βββ paper_MAplot_central.pdf
β βββ clusters # figure for clustering and UMAP results. main cell types and fine res cell types
β β βββ fine.png
β β βββ fine_legend.pdf
β β βββ freq.pdf
β β βββ main.png
β β βββ main_legend.pdf
β βββ marker_genes # marker gene expression overlayed on UMAPs
β βββ gene_Ass1_mean.png
β βββ gene_Bcl2_mean.png
β βββ gene_Bcl2a1a_mean.png
β βββ gene_Bcl2a1b_mean.png
β βββ gene_Bcl2a1d_mean.png
β βββ gene_Bcl2l11_mean.png
β βββ gene_Car2_mean.png
β βββ gene_Ccl1_mean.png
β βββ gene_Ccl4_mean.png
β βββ gene_Ccl5_mean.png
β βββ gene_Ccr2_mean.png
β βββ gene_Ccr7_mean.png
β βββ gene_Ccr8_mean.png
β βββ gene_Cd164_mean.png
β βββ gene_Cd226_mean.png
β βββ gene_Cd244a_mean.png
β βββ gene_Cd247_mean.png
β βββ gene_Cd274_mean.png
β βββ gene_Cd27_mean.png
β βββ gene_Cd28_mean.png
β βββ gene_Cd2_mean.png
β βββ gene_Cd3e_mean.png
β βββ gene_Cd40lg_mean.png
β βββ gene_Cd44_mean.png
β βββ gene_Cd47_mean.png
β βββ gene_Cd48_mean.png
β βββ gene_Cd4_mean.png
β βββ gene_Cd52_mean.png
β βββ gene_Cd53_mean.png
β βββ gene_Cd5_mean.png
β βββ gene_Cd69_mean.png
β βββ gene_Cd6_mean.png
β βββ gene_Cd72_mean.png
β βββ gene_Cd74_mean.png
β βββ gene_Cd7_mean.png
β βββ gene_Cd8a_mean.png
β βββ gene_Cd8b1_mean.png
β βββ gene_Cd9_mean.png
β βββ gene_Csf2_mean.png
β βββ gene_Ctla2a_mean.png
β βββ gene_Ctla4_mean.png
β βββ gene_Cxcr3_mean.png
β βββ gene_Cxcr6_mean.png
β βββ gene_Eomes_mean.png
β βββ gene_Fasl_mean.png
β βββ gene_Fcer1g_mean.png
β βββ gene_Foxp3_mean.png
β βββ gene_Gzma_mean.png
β βββ gene_Gzmb_mean.png
β βββ gene_Gzmk_mean.png
β βββ gene_Hif1a_mean.png
β βββ gene_Icos_mean.png
β βββ gene_Ifng_mean.png
β βββ gene_Ifngr1_mean.png
β βββ gene_Ifngr2_mean.png
β βββ gene_Ikzf2_mean.png
β βββ gene_Il10ra_mean.png
β βββ gene_Il12rb2_mean.png
β βββ gene_Il17a_mean.png
β βββ gene_Il1r2_mean.png
β βββ gene_Il2ra_mean.png
β βββ gene_Il2rb_mean.png
β βββ gene_Il2rg_mean.png
β βββ gene_Il7r_mean.png
β βββ gene_Itgb7_mean.png
β βββ gene_Junb_mean.png
β βββ gene_Jund_mean.png
β βββ gene_Klf2_mean.png
β βββ gene_Klrb1c_mean.png
β βββ gene_Klrc1_mean.png
β βββ gene_Klrd1_mean.png
β βββ gene_Lag3_mean.png
β βββ gene_Lef1_mean.png
β βββ gene_Lgals1_mean.png
β βββ gene_Lmo4_mean.png
β βββ gene_Ly6a_mean.png
β βββ gene_Ly6c2_mean.png
β βββ gene_Ncr1_mean.png
β βββ gene_Nfkbia_mean.png
β βββ gene_Nkg7_mean.png
β βββ gene_Nr4a1_mean.png
β βββ gene_Pdcd1_mean.png
β βββ gene_Pdcd1lg2_mean.png
β βββ gene_Ppp1r3b_mean.png
β βββ gene_Rorc_mean.png
β βββ gene_S1pr1_mean.png
β βββ gene_Sell_mean.png
β βββ gene_Tbx21_mean.png
β βββ gene_Tcf7_mean.png
β βββ gene_Tgfb1_mean.png
β βββ gene_Tigit_mean.png
β βββ gene_Tmem176b_mean.png
β βββ gene_Trdv2-2_mean.png
β βββ gene_Trdv4_mean.png
β βββ gene_Xcl1_mean.png
β βββ gene_Zbtb16_mean.png
βββ results
β βββ cellranger # results from cellranger count outputs. Only h5 files and summary stats kept
β β βββ KI6_MMT_cellranger
β β β βββ filtered_feature_bc_matrix.h5
β β β βββ metrics_summary.csv
β β β βββ molecule_info.h5
β β β βββ raw_feature_bc_matrix.h5
β β β βββ web_summary.html
β β βββ KI7_MMT_cellranger
β β β βββ filtered_feature_bc_matrix.h5
β β β βββ metrics_summary.csv
β β β βββ molecule_info.h5
β β β βββ raw_feature_bc_matrix.h5
β β β βββ web_summary.html
β β βββ KI8_MMT_cellranger
β β β βββ filtered_feature_bc_matrix.h5
β β β βββ metrics_summary.csv
β β β βββ molecule_info.h5
β β β βββ raw_feature_bc_matrix.h5
β β β βββ web_summary.html
β β βββ KI_MMT_cellranger
β β β βββ filtered_feature_bc_matrix.h5
β β β βββ metrics_summary.csv
β β β βββ molecule_info.h5
β β β βββ raw_feature_bc_matrix.h5
β β β βββ web_summary.html
β β βββ WT6_MMT_cellranger
β β β βββ filtered_feature_bc_matrix.h5
β β β βββ metrics_summary.csv
β β β βββ molecule_info.h5
β β β βββ raw_feature_bc_matrix.h5
β β β βββ web_summary.html
β β βββ WT7_MMT_cellranger
β β β βββ filtered_feature_bc_matrix.h5
β β β βββ metrics_summary.csv
β β β βββ molecule_info.h5
β β β βββ raw_feature_bc_matrix.h5
β β β βββ web_summary.html
β β βββ WT8_MMT_cellranger
β β β βββ filtered_feature_bc_matrix.h5
β β β βββ metrics_summary.csv
β β β βββ molecule_info.h5
β β β βββ raw_feature_bc_matrix.h5
β β β βββ web_summary.html
β β βββ WT_MMT_cellranger
β β βββ filtered_feature_bc_matrix.h5
β β βββ metrics_summary.csv
β β βββ molecule_info.h5
β β βββ raw_feature_bc_matrix.h5
β β βββ web_summary.html
β βββ master
β βββ DGE # Raw differential expression results using the poisson model with speedglm
β β βββ poisson_glm_labelPred_cpc.pdf
β β βββ poisson_glm_labelPred_it_all_0_0_meta.rds
β β βββ poisson_glm_labelPred_it_all_0_0_summary.rds
β β βββ poisson_glm_labelPred_it_all_0_1_meta.rds
β β βββ poisson_glm_labelPred_it_all_0_1_summary.rds
β β βββ poisson_glm_labelPred_it_all_4_0_meta.rds
β β βββ poisson_glm_labelPred_it_all_4_0_summary.rds
β β βββ poisson_glm_labelPred_it_all_6_0_meta.rds
β β βββ poisson_glm_labelPred_it_all_6_0_summary.rds
β β βββ poisson_glm_labelPred_it_all_6_1_meta.rds
β β βββ poisson_glm_labelPred_it_all_6_1_summary.rds
β βββ clustering_typing # results from running clustering, UMAP and findmarkers analyses
β β βββ low_res_clustering_plots_Tcells_harmony.pdf
β β βββ low_res_clustering_plots_harmony.pdf
β β βββ master_seurat_Tcells_low_res_clustering_table_harmony.rds
β β βββ master_seurat_low_res_clustering_table_harmony.rds
β β βββ presto_paired_markers_Sub_res0.05_filtered.rds
β β βββ presto_paired_markers_Tcells_curated_cluster_filtered.rds
β βββ seurat_objects # Seurat objects after clustering and UMAP computations
β βββ master_seurat_harmony_Tcells_low_res_clustered.rds
β βββ master_seurat_harmony_low_res_clustered.rds
βββ src
β βββ cellranger_count.sh # cellranger count script. Run on Yale Ruddle cluster with SLURM scheduler
βββ tables # Final DGE results tables
βββ Treg_poisson_glm_DGE_results.csv
βββ central_Tconv_poisson_glm_DGE_results.csv
βββ effector_Tconv_poisson_glm_DGE_results.csv
βββ resting_Tconv_poisson_glm_DGE_results.csv