
NCI-GDC/gdc-mutation-indexer


GDC Mutation Index Export

Note

The code in this repository has been made public as-is for informational purposes. The repository may use private resources for the building and execution of the code. For example, private registries may be used for dependency resolution.

The documentation may refer to restricted URLs.


Backend for exporting mutation indices for visualization on the GDC

Contents

Architecture

Indexer Architecture

Mutation Indexer is an ETL platform built on Spark/PySpark. It combines data derived from the GDC graph (via the graph indices), static data (e.g. the gene model), and data contained in analysis files (e.g. MAF and ASCAT files) to create structured data that can be used for visualization and further analysis.

Style Guide

Mutation Indexer strives to follow the best practices set forth here: pyspark-style-guide

Pre-Commit

We use pre-commit to set up pre-commit hooks for this repo, and detect-secrets to search for secrets being committed to the repo.

To install the pre-commit hook, run:

pre-commit install

To update the .secrets.baseline file, run:

detect-secrets scan --update .secrets.baseline
git add .secrets.baseline

.secrets.baseline contains all the strings that were caught by detect-secrets but are not stored in plain text. Audit the baseline to view the secrets:

detect-secrets audit .secrets.baseline
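The hooks can also be run by hand against the whole working tree rather than only the staged changes. The following is a minimal sketch, assuming `pre-commit` is on your `PATH` and the hooks for this repo are already configured; the `run_all_hooks` helper is hypothetical and not part of this repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repo): run every configured
# pre-commit hook against all files instead of only staged changes.
run_all_hooks() {
    # Fail early with a clear message if pre-commit is not installed.
    if ! command -v pre-commit >/dev/null 2>&1; then
        echo "pre-commit not found; install it first" >&2
        return 127
    fi
    # --all-files checks the entire repo, not just the staged diff.
    pre-commit run --all-files
}
```

Calling `run_all_hooks` from the repository root after sourcing the snippet should behave like a commit-time check over every file.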

Update GDC Models

After the dictionary is updated, any changes to the case structures in the various indices should be reflected in a new version of the gdc-models package. To ensure that Mutation Indexer builds these new mappings, we need to update it to the latest version. We also need to ensure that our test data continues to reflect the expected inputs from the graph/case index and that derived components of the viz mappings are built accordingly. To do this, the Mutation Indexer test suite includes several scripts that ease the process, along with an orchestration script that should handle most updates. That said, users are encouraged to explore these test scripts and their individual functionality, as they can be helpful in other stages of Mutation Indexer development; please see tools. Below is the command for running the orchestration script for general updates to the models.

Command:

# Make sure you have sourced your venv!
bin/update_models.sh
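The reminder about sourcing the venv is easy to miss, so one option is to guard the call. The following is a minimal sketch, assuming the virtual environment was activated with a standard `activate` script, which exports `VIRTUAL_ENV`; the `require_venv` helper is hypothetical and not part of this repository.

```shell
#!/usr/bin/env bash
# Hypothetical guard (not part of this repo): refuse to continue unless a
# Python virtual environment is active. Standard venv/virtualenv "activate"
# scripts export VIRTUAL_ENV, so an empty value means nothing was sourced.
require_venv() {
    if [ -z "${VIRTUAL_ENV:-}" ]; then
        echo "No active virtualenv; source your venv first" >&2
        return 1
    fi
}
```

With the function defined, `require_venv && bin/update_models.sh` runs the orchestration script only when a venv is active.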
