Welcome to Bioinformatics Engineering, a 3-week Stanford mini-course on bioinformatics for everyone.
Modern biological research generates unprecedented volumes of data across multiple modalities, from multi-omics and high-resolution imaging to time-series measurements. This hands-on course equips students with essential computational engineering skills for large-scale biological data analysis. Students learn to build reproducible and scalable computational pipelines with technologies such as Git, Mamba, Singularity, Nextflow, SQL, HDF5, and cloud storage. Students then apply these concepts in two comprehensive projects, one on genomic pipeline development and one on machine learning applications in biology.
Prerequisites
- Basic Programming
- Basic Biology/Genomics

| Lecture | Date | Topic | Material |
|---|---|---|---|
| 1 | 11/11 | Overview + Setup | Setup.md |
| 2 | 11/13 | Environment: mamba, container, VSCode, Jupyter | Environment.md |
| 3 | 11/18 | Data: h5py, SQL, BigQuery, GCS | Data.md |
| 4 | 11/20 | Pipeline: Git, SLURM array, Nextflow | Pipeline.md |
| 5 | 12/02 | Project 1: Genomics Pipeline | Project1.md |
| 6 | 12/04 | Project 2: Machine Learning | Project2.md |

| Lecture | Topic | Material |
|---|---|---|
| 1 | Overview | Bash, SLURM |
| 2 | Environment | Vim, Markdown |
| 3 | Data | SQL, HDF5, numpy, pandas |
| 4 | Pipeline | Bioinformatics Pipeline Review |
| 5 | Project 1 | Genome Annotation: Bakta |
| 6 | Project 2 | Machine Learning Review |