- Expand test suite to include function unittests (maximize coverage)
- Replace requirement for local foldseek installation in extract-3di
- Replace all fasta parsing with biotite.sequence.io.fasta
- Introduce `predict_block` to support multi-GPU prediction of protein interactions with semi-on-the-fly embedding loading, resolving #6 and #11
- Add sparse embedding loading to `predict_block`, in which only the required subset of each block of embeddings is loaded at a time when predicting only some pairs
- Add `predict_bipartite` for predicting interactions between all proteins from two disjoint sets, e.g., proteins from two species. This implementation is a bit more efficient than using `predict_block` with cross-set pairs specified.
- Improve method for loading embeddings
- Modernized D-SCRIPT repository
- Significantly updated unittest coverage
- Migrate to loguru under the hood
- Update pyproject, GitHub actions, and other continuous integration/installation
- Linting and formatting with Ruff
- Resolve #35 by using `require_dataset`; multiple .fasta files can now be added to the same h5 file
- Update pretrained API and docs to include Topsy-Turvy
- Add retry decorator to get_pretrained if download fails
- Add ability to set a random seed for training
- Update `evaluate` code to also store metrics in a file
- Add biopython to setup.py
- Integrate Topsy-Turvy to allow for top-down supervision
- Use utils.log function across all commands
- Speed up loading embeddings into memory using parallel processing
- Update fasta parse and write to use BioPython SeqIO (better error checking)
- More comprehensive test suite for main commands
- Updated model loading in the new version to handle renamed parameters
- Updated CPU-only loading during prediction with `map_location`
- Resolve #24 by fixing training
- Can now run `dscript train --train data/pairs/human_train.tsv --test data/pairs/human_test.tsv --embedding /afs/csail/u/s/samsl/Work/databases/STRING/homo.sapiens/human_nonRed.h5 --output [output] --save-prefix [prefix] --device 0` to replicate paper results
- Updated code formatting with black and pre-commit
- Following previous update, addresses #24 by fixing model training while maintaining preferred API and command line usage
- Fixed significant bug in how training was run by reverting to older code
- Should address issue #24: unable to replicate paper results
- To do: code cleaning to bring up to formatting standards while maintaining performance
- The augmentation fix in v0.1.5 was still buggy and would throw an error; the index is now reset
- Change `--use-w` and `--augment` to `--no-w` and `--no-augment` with store false
- Updated package level imports
- Updated documentation
- Fixed issue #13: improper augmentation of data
- Fixed issue #12: overwrites cmap data sets if they already exist
- Fixed issue #7: bug which would crash contact module if called directly
- Fixed issues #3, #4
- Basic logging system implemented to report skipped pairs
- Fixed wrong variable name in loading from sequence file
- Updated documentation
- Model should be put into `eval()` mode before prediction or evaluation, and when new models are downloaded; this makes the output deterministic by disabling dropout layers
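
The effect of `eval()` mode described in the last entry can be illustrated with a minimal PyTorch sketch. The model below is a hypothetical stand-in that contains a dropout layer; it is not the D-SCRIPT architecture, and the layer sizes are arbitrary assumptions chosen only for the demonstration.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in model with a dropout layer (not the D-SCRIPT architecture).
model = nn.Sequential(nn.Linear(8, 16), nn.Dropout(p=0.5), nn.Linear(16, 1))
x = torch.randn(4, 8)

# Switch to evaluation mode: dropout layers stop zeroing activations at random.
model.eval()

# Gradients are not needed for prediction, so disable tracking.
with torch.no_grad():
    out_a = model(x)
    out_b = model(x)

# With dropout disabled, two passes over the same input agree exactly.
same_in_eval = torch.equal(out_a, out_b)
print(same_in_eval)
```

If the model were left in training mode (`model.train()`), each forward pass would sample a fresh dropout mask, so repeated predictions on the same input would generally differ.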