Setup run hpc #119 (Merged)

Changes from 60 commits.
Commits:

- 42effdb add scripts to process raw dataset (ghar1821)
- 83ceda2 editing config to set apptainer cache dir (ghar1821)
- f3019a9 editing pre-run scripts and trying to fix R methods not running. (ghar1821)
- f664c48 add h5py to setup (ghar1821)
- 338aa23 reverting changes to setup (ghar1821)
- f41f6f4 separate submit scripts (ghar1821)
- 66c348e finally the first setting that works!!!! (ghar1821)
- fa1ade7 update config and settings for control methods (ghar1821)
- 5902468 adjusted resources for metrics and methods (ghar1821)
- 7497ea4 update cytovi to use A30 gpu (ghar1821)
- cf3d35b add numba cache dir export to allow jit caching (ghar1821)
- 35e3fdb update cytovi implementation (ghar1821)
- bad078d force recompute for all cytonorm (ghar1821)
- bdcbf46 add temp dir resolution for hpc (ghar1821)
- 7bada43 remove transpose from harmonypy (ghar1821)
- 6793f3d adding support for hpc (ghar1821)
- aa4b07a update temp dir again (ghar1821)
- 44def10 latest config file that works reasonably well with hpc (ghar1821)
- fc4df26 add some job submit scripts for SLURM (ghar1821)
- bcc7ddb update tmp_path for cytonorm (ghar1821)
- 9dae0b6 redirect numba cache dir away from /tmp and to its own folder. (ghar1821)
- 379b3dd update batch adjust non control samples naming (ghar1821)
- d062157 fix bug in perfect integration subsetting (ghar1821)
- 7627554 fix bug where we can't replace the batch column if it is not integer (ghar1821)
- 5ff088b fix bug where the donor loc are somewhat mismatched.. (ghar1821)
- 2864b9c update ratio inconsistent peak where corrected data return only zero (ghar1821)
- ddc57cf Update script.py (ghar1821)
- 2312cb2 update scripts (ghar1821)
- 8acaafc Merge branch 'main' into setup_run_hpc (ghar1821)
- 4e807f1 remove average batch r2 global (ghar1821)
- 03cc959 add seed setting for cytovi (ghar1821)
- ca9ac0a remove env for viash temp files (ghar1821)
- fa20205 update lisi to allow anndata write (ghar1821)
- 04a4280 update cycombine (ghar1821)
- f337c99 more updates to cycombine (ghar1821)
- 236fec8 minor change of script type (ghar1821)
- 0706945 update cytonorm (ghar1821)
- f75d01c fixed gaussnorm (ghar1821)
- 11cab3e fixed limma (LuLeom)
- b1592c4 Fixed harmonypy and combat (LuLeom)
- f4bff8d Fixed rPCA (LuLeom)
- facc520 update batchadjust and add copy to subset (ghar1821)
- f27dad7 remove cytovi and some obsolete metrics (ghar1821)
- 4eaeaac renamed shuffle control methods (ghar1821)
- 0e11692 missed label change (ghar1821)
- 1e2d508 reorganising scripts for hpc (ghar1821)
- 06a9069 update changelog (ghar1821)
- f444132 update changelog again (ghar1821)
- ebb18a0 update changelog (ghar1821)
- e3d8951 update changelog (ghar1821)
- f2d073e update description. (ghar1821)
- 3e3af60 manually adding some dependencies for flowCore and flowStats (ghar1821)
- 10c2c60 update ratio inconsistent peaks (ghar1821)
- 999ce87 update inconsistent peaks (ghar1821)
- 558cd5c add print statements to subset functions (ghar1821)
- de15e28 add print statements when writing files out (ghar1821)
- 6210259 add utility scripts for pulling intermediate files (ghar1821)
- 9680aa3 update methods and metrics labels (ghar1821)
- e8e1339 fix bug where subsetting was not done on ilisi and fsom mapping metrics (ghar1821)
- 83fe60b Update CHANGELOG.md (ghar1821)
New file (22 lines):

```sh
#!/bin/bash

# script to launch the process raw dataset workflow on slurm via seqera tower.
# leave the input_states to s3 bucket as the datasets raw files are stored there.

cat > /tmp/params.yaml << 'HERE'
input_states: s3://openproblems-data/resources/task_cyto_batch_integration/datasets_raw/**/state.yaml
rename_keys: 'input:output_dataset'
output_state: '$id/state.yaml'
settings: '{"output_unintegrated": "$id/unintegrated.h5ad", "output_censored_split1": "$id/censored_split1.h5ad", "output_censored_split2": "$id/censored_split2.h5ad"}'
publish_dir: /vast/scratch/users/putri.g/cytobenchmark/benchmark_out_hpc/datasets/
HERE

tw launch https://github.com/openproblems-bio/task_cyto_batch_integration.git \
  --revision build/main \
  --pull-latest \
  --main-script target/nextflow/workflows/process_datasets/main.nf \
  --workspace 80689470953249 \
  --params-file /tmp/params.yaml \
  --entry-name auto \
  --config scripts/labels_tw_wehi.config \
  --labels task_cyto_batch_integration,process_datasets
```
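One detail worth noting in the launch script above: the heredoc delimiter is quoted (`<< 'HERE'`), so the shell performs no parameter expansion and placeholders such as `$id` reach the YAML file literally for Nextflow to resolve per dataset. A minimal sketch of the difference (demo file paths, not part of the PR):

```shell
#!/bin/sh
# Illustrative only: quoted vs unquoted heredoc delimiters.
id="expanded_by_the_shell"

# Quoted delimiter: no expansion, $id stays literal in the file.
cat > /tmp/demo_quoted.yaml << 'HERE'
output_state: '$id/state.yaml'
HERE

# Unquoted delimiter: the shell expands $id before writing the file.
cat > /tmp/demo_unquoted.yaml << HERE
output_state: '$id/state.yaml'
HERE

cat /tmp/demo_quoted.yaml    # output_state: '$id/state.yaml'
cat /tmp/demo_unquoted.yaml  # output_state: 'expanded_by_the_shell/state.yaml'
```

The later run scripts in this PR deliberately use the unquoted form, because they do want `$publish_dir` expanded into the params file.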
New file (168 lines):

```groovy
def exitStrat(task, max_attempts = 3) {
    println "Determining exit strategy for task (attempt '${task.attempt}', exit status '${task.exitStatus}')"

    // if the component failed max_attempts times, ignore the error so the
    // workflow can continue. it's important 'ignore' is returned on the final
    // attempt even though maxRetries is set, otherwise the workflow will stop
    if (task.attempt >= max_attempts) {
        return 'ignore'
    }

    return 'retry'
}

// Let the Nextflow head job manage the Apptainer containers
apptainer {
    enabled = true
    pullTimeout = '48h'
    ociAutoPull = false
    cacheDir = '/vast/scratch/users/putri.g/nextflow/apptainer_cache'
    envWhitelist = 'APPTAINER_CACHEDIR,APPTAINER_TMPDIR,SINGULARITY_CACHEDIR,SINGULARITY_TMPDIR,TMPDIR,NXF_HOME,NXF_TEMP,NXF_APPTAINER_CACHEDIR,PYTHONPATH,NUMBA_CACHE_DIR,NUMBA_DISABLE_JIT,HPC_VIASH_META_TEMP_DIR'
}

env {
    NXF_APPTAINER_CACHEDIR = '/vast/scratch/users/putri.g/nextflow/apptainer_cache'
    APPTAINER_CACHEDIR = '/vast/scratch/users/putri.g/nextflow/apptainer_cache'
    APPTAINER_TMPDIR = '/vast/scratch/users/putri.g/nextflow/apptainer_tmp'
    SINGULARITY_CACHEDIR = '/vast/scratch/users/putri.g/nextflow/apptainer_cache'
    SINGULARITY_TMPDIR = '/vast/scratch/users/putri.g/nextflow/apptainer_tmp'
    NXF_HOME = '/vast/scratch/users/putri.g/nextflow/nxf_home'
    PYTHONPATH = '/root/.local/lib/python3.12/site-packages'
    // Numba environment variable to fix caching issues in containers
    NUMBA_DISABLE_JIT = '0'
}

process {
    beforeScript = '''
        # Create base directories (shared across tasks)
        mkdir -p "$APPTAINER_CACHEDIR" "$NXF_HOME" "$HOME"

        # Create task-specific temp directories
        export TMPDIR="/vast/scratch/users/putri.g/nextflow/apptainer_tmp/${NXF_TASK_INDEX:-$$}"
        export APPTAINER_TMPDIR="${TMPDIR}"
        export SINGULARITY_TMPDIR="${TMPDIR}"
        export NXF_TEMP="/vast/scratch/users/putri.g/nextflow/nxf_tmp/${NXF_TASK_INDEX:-$$}"
        export HPC_VIASH_META_TEMP_DIR="${NXF_TEMP}"
        export NUMBA_CACHE_DIR="/vast/scratch/users/putri.g/nextflow/numba_cache/${NXF_TASK_INDEX:-$$}"

        mkdir -p "$TMPDIR" "$NXF_TEMP" "$NUMBA_CACHE_DIR"

        echo "============================="
        echo "Task-specific directories:"
        echo "============================="
        echo "  TMPDIR: $TMPDIR"
        echo "  APPTAINER_TMPDIR: $APPTAINER_TMPDIR"
        echo "  SINGULARITY_TMPDIR: $SINGULARITY_TMPDIR"
        echo "  NXF_TEMP: $NXF_TEMP"
        echo "  HPC_VIASH_META_TEMP_DIR: $HPC_VIASH_META_TEMP_DIR"
        echo "  NUMBA_CACHE_DIR: $NUMBA_CACHE_DIR"
        echo "============================="
        echo "Shared directories:"
        echo "============================="
        echo "  APPTAINER_CACHEDIR: $APPTAINER_CACHEDIR"
        echo "  NXF_APPTAINER_CACHEDIR: $NXF_APPTAINER_CACHEDIR"
        echo "  NXF_HOME: $NXF_HOME"
    '''.stripIndent()
}

process {
    executor = 'slurm'

    // Default resources for all processes
    cpus = 4
    memory = { get_memory( 10.GB * task.attempt ) }
    time = '48.h'
    disk = 50.GB
    queue = 'regular'

    // Retry on failure (e.g. exit codes caused by running out of memory);
    // after the final attempt, exitStrat returns 'ignore' so the rest of
    // the workflow can continue
    errorStrategy = { exitStrat(task) }
    maxRetries = 3
    maxMemory = null

    // Resource labels
    withLabel: lowcpu { cpus = 5 }
    withLabel: midcpu { cpus = 15 }
    withLabel: highcpu { cpus = 30 }
    withLabel: lowmem { memory = { get_memory( 10.GB * task.attempt ) } }
    withLabel: midmem { memory = { get_memory( 30.GB * task.attempt ) } }
    withLabel: highmem { memory = { get_memory( 80.GB * task.attempt ) } }
    withLabel: veryhighmem { memory = { get_memory( 150.GB * task.attempt ) } }
    withLabel: lowtime { time = 2.h }
    withLabel: midtime { time = 8.h }
    withLabel: hightime { time = 12.h }
    withLabel: veryhightime { time = 24.h }
    withLabel: lowsharedmem {
        containerOptions = { workflow.containerEngine != 'singularity' ? "--shm-size ${String.format("%.0f", task.memory.mega * 0.05)}" : "" }
    }
    withLabel: midsharedmem {
        containerOptions = { workflow.containerEngine != 'singularity' ? "--shm-size ${String.format("%.0f", task.memory.mega * 0.1)}" : "" }
    }
    withLabel: highsharedmem {
        containerOptions = { workflow.containerEngine != 'singularity' ? "--shm-size ${String.format("%.0f", task.memory.mega * 0.25)}" : "" }
    }
    withLabel: gpu {
        cpus = 16
        clusterOptions = '--gres=gpu:A30:1'
        queue = 'gpuq'
        containerOptions = { workflow.containerEngine == 'singularity' ? '--nv' :
            (workflow.containerEngine == 'docker' ? '--gpus all' : null) }
    }
    withLabel: midgpu {
        cpus = 32
        clusterOptions = '--gres=gpu:A30:4'
        queue = 'gpuq'
        containerOptions = { workflow.containerEngine == 'singularity' ? '--nv' :
            (workflow.containerEngine == 'docker' ? '--gpus all' : null) }
    }
    withLabel: highgpu {
        cpus = 64
        clusterOptions = '--gres=gpu:A30:8'
        queue = 'gpuq'
        containerOptions = { workflow.containerEngine == 'singularity' ? '--nv' :
            (workflow.containerEngine == 'docker' ? '--gpus all' : null) }
    }
    withLabel: biggpu {
        cpus = 16
        clusterOptions = '--gres=gpu:A100:1'
        queue = 'gpuq'
        containerOptions = { workflow.containerEngine == 'singularity' ? '--nv' :
            (workflow.containerEngine == 'docker' ? '--gpus all' : null) }
    }

    // make sure publishStatesProc gets enough disk space and memory
    withName: '.*publishStatesProc' {
        memory = '16GB'
        disk = '100GB'
    }
}

def get_memory(to_compare) {
    if (!process.containsKey("maxMemory") || !process.maxMemory) {
        return to_compare
    }

    try {
        if (process.containsKey("maxRetries") && process.maxRetries && task.attempt == (process.maxRetries as int)) {
            return process.maxMemory
        }
        else if (to_compare.compareTo(process.maxMemory as nextflow.util.MemoryUnit) > 0) {
            return process.maxMemory as nextflow.util.MemoryUnit
        }
        else {
            return to_compare
        }
    } catch (all) {
        println "Error processing memory resources. Please check that process.maxMemory '${process.maxMemory}' and process.maxRetries '${process.maxRetries}' are valid!"
        System.exit(1)
    }
}

// set tracing file
trace {
    enabled = true
    overwrite = true
    file = "${params.publish_dir}/trace.txt"
}
```
New file (19 lines):

```sh
# paste me as pre-run script in Tower if setting up workflow run in WEHI HPC.
# load module first so the variables don't get overwritten
module load nextflow/25.04.2

# Tower pre-run script
export SHARED_SCRATCH="/vast/scratch/users/putri.g/nextflow"

export NXF_APPTAINER_CACHEDIR="$SHARED_SCRATCH/apptainer_cache"
export APPTAINER_CACHEDIR="$SHARED_SCRATCH/apptainer_cache"
export APPTAINER_TMPDIR="$SHARED_SCRATCH/apptainer_tmp"
export APPTAINER_LIBRARYDIR="$SHARED_SCRATCH/apptainer_library"
export SINGULARITY_CACHEDIR="$SHARED_SCRATCH/apptainer_cache"
export SINGULARITY_TMPDIR="$SHARED_SCRATCH/apptainer_tmp"
export TMPDIR="$SHARED_SCRATCH/apptainer_tmp"
export NXF_HOME="$SHARED_SCRATCH/nxf_home"
export NXF_TEMP="$SHARED_SCRATCH/nxf_tmp"
export HOME="$SHARED_SCRATCH/home"

mkdir -p "$APPTAINER_CACHEDIR" "$APPTAINER_TMPDIR" "$APPTAINER_LIBRARYDIR" "$NXF_HOME" "$NXF_TEMP" "$HOME"
```
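The pre-run script funnels every cache and temp variable to a single scratch root and creates the directories eagerly. Because `mkdir -p` is idempotent, re-running the pre-run script on a later launch is harmless. A small sketch of the pattern, using a throwaway demo root rather than the real scratch path:

```shell
#!/bin/sh
# Illustrative only: directory setup mirroring the pre-run script,
# rooted in a temporary directory instead of /vast/scratch.
DEMO_SCRATCH="$(mktemp -d)/nextflow"
export APPTAINER_CACHEDIR="$DEMO_SCRATCH/apptainer_cache"
export NXF_HOME="$DEMO_SCRATCH/nxf_home"

mkdir -p "$APPTAINER_CACHEDIR" "$NXF_HOME"
mkdir -p "$APPTAINER_CACHEDIR" "$NXF_HOME"   # second call succeeds silently

echo "demo directories created under $DEMO_SCRATCH"
```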
New file (31 lines):

```sh
#!/bin/bash

# get the root of the directory
REPO_ROOT=$(git rev-parse --show-toplevel)

# ensure that the command below is run from the root of the repository
cd "$REPO_ROOT"

set -e

# generate a unique id
RUN_ID="run_$(date +%Y-%m-%d_%H-%M-%S)"
publish_dir="/vast/scratch/users/putri.g/cytobenchmark/benchmark_out_hpc/results/${RUN_ID}"

# write the parameters to file
cat > /tmp/params.yaml << HERE
input_states: /vast/scratch/users/putri.g/cytobenchmark/benchmark_out_hpc/datasets/**/state.yaml
rename_keys: 'input_censored_split1:output_censored_split1;input_censored_split2:output_censored_split2;input_unintegrated:output_unintegrated'
output_state: "state.yaml"
publish_dir: "$publish_dir"
HERE

tw launch https://github.com/openproblems-bio/task_cyto_batch_integration.git \
  --revision build/setup_run_hpc \
  --pull-latest \
  --main-script target/nextflow/workflows/run_benchmark/main.nf \
  --workspace 80689470953249 \
  --params-file /tmp/params.yaml \
  --entry-name auto \
  --config scripts/labels_tw_wehi.config \
  --labels task_cyto_batch_integration,full
```
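Because `publish_dir` embeds a timestamped `RUN_ID`, each invocation of the script above publishes into a fresh directory and never overwrites an earlier run. A sketch of the pattern (demo path, not the real scratch directory):

```shell
#!/bin/sh
# Illustrative only: timestamped run directories, as in the script above.
RUN_ID="run_$(date +%Y-%m-%d_%H-%M-%S)"
publish_dir="$(mktemp -d)/results/${RUN_ID}"   # demo root, not /vast/scratch
mkdir -p "$publish_dir"
echo "publishing to $publish_dir"
```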
New file (34 lines):

```sh
#!/bin/bash

# run script to run only a subset of methods/metrics on HPC

# get the root of the directory
REPO_ROOT=$(git rev-parse --show-toplevel)

# ensure that the command below is run from the root of the repository
cd "$REPO_ROOT"

set -e

# generate a unique id
RUN_ID="run_$(date +%Y-%m-%d_%H-%M-%S)"
publish_dir="/vast/scratch/users/putri.g/cytobenchmark/benchmark_out_hpc/results/${RUN_ID}"

# write the parameters to file
cat > /tmp/params.yaml << HERE
input_states: /vast/scratch/users/putri.g/cytobenchmark/benchmark_out_hpc/datasets/**/state.yaml
rename_keys: 'input_censored_split1:output_censored_split1;input_censored_split2:output_censored_split2;input_unintegrated:output_unintegrated'
output_state: "state.yaml"
settings: '{"metrics_include": ["lisi"], "methods_include": ["combat"]}'
publish_dir: "$publish_dir"
HERE

tw launch https://github.com/openproblems-bio/task_cyto_batch_integration.git \
  --revision build/setup_run_hpc \
  --pull-latest \
  --main-script target/nextflow/workflows/run_benchmark/main.nf \
  --workspace 80689470953249 \
  --params-file /tmp/params.yaml \
  --entry-name auto \
  --config scripts/labels_tw_wehi.config \
  --labels task_cyto_batch_integration,combat,test
```
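The `settings` value embeds a JSON object inside a single-quoted YAML scalar, which is easy to break when hand-editing the include lists. A quick pre-flight check one could run before launching (assumes `python3` is on the PATH; not part of the PR):

```shell
#!/bin/sh
# Illustrative only: validate the JSON embedded in the settings field
# before launching, to fail fast on quoting mistakes.
settings='{"metrics_include": ["lisi"], "methods_include": ["combat"]}'

if echo "$settings" | python3 -m json.tool > /dev/null 2>&1; then
  echo "settings JSON ok"
else
  echo "settings JSON malformed" >&2
  exit 1
fi
```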
Review comment: computationally expensive maybe(?)