43 changes: 0 additions & 43 deletions Example_output

This file was deleted.

99 changes: 92 additions & 7 deletions README.md
@@ -9,21 +9,106 @@ This repository contains scripts to pull resource usage data from job logs into

### NCI Gadi HPC

**[gadi-usage-report.pl](Scripts/gadi_usage_report.pl)**
**[gadi_usage_report_v1.2.pl](Scripts/gadi_usage_report_v1.2.pl)**

Description:

This script gathers the job compute requests and usage metrics from Gadi PBS logs and summarises them into a tab-delimited output.

Efficiency/utilisation values are reported for CPU using the formula `cpu_e = cputime/walltime/cpus_used`.
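As a quick sanity check of that formula, here is a minimal shell sketch; the numbers are illustrative (in the style of the example output further down), and `awk` handles the floating-point division:

```bash
# Illustrative values pulled from a PBS log summary.
cputime_mins=130.72
walltime_mins=13.42
ncpus=64

# cpu_e = cputime / walltime / cpus_used
awk -v c="$cputime_mins" -v w="$walltime_mins" -v n="$ncpus" \
    'BEGIN { printf "CPU efficiency: %.2f\n", c / w / n }'
# Prints: CPU efficiency: 0.15
```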

GPU usage (NGPUS, memory used, and GPU utilisation) can optionally be reported by applying the `-g` flag to the run.


Options:

```
-a <dir>       Report on all .o log files in the specified directory
-l <logfile>   Report on one exact logfile
-p <pattern>   Report on .o log files matching a filename pattern
-g             Include GPU metrics
```

At least one of `-a <val>`, `-l <val>` or `-p <val>` must be supplied.

GPU metrics can be included with any of the above 3 parameters with the optional `-g` flag. Logs with no GPU usage will have `NA` for the 3 GPU output fields.

Usage examples:

```bash
perl gadi_usage_report_v1.2.pl -a /path/to/logdir   # all logs in dir
perl gadi_usage_report_v1.2.pl -l myjob.o -g        # a specific log, report GPU usage
perl gadi_usage_report_v1.2.pl -p name              # all logs with names containing 'name'
```

If no prefix is specified, a warning will be given, and the usage metrics will be reported for all job logs found within the present directory. Please see the script header for execution instructions.

Output:

Tab-delimited summary of the resources requested and used for each job will be printed to STDOUT.

Use output redirection when executing the script to save the data to a text file, eg:

**[gadi-queuetime-report.pl](Scripts/gadi_queuetime_report.pl)**
`perl <path/to/script>/gadi_usage_report_v1.2.pl <options> > resources_summary.txt`

If no prefix is specified, a warning will be given, and the usage metrics will be reported for all job logs found within the present directory.

This script reports the queue time of a collection of completed jobs with the same output log file prefix on Gadi. If no prefix is specified, a warning will be given, and the queue time will be reported for all jobs with logs found within the present directory. Please note that PBS does not preserve job history on Gadi past 24 hours post job-completion.
Example output:

To remove this time restriction, jobs can be submitted with the line `qstat -xf $PBS_JOBID` anywhere in the job script, with or without output redirection. This preserves the required record in the ".o" output log file (no output redirection) or in a separate file (with output redirection). There are three ways in which this script can be run. Please see the script header for execution instructions.
```console
perl ./HPC_usage_reports/Scripts/gadi_usage_report_v1.2.pl -a /scratch/aa00/my-pbs-logs/ -g

######
Reporting on all usage log files in /scratch/aa00/my-pbs-logs/.
######

#JobName Exit_status Service_units CPU_efficiency CPUs GPU_util NGPUS Mem_req Mem_used GPU_mem_used CPUtime_mins Walltime_req Walltime_mins JobFS_req JobFS_used Date
hg38_1140_test_three_cpu_only.o 0 8.28 0.14 12 NA NA 48.0GB 14.99GB NA 11.88 00:10:00 6.90 100.0MB 0B 2026-03-19
dgxa100_4pod5drs_2ngpu.o 0 64.40 0.15 64 0.83 4 1000.0GB 34.71GB 312.89GB 130.72 00:30:00 13.42 200.0GB 0B 2026-04-13
gpuhopper_4pod5drs_2ngpu.o 0 41.20 0.36 24 0.74 2 480.0GB 33.39GB 173.62GB 117.37 00:30:00 13.73 200.0GB 0B 2026-04-12
gpuhopper_4pod5drs_4ngpu.o 0 115.60 0.21 48 0.11 4 1.0TB 35.7GB 372.48GB 190.73 00:30:00 19.27 200.0GB 0B 2026-04-13
gpuvolta_4pod5drs_2ngpu.o 0 113.84 0.19 48 0.83 4 382.0GB 32.98GB 91.47GB 431.80 01:00:00 47.43 200.0GB 0B 2026-04-13

```
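As a sketch only, the `qstat -xf $PBS_JOBID` line described above can sit anywhere in a PBS job script. Every directive, project code, and path below is a placeholder, not a recommendation:

```bash
#!/bin/bash
#PBS -P ab12                                # placeholder project code
#PBS -q normal
#PBS -l ncpus=4,mem=16GB,walltime=01:00:00
#PBS -l wd

# Preserve the full job record in this job's .o log so usage can still
# be reported after PBS drops the job history (24 h post-completion).
qstat -xf $PBS_JOBID

./my_analysis.sh                            # placeholder workload
```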

**[gadi-nfcore-report.sh](Scripts/gadi_nfcore_report.sh)**

This script gathers the job requests and usage metrics from Gadi log files, in the same way as [gadi-queuetime-report.pl](Scripts/gadi_queuetime_report.pl). However, this script loops through the Nextflow work directory to collect `.command.log` files and prints all output to a .tsv file: `gadi-nf-core-joblogs.tsv`
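The collection step can be sketched with `find`; the directory layout shown is Nextflow's default, where each task runs in `work/<2-char>/<hash>/` and appends its PBS summary to `.command.log`:

```bash
# List every task log under the Nextflow work directory for parsing.
find work -type f -name ".command.log"
```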

**[gadi_nextflow_usage_v1.1.sh](Scripts/gadi_nextflow_usage_v1.1.sh)**

This script takes a nextflow run name (e.g. from `nextflow log`), pulls out all the task hashes from the run, and finds the relevant work directory to collect `.command.log` files from that run only. The script gathers the job requests and usage metrics from Gadi post-job files similar to [gadi-queuetime-report.pl](Scripts/gadi_queuetime_report.pl) and
[gadi-nfcore-report.sh](Scripts/gadi_nfcore_report.sh).

Results are printed to file: `resource_usage.<nextflow_run_name>.log`.

The script requires the nextflow run name as its first positional argument; an optional second argument sets the work directory (default: `work`). If you have forgotten the run name, identify it from the output of the `nextflow log` command (the most recent run name is printed closest to the command prompt):

```bash
module load nextflow
nextflow log
```

```console
TIMESTAMP DURATION RUN NAME STATUS REVISION ID SESSION ID COMMAND

2026-02-25 11:51:55 - kickass_cantor - 593881520d e2ddc027-c09f-487c-a241-be9771114df6 nextflow run main.nf ...
2026-02-25 11:54:03 50m 35s loving_boltzmann ERR 593881520d e2ddc027-c09f-487c-a241-be9771114df6 nextflow run main.nf ...
2026-02-25 13:07:06 5h 34m 53s maniac_lorenz OK 593881520d e2ddc027-c09f-487c-a241-be9771114df6 nextflow run main.nf ...
```

Run the script:

```bash
bash Scripts/gadi_nextflow_usage_v1.1.sh maniac_lorenz
```

Example output:

```console
Job_name Hash Log_path Exit_status Service_units NCPUs_requested CPU_time_used(mins) CPU_efficiency Memory_requested Memory_used Walltime_requested Walltime_used(mins) JobFS_requested JobFS_used
PREPARE_GENOME:INDEX_MINIMAP2 (T2T) 68/7bbdc7 ../work/68/7cbdc706bba77935ff576939e5478a/.command.log 0 0.19 4 2.42 0.4315 16.0GB 16.0GB 0:30:00 1.4 100.0MB 0B
PREPARE_GENOME:BUILD_BED12 (T2T) a1/bad978 ../work/a1/bad9785fc4775f110218ef1a350609/.command.log NA NA NA NA NA NA NA NA NA NA NA
PREPARE_GENOME:INDEX_SAMTOOLS (T2T) 8d/3dd9b9 ../work/8d/32d9b9ebe7d3993d8cc952f15b75e0/.command.log 0 0.09 12 0.15 0.0577 48.0GB 3.1GB 1:00:00 0.22 100.0MB 0B
MAPPING:MINIMAP2_MAP_SORT_INDEX (15022) 7a/2e38bd ../work/7a/2e38bd6b0dec1dc227f8de0b90d389/.command.log 0 33.71 24 608.33 0.6016 96.0GB 84.49GB 6:00:00 42.13 100.0MB 0B
MAPPING:MINIMAP2_MAP_SORT_INDEX (15022) f6/d611b9 ../work/f6/d611b993ae5fcf51ca6c0307425c98/.command.log 0 55.71 24 1066.82 0.6384 96.0GB 78.22GB 6:00:00 69.63 100.0MB 0B
BAM_QC (BAM QC: 15022) 1d/a2c7f5 ../work/1d/a2c7f512627c840726d4c7375df4f2/.command.log 0 7.11 12 41.68 0.1953 48.0GB 48.0GB 1:00:00 17.78 100.0MB 0B
```
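The TSV output lends itself to quick post-processing. For example, ranking tasks by CPU efficiency (column 8), skipping the header and `NA` rows; the input filename is the script's default for the run shown above:

```bash
# Lowest-efficiency tasks print first; adjust the column for other metrics.
awk -F'\t' 'NR > 1 && $8 != "NA"' resource_usage.maniac_lorenz.log \
    | sort -t$'\t' -k8,8n \
    | cut -f1,8
```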
File renamed without changes.
123 changes: 123 additions & 0 deletions Scripts/Archive/gadi_usage_report_v1.2.pl
@@ -0,0 +1,123 @@
#!/usr/bin/env perl

#------------------------------------------------------------------
# gadi_usage_report/1.2
# Platform: NCI Gadi HPC
#
# Description:
# This script gathers the job requests and usage metrics from Gadi log
# files for a collection of job log files with the same prefix within the
# same directory, and calculates efficiency values using the formula
# e = cputime/walltime/cpus_used.
# If no prefix is specified, a warning will be given, and the usage metrics
# will be reported for all job logs found within the present directory.
#
# Version 1.1 updates
# Reports usage for all logs in /path/to/dir or for logs specified
# Faster, by only checking end of log (was slow for logs with big
# stdout)
# Reports job exit status
# Reports files with no usage log
#
# Usage:
# command line, eg:
# perl gadi_usage_report_v1.2.pl /path/to/logdir
# perl gadi_usage_report_v1.2.pl myjob.o
#
# Output:
# Tab-delimited summary of the resources requested and used for each job
# will be printed to STDOUT. Use output redirection when executing the
# script to save the data to a text file, eg:
# perl <path/to/script/gadi_usage_report.pl <prefix> > resources_summary.txt
#
# Date last modified: 13/04/26
# Version 1.2 updates:
# - reorder headings now the log is so long, to bring VIP details to fore
# - remove time, to reduce log complexity
# - fixed new failure from NCI dropping redundant CPU field from PBS .o log
# - added usage option to do all logs matching pattern word:
# perl gadi_usage_report_v1.2.pl myjob # will do all logs in dir with name containing 'myjob'
#
# If you use this script towards a publication, please acknowledge the
# Sydney Informatics Hub (or co-authorship, where appropriate).
#
# Suggested acknowledgement:
# The authors acknowledge the scientific and technical assistance
# <or e.g. bioinformatics assistance of <PERSON>> of Sydney Informatics
# Hub and resources and services from the National Computational
# Infrastructure (NCI), which is supported by the Australian Government
# with access facilitated by the University of Sydney.
#------------------------------------------------------------------

use warnings;
use strict;
use POSIX;
use File::Basename;

my $dir=`pwd`;
chomp $dir;
my @logs;
my @no_report;

my $prefix = '';
if ($ARGV[0]) {
    $prefix = $ARGV[0];
    chomp $prefix;
    if ($ARGV[0] =~ m/\.o$/) { # escape the dot so only names ending in ".o" match
        @logs = (`ls "$prefix"`);
    }
    else {
        @logs = split(' ', `ls $dir\/*$prefix*.o`);
    }
}
else {
    print "\n######\nNo usage log prefix specified. Will report on all usage log files in $dir.\n######\n\n";
    @logs = split(' ', `ls $dir\/*.o`);
}

my $report = {};

if (@logs) {
    print "#JobName\tExit_status\tService_units\tCPU_efficiency\tCPUs\tMem_req\tMem_used\tCPUtime_mins\tWalltime_req\tWalltime_mins\tJobFS_req\tJobFS_used\tDate\n";

    foreach my $file (@logs) {
        chomp $file;
        my @name_fields = split('\/', $file);
        my $name = basename($file);
        my @walltime = split(' ', `tail -12 $file | grep "Walltime"`);
        if ($walltime[2]) {
            my $walltime_req = $walltime[2];
            my $walltime_used = $walltime[5];
            my ($wall_hours, $wall_mins, $wall_secs) = split('\:', $walltime_used);
            my $walltime_mins = sprintf("%.2f", (($wall_hours * 60) + $wall_mins + ($wall_secs / 60)));
            my @cpus = split(' ', `tail -12 $file | grep -i "NCPUs"`);
            my $cpus = $cpus[2];
            my @mem = split(' ', `tail -n 12 $file | grep -i "Memory"`);
            my $mem_req = $mem[2];
            my $mem_used = $mem[5];
            chomp (my $cputime = `tail -12 $file | grep -i "CPU Time Used" | awk '{print \$4}'`);
            # Initialise all four values (a bare `= 0` would only set the first).
            my ($cpu_hours, $cpu_mins, $cpu_secs, $cputime_mins) = (0, 0, 0, 0);
            my @jobFS = split(' ', `tail -12 $file | grep -i "JobFS"`);
            my $jobFS_req = $jobFS[2];
            my $jobFS_used = $jobFS[5];
            my $cpu_e = 0;
            if ($cpus !~ m/unknown/) { # not sure if this 'unknown' report ever happens on Gadi like it does on Artemis...
                $cpus = ceil($cpus);
                ($cpu_hours, $cpu_mins, $cpu_secs) = split('\:', $cputime);
                $cputime_mins = sprintf("%.2f", (($cpu_hours * 60) + $cpu_mins + ($cpu_secs / 60)));
                $cpu_e = sprintf("%.2f", ($cputime_mins / $walltime_mins / $cpus));
            }
            chomp (my $SUs = `tail -12 $file | grep -i "Service Units" | awk '{print \$3}'`);
            chomp (my $exit_status = `tail -12 $file | grep -i "Exit Status" | cut -d ":" -f2 | awk '{\$1=\$1};1' | awk '{print \$1}'`);
            chomp (my $date = `tail -12 $file | grep -i "Resource Usage on" | awk '{print \$4}'`);
            chomp (my $time = `tail -12 $file | grep -i "Resource Usage on" | awk '{print \$5}' | sed 's/:\$//'`);
            print "$name\t$exit_status\t$SUs\t$cpu_e\t$cpus\t$mem_req\t$mem_used\t$cputime_mins\t$walltime_req\t$walltime_mins\t$jobFS_req\t$jobFS_used\t$date\n";
        }
        else {
            push(@no_report, $file);
        }
    }
}
if (@no_report) {
    print "\n\n######\nWARNING: Usage metrics were not reported for: @no_report\n######\n\n";
}
104 changes: 104 additions & 0 deletions Scripts/gadi_nextflow_usage_v1.1.sh
@@ -0,0 +1,104 @@
#!/bin/bash

module load nextflow

RUN_NAME="$1"
WORKDIR="${2:-work}" # optional positional command line argument, default is './work'

if [ -z "$RUN_NAME" ]; then
    echo "No run name supplied. Exiting."
    exit 1
fi

OUTPUT="resource_usage.${RUN_NAME}.log"
TMPOUT="${OUTPUT}.tmp"

if [ -f "$TMPOUT" ]; then
    echo "Temp file ${TMPOUT} already exists. Refusing to run."
    exit 1
fi

if [ ! -d "$WORKDIR" ]; then
    echo "Cannot find work directory $WORKDIR. Exiting."
    exit 1
fi

nextflow log -f hash,name "$RUN_NAME" > "$TMPOUT"

if [[ ! -s "$TMPOUT" ]]; then
    echo "ERROR: run name $RUN_NAME not found in this directory" >&2
    rm -f "$TMPOUT"
    exit 1
fi

echo -e "Job_name\tHash\tLog_path\tExit_status\tService_units\tNCPUs_requested\tCPU_time_used(mins)\tCPU_efficiency\tMemory_requested\tMemory_used\tWalltime_requested\tWalltime_used(mins)\tJobFS_requested\tJobFS_used" > "$OUTPUT"

while read -r HASH JOBNAME; do
    LOG=$(find "$WORKDIR" -type f -path "*/${HASH}*" -name ".command.log" | head -n 1)

    if [[ -z "$LOG" ]]; then
        continue
    fi

    awk -v OFS="\t" -v logfile="$LOG" -v hash="$HASH" -v jobname="$JOBNAME" '
        # Convert hh:mm:ss to minutes; returns NA for malformed input.
        function time_to_mins(t, a, n, h, m, s, total_secs) {
            n = split(t, a, ":")
            if (n != 3) return "NA"
            h = a[1] + 0
            m = a[2] + 0
            s = a[3] + 0
            total_secs = (h * 3600) + (m * 60) + s
            return total_secs / 60
        }

        BEGIN {
            exit_status = "NA"
            service_units = "NA"
            ncpus_requested = "NA"
            cpu_time_used = "NA"
            cpu_time_used_mins = "NA"
            cpu_efficiency = "NA"
            memory_requested = "NA"
            memory_used = "NA"
            walltime_requested = "NA"
            walltime_used = "NA"
            walltime_used_mins = "NA"
            jobfs_requested = "NA"
            jobfs_used = "NA"
        }

        # The PBS usage summary follows a "====" rule and a "Resource Usage" line.
        /^=+$/ {flag1=1; next}
        flag1 && ! /Resource Usage/ {flag1=0; next}
        flag1 && /Resource Usage/ {flag2=1; next}

        flag2 {
            if ($0 ~ /Exit Status/) exit_status = $3
            if ($0 ~ /Service Units/) service_units = $3
            if ($0 ~ /NCPUs Requested/) ncpus_requested = $3
            if ($0 ~ /CPU Time Used/) cpu_time_used = $7
            if ($0 ~ /Memory Requested/) memory_requested = $3
            if ($0 ~ /Memory Used/) memory_used = $6
            if ($0 ~ /Walltime Requested/) walltime_requested = $3
            if ($0 ~ /Walltime Used/) walltime_used = $6
            if ($0 ~ /JobFS Requested/) jobfs_requested = $3
            if ($0 ~ /JobFS Used/) jobfs_used = $6
        }

        END {
            if (cpu_time_used != "NA")
                cpu_time_used_mins = sprintf("%.2f", time_to_mins(cpu_time_used))

            if (walltime_used != "NA")
                walltime_used_mins = sprintf("%.2f", time_to_mins(walltime_used))

            if (cpu_time_used != "NA" && walltime_used != "NA" && ncpus_requested != "NA" && ncpus_requested > 0) {
                cpu_efficiency = time_to_mins(cpu_time_used) / time_to_mins(walltime_used) / ncpus_requested
                cpu_efficiency = sprintf("%.4f", cpu_efficiency)
            }

            print jobname, hash, logfile, exit_status, service_units, ncpus_requested, cpu_time_used_mins, cpu_efficiency, memory_requested, memory_used, walltime_requested, walltime_used_mins, jobfs_requested, jobfs_used
        }' "$LOG"

done < "$TMPOUT" >> "$OUTPUT"

rm "$TMPOUT"
Empty file modified Scripts/gadi_nfcore_report.sh
100644 → 100755
2 changes: 1 addition & 1 deletion Scripts/gadi_queuetime_report.pl
@@ -85,7 +85,7 @@
use POSIX;
use Time::Local;

my $dir=`pwd`;
my $dir='.';
chomp $dir;

my $prefix = '';