74 changes: 73 additions & 1 deletion README.md
- [3. 📁 Repository Structure](#3--repository-structure)
- [3.1. Tests Directory Structure](#31-tests-directory-structure)
- [4. 👷🏻 GitHub Workflows](#4--github-workflows)
- [5. 🐳 Docker Image](#5--docker-image)
- [6. 📝 Functions Description](#6--functions-description)
- [7. Example Usage](#7-example-usage)
- [7.1 MATLAB/Octave](#71-matlaboctave)
- [7.2. R](#72-r)
- [8. 📈 Performance Comparison](#8--performance-comparison)
- [8.1. `parglmVS` Performance](#81-parglmvs-performance)
- [8.2. `vasca` Performance](#82-vasca-performance)

---

The `matlab/` folder contains the implementation of all functions involved in the pipeline.
├── loadings_runners/
├── loadings_test.go
├── loadings_test_results/
├── parglmVS_benchmark/
├── parglmVS_benchmark.sh
├── parglmVS_runners/
├── parglmVS_test.go
├── pcaEig_runners/
├── scores_runners/
├── scores_test.go
├── scores_test_results/
├── vasca_benchmark/
├── vasca_benchmark.sh
├── vasca_runners/
└── vasca_test.go
```
In general, all functions return structures with numerical data that can be compared.

⚠️⚠️ *`parglmVS` is the function responsible for computing all the data structures in the permutation test. The permutations are random, and the test requires a very large number of them because the random number generators of the two languages differ. As a result, the test takes a considerable amount of time, as discussed in issues [#51](https://github.com/danieeeld2/vASCA-R/issues/51) and [#52](https://github.com/danieeeld2/vASCA-R/issues/52), as well as in PR [#30](https://github.com/danieeeld2/vASCA-R/pull/30). At the end of that PR, you can find a screenshot showing the execution time of this test and confirming that the function passes. To reactivate the test, open `parglmVS_test.go` and comment out the two lines that contain `t.Skip`.*

In addition to the tests, we provide the scripts `vasca_benchmark.sh` and `parglmVS_benchmark.sh`, which generate comparative performance plots for the R and MATLAB/Octave implementations of these functions. The results are stored in the `<function>_benchmark/` folder. You can find more information about these benchmarks in [Section 8](#8--performance-comparison).

## 4. 👷🏻 GitHub Workflows

The repository includes two GitHub workflows: `docker.yml` and `test.yml`.
With this, you have a ready-to-use environment with all the code, languages, and dependencies.

## 7. Example Usage

We will provide an example of an execution pipeline in each language, using the Docker image `danieeeld2/r-vasca-testing:latest` (or `danieeeld2/r-vasca-testing:r-dependencies-installed` if you want the R dependencies preinstalled) and running everything from the root directory of the project. Start by running the image with a volume that includes the project, as instructed in [Section 5](#5--docker-image).

```bash
docker run -it --rm -v "$(pwd):/app" -w /app danieeeld2/r-vasca-testing:latest /bin/bash
for (i in seq_len(vascao$nFactors)) {
}
}
```

## 8. 📈 Performance Comparison

In this repository, we have implemented several functions to carry out the VASCA pipeline in R. To evaluate the performance of the R implementation against its MATLAB/Octave counterpart, we focus on the two functions that form the core of the pipeline: `parglmVS` and `vasca`.

### 8.1. `parglmVS` Performance

The script `tests/parglmVS_benchmark.sh` automates the benchmarking process for the `parglmVS` function implemented in both R and MATLAB/Octave. It runs the function across multiple models (`linear`, `interaction`, and `full`) and a range of permutation values, measuring execution times for each configuration. The results are saved in a CSV file and visualized using several comparative plots, which are generated automatically with R and stored in the `parglmVS_benchmark/` folder. *The datasets used are `X_DATA="../datasets/tests_datasets/X_test.csv"` and `F_DATA="../datasets/tests_datasets/F_test.csv"`*.
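As a quick sanity check on the generated results, the Octave-to-R speedup per configuration can be computed directly from the CSV with `awk`. The sketch below inlines a few rows copied from the bundled `benchmark_results.csv` purely for illustration; for a real run, point `awk` at `tests/parglmVS_benchmark/benchmark_results.csv` instead of the temporary sample file.

```shell
# Inline a small sample of the benchmark results (rows taken from the
# bundled benchmark_results.csv) into a temporary file.
cat > /tmp/sample_results.csv << 'CSV'
Language,Model,Permutations,Time(s)
R,linear,100000,35.774
Octave,linear,100000,93.257
R,full,100000,41.118
Octave,full,100000,123.488
CSV

# For each (Model, Permutations) pair, print Octave time / R time.
awk -F, 'NR > 1 { t[$1","$2","$3] = $4; key[$2","$3] = 1 }
END { for (k in key) printf "%s: %.2fx\n", k, t["Octave,"k] / t["R,"k] }' /tmp/sample_results.csv
```

With the sample rows above, this reports a roughly 2.6x speedup for the linear model and 3.0x for the full model at 100,000 permutations.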

<p align="center">
<img src="./tests/parglmVS_benchmark/benchmark_comparison_all.png" alt="Comparison plot" width="48%"/>
<img src="./tests/parglmVS_benchmark/benchmark_comparison_logscale.png" alt="Log-scale comparison plot" width="48%"/>
</p>

<p align="center"><i>
Comparison of execution times between R and Octave for the <code>parglmVS</code> function. The left plot shows the raw execution times across different models and numbers of permutations, while the right plot displays the same results on a log-log scale for better visualization of performance differences at large scales.
</i></p>

<p align="center">
<img src="./tests/parglmVS_benchmark/benchmark_linear.png" alt="Linear model benchmark" width="32%"/>
<img src="./tests/parglmVS_benchmark/benchmark_interaction.png" alt="Interaction model benchmark" width="32%"/>
<img src="./tests/parglmVS_benchmark/benchmark_full.png" alt="Full model benchmark" width="32%"/>
</p>

<p align="center"><i>
Execution time comparisons for the <code>parglmVS</code> function between R and Octave, separated by model type. Each plot shows how performance varies with the number of permutations for the linear, interaction, and full models, respectively.
</i></p>

<p align="center">
<img src="./tests/parglmVS_benchmark/benchmark_models_by_permutations.png" alt="Bar chart by model and permutations" width="75%"/>
</p>

<p align="center"><i>
Bar chart summarizing the execution times of the <code>parglmVS</code> function across different models and permutation counts, grouped by language (R vs. Octave). This visualization helps highlight relative performance differences depending on model complexity and computational load.
</i></p>

As the benchmark plots show, the R implementation significantly outperforms the Octave version when computing the test structures for permutation testing, across all three model types, once the number of permutations is moderately large. The gap widens as the number of permutations grows, highlighting the efficiency of the R-based approach under heavier computational loads.

### 8.2. `vasca` Performance

The script `tests/vasca_benchmark.sh` automates the benchmarking process for the `vasca` function in both R and MATLAB/Octave. It evaluates performance across multiple datasets and two significance levels (`0.01` and `0.05`), measuring execution times for each configuration. The results are compiled into a CSV file and visualized through a variety of comparative plots, automatically generated with R and saved in the `vasca_benchmark/` directory. *The datasets used for this benchmark are four `.json` files located under `../datasets/tests_datasets/`, named `parglmVS_1.json` to `parglmVS_4.json`.*

<p align="center">
<img src="./tests/vasca_benchmark/vasca_language_comparison.png" width="49%">
<img src="./tests/vasca_benchmark/vasca_heatmap.png" width="49%">
</p>

<p align="center">
<em>The first image compares the execution time of the <code>vasca</code> function between R and Octave across different datasets and significance levels. Each point represents an individual execution time, with lines connecting the results for each language. The second image shows a heatmap visualizing the execution times of the <code>vasca</code> function, categorized by language and dataset, at various significance levels.</em>
</p>

<p align="center">
<img src="./tests/vasca_benchmark/vasca_comparison_all.png" width="75%">
</p>

<p align="center">
<em>Comparison of the execution time of the <code>vasca</code> function in R and Octave across four datasets, with bars grouped by significance level (0.01 and 0.05).</em>
</p>

Although Octave/MATLAB performs better in this case, the `vasca` function requires far less computational time than `parglmVS`, so the performance gap between the two languages matters less here. When considering the entire pipeline, however, R offers better overall performance: `parglmVS`, the most computationally intensive part of the workflow, is much better optimized in R, so the complete pipeline runs faster in R.


167 changes: 167 additions & 0 deletions tests/parglmVS_benchmark.sh
#!/bin/bash
# Parameter configuration
MODELS=("linear" "interaction" "full")
PERMS=(100 500 1000 2000 5000 10000 20000 50000 100000)
BENCHMARK_DIR="parglmVS_benchmark"
OUTPUT_FILE="$BENCHMARK_DIR/benchmark_results.csv"
R_SCRIPT="./parglmVS_runners/parglmVS_run.R"
OCTAVE_SCRIPT="./parglmVS_runners/parglmVS_run.m"
X_DATA="../datasets/tests_datasets/X_test.csv"
F_DATA="../datasets/tests_datasets/F_test.csv"

# Create benchmark directory if it doesn't exist
mkdir -p "$BENCHMARK_DIR"

# Verify that files exist
if [ ! -f "$R_SCRIPT" ]; then
    echo "Error: $R_SCRIPT not found"
    exit 1
fi

if [ ! -f "$OCTAVE_SCRIPT" ]; then
    echo "Error: $OCTAVE_SCRIPT not found"
    exit 1
fi

if [ ! -f "$X_DATA" ]; then
    echo "Error: $X_DATA not found"
    exit 1
fi

if [ ! -f "$F_DATA" ]; then
    echo "Error: $F_DATA not found"
    exit 1
fi

# Create results file with header
echo "Language,Model,Permutations,Time(s)" > "$OUTPUT_FILE"

# Benchmark for R
for model in "${MODELS[@]}"; do
    for perm in "${PERMS[@]}"; do
        echo "Running R with Model=$model, Permutations=$perm..."
        # Time the run, extract the `real` line (e.g. 1m23.456s) and convert
        # it to seconds with bc (1m23.456 -> 1*60+23.456).
        TIME=$( { time Rscript "$R_SCRIPT" "$X_DATA" "$F_DATA" Model "$model" Permutations "$perm" >/dev/null 2>&1; } 2>&1 | grep real | awk '{print $2}' | sed 's/m/*60+/g' | sed 's/s//g' | bc)
        echo "R,$model,$perm,$TIME" >> "$OUTPUT_FILE"
        # Remove any CSV files generated by the R script
        find . -name "parglmVS_*.csv" -type f -delete
    done
done

# Benchmark for Octave (MATLAB compatible)
for model in "${MODELS[@]}"; do
    for perm in "${PERMS[@]}"; do
        echo "Running Octave with Model=$model, Permutations=$perm..."
        # Parse the `real` line from time output and convert it to seconds with bc.
        TIME=$( { time octave --no-gui -q "$OCTAVE_SCRIPT" "$X_DATA" "$F_DATA" Model "$model" Permutations "$perm" >/dev/null 2>&1; } 2>&1 | grep real | awk '{print $2}' | sed 's/m/*60+/g' | sed 's/s//g' | bc)
        echo "Octave,$model,$perm,$TIME" >> "$OUTPUT_FILE"
        # Remove any CSV files generated by the Octave script
        find . -name "parglmVS_*.csv" -type f -delete
    done
done

echo "Benchmark completed. Results in $OUTPUT_FILE"

# Generate comparative plots with R
cat > "$BENCHMARK_DIR/plot_benchmarks.R" << EOF
library(ggplot2)

# Read the data
data <- read.csv("$OUTPUT_FILE")

# Convert time to numeric if not already
data\$Time <- as.numeric(data\$Time.s.)

# Define a white background theme with grid lines
white_theme <- theme_bw() +
    theme(
        panel.background = element_rect(fill = "white"),
        plot.background = element_rect(fill = "white"),
        legend.background = element_rect(fill = "white"),
        legend.key = element_rect(fill = "white"),
        panel.grid.major = element_line(color = "grey90"),
        panel.grid.minor = element_line(color = "grey95"),
        axis.line = element_line(color = "black"),
        text = element_text(color = "black"),
        axis.text = element_text(color = "black"),
        plot.title = element_text(face = "bold", size = 14),
        legend.position = "right"
    )

# Create a combined plot for all models
p1 <- ggplot(data, aes(x=Permutations, y=Time, color=Language, shape=Model)) +
    geom_point(size=3) +
    geom_line(aes(linetype=Model)) +
    labs(title="Execution Time Comparison between R and Octave",
         x="Number of permutations",
         y="Execution time (seconds)") +
    scale_color_brewer(palette="Set1") +
    scale_x_log10(breaks = unique(data\$Permutations),
                  labels = scales::comma(unique(data\$Permutations))) +
    white_theme

# Save the combined plot
ggsave("$BENCHMARK_DIR/benchmark_comparison_all.png", p1, width=12, height=8, bg="white")

# Create separate plots for each model
for (model_name in unique(data\$Model)) {
    subset_data <- data[data\$Model == model_name,]
    p2 <- ggplot(subset_data, aes(x=Permutations, y=Time, color=Language)) +
        geom_point(size=3) +
        geom_line(linewidth=1) +
        labs(title=paste("Model:", model_name),
             x="Number of permutations",
             y="Execution time (seconds)") +
        scale_color_brewer(palette="Set1") +
        scale_x_log10(breaks = unique(subset_data\$Permutations),
                      labels = scales::comma(unique(subset_data\$Permutations))) +
        white_theme

    ggsave(paste0("$BENCHMARK_DIR/benchmark_", model_name, ".png"), p2, width=10, height=6, bg="white")
}

# Create bar chart to compare models grouped by language
p3 <- ggplot(data, aes(x=Model, y=Time, fill=Language)) +
    geom_bar(stat="identity", position="dodge") +
    facet_wrap(~Permutations, scales="free_y") +
    labs(title="Model comparison by number of permutations",
         x="Model",
         y="Execution time (seconds)") +
    scale_fill_brewer(palette="Set1") +
    theme_bw() +
    theme(
        panel.background = element_rect(fill = "white"),
        plot.background = element_rect(fill = "white"),
        legend.background = element_rect(fill = "white"),
        strip.background = element_rect(fill = "lightgrey"),
        axis.text.x = element_text(angle = 45, hjust = 1),
        plot.title = element_text(face = "bold", size = 14)
    )

ggsave("$BENCHMARK_DIR/benchmark_models_by_permutations.png", p3, width=14, height=10, bg="white")

# Create a log-scale plot to better visualize performance across all permutation ranges
p4 <- ggplot(data, aes(x=Permutations, y=Time, color=Language, shape=Model)) +
    geom_point(size=3) +
    geom_line(aes(linetype=Model)) +
    labs(title="Execution Time Comparison (Log-Log Scale)",
         x="Number of permutations (log scale)",
         y="Execution time (seconds, log scale)") +
    scale_color_brewer(palette="Set1") +
    scale_x_log10(breaks = unique(data\$Permutations),
                  labels = scales::comma(unique(data\$Permutations))) +
    scale_y_log10() +
    white_theme

ggsave("$BENCHMARK_DIR/benchmark_comparison_logscale.png", p4, width=12, height=8, bg="white")
EOF

# Run R script to generate plots
echo "Generating comparative plots..."
Rscript "$BENCHMARK_DIR/plot_benchmarks.R"

echo "Analysis complete. The following plots have been generated in $BENCHMARK_DIR:"
echo "- benchmark_comparison_all.png (General comparison)"
echo "- benchmark_comparison_logscale.png (Log-scale comparison)"
echo "- benchmark_linear.png (Linear model)"
echo "- benchmark_interaction.png (Interaction model)"
echo "- benchmark_full.png (Full model)"
echo "- benchmark_models_by_permutations.png (Comparison by permutations)"
Binary file added tests/parglmVS_benchmark/benchmark_full.png
Binary file added tests/parglmVS_benchmark/benchmark_linear.png
55 changes: 55 additions & 0 deletions tests/parglmVS_benchmark/benchmark_results.csv
Language,Model,Permutations,Time(s)
R,linear,100,.653
R,linear,500,.846
R,linear,1000,.990
R,linear,2000,1.319
R,linear,5000,2.398
R,linear,10000,4.096
R,linear,20000,7.695
R,linear,50000,18.094
R,linear,100000,35.774
R,interaction,100,.622
R,interaction,500,.826
R,interaction,1000,1.060
R,interaction,2000,1.449
R,interaction,5000,2.737
R,interaction,10000,4.784
R,interaction,20000,9.156
R,interaction,50000,21.228
R,interaction,100000,41.743
R,full,100,.615
R,full,500,.821
R,full,1000,1.025
R,full,2000,1.470
R,full,5000,2.731
R,full,10000,4.787
R,full,20000,8.833
R,full,50000,21.174
R,full,100000,41.118
Octave,linear,100,.335
Octave,linear,500,.682
Octave,linear,1000,1.131
Octave,linear,2000,2.025
Octave,linear,5000,4.687
Octave,linear,10000,9.057
Octave,linear,20000,18.545
Octave,linear,50000,47.469
Octave,linear,100000,93.257
Octave,interaction,100,.360
Octave,interaction,500,.842
Octave,interaction,1000,1.458
Octave,interaction,2000,2.668
Octave,interaction,5000,6.297
Octave,interaction,10000,12.339
Octave,interaction,20000,24.857
Octave,interaction,50000,61.538
Octave,interaction,100000,122.224
Octave,full,100,.357
Octave,full,500,.849
Octave,full,1000,1.453
Octave,full,2000,2.726
Octave,full,5000,6.406
Octave,full,10000,12.743
Octave,full,20000,23.733
Octave,full,50000,60.246
Octave,full,100000,123.488