74 changes: 73 additions & 1 deletion README.md
- [3. 📁 Repository Structure](#3--repository-structure)
- [3.1. Tests Directory Structure](#31-tests-directory-structure)
- [4. 👷🏻 GitHub Workflows](#4--github-workflows)
- [5. 🐳 Docker Image](#5--docker-image)
- [6. 📝 Functions Description](#6--functions-description)
- [7. Example Usage](#7-example-usage)
- [7.1 MATLAB/Octave](#71-matlaboctave)
- [7.2. R](#72-r)
- [8. 📈 Performance Comparison](#8--performance-comparison)
- [8.1. `parglmVS` Performance](#81-parglmvs-performance)
- [8.2. `vasca` Performance](#82-vasca-performance)

---

The `matlab/` folder contains the implementation of all functions involved in the pipeline.
├── loadings_runners/
├── loadings_test.go
├── loadings_test_results/
├── parglmVS_benchmark/
├── parglmVS_benchmark.sh
├── parglmVS_runners/
├── parglmVS_test.go
├── pcaEig_runners/
├── scores_runners/
├── scores_test.go
├── scores_test_results/
├── vasca_benchmark/
├── vasca_benchmark.sh
├── vasca_runners/
└── vasca_test.go
```
In general, all functions return structures with numerical data that can be compared.

⚠️⚠️ *`parglmVS` is the function responsible for computing all the data structures in the permutation test. The permutations are random, and the test requires a very large number of them because the random number generators of the two languages differ. As a result, the test takes a considerable amount of time, as discussed in issues [#51](https://github.com/danieeeld2/vASCA-R/issues/51) and [#52](https://github.com/danieeeld2/vASCA-R/issues/52), as well as in PR [#30](https://github.com/danieeeld2/vASCA-R/pull/30). At the end of that PR, you can find a screenshot showing the execution time of this test and confirming that the function passes. To reactivate the test, open `parglmVS_test.go` and comment out the two lines that contain `t.Skip`.*

In addition to the tests, we provide the scripts `vasca_benchmark.sh` and `parglmVS_benchmark.sh`, which generate comparative performance plots for the R and MATLAB/Octave implementations of these functions. The results are stored in the `<function>_benchmark/` folder. You can find more information about these benchmarks in [Section 8](#8--performance-comparison).

## 4. 👷🏻 GitHub Workflows

The repository includes two GitHub workflows: `docker.yml` and `test.yml`.
With this, you have a ready-to-use environment with all the code, languages, and dependencies.

## 7. Example Usage

We will provide an example of an execution pipeline in each language, using the Docker image `danieeeld2/r-vasca-testing:latest` (or `danieeeld2/r-vasca-testing:r-dependencies-installed` if you want the R dependencies preinstalled) and running everything from the root directory of the project. Start by running the image with a volume that includes the project, as instructed in [Section 5](#5--docker-image).

```bash
docker run -it --rm -v "$(pwd):/app" -w /app danieeeld2/r-vasca-testing:latest /bin/bash
for (i in seq_len(vascao$nFactors)) {
}
}
```

## 8. 📈 Performance Comparison

In this repository, we have implemented several functions to carry out the VASCA pipeline in R. To evaluate the performance of the R implementation against its MATLAB/Octave counterpart, we focus on the two functions that form the core of the pipeline: `parglmVS` and `vasca`.

### 8.1. `parglmVS` Performance

The script `tests/parglmVS_benchmark.sh` automates the benchmarking process for the `parglmVS` function implemented in both R and MATLAB/Octave. It runs the function across multiple models (`linear`, `interaction`, and `full`) and a range of permutation values, measuring execution times for each configuration. The results are saved in a CSV file and visualized using several comparative plots, which are generated automatically with R and stored in the `parglmVS_benchmark/` folder. *The datasets used are `X_DATA="../datasets/tests_datasets/X_test.csv"` and `F_DATA="../datasets/tests_datasets/F_test.csv"`*.
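As a quick sanity check on the generated results, the Octave-to-R speedup per configuration can be computed directly from the CSV with `awk`. The sketch below inlines a few rows copied from the bundled `benchmark_results.csv` purely for illustration; for a real run, point `awk` at `tests/parglmVS_benchmark/benchmark_results.csv` instead of the temporary sample file.

```shell
# Inline a small sample of the benchmark results (rows taken from the
# bundled benchmark_results.csv) into a temporary file.
cat > /tmp/sample_results.csv << 'CSV'
Language,Model,Permutations,Time(s)
R,linear,100000,35.774
Octave,linear,100000,93.257
R,full,100000,41.118
Octave,full,100000,123.488
CSV

# For each (Model, Permutations) pair, print Octave time / R time.
awk -F, 'NR > 1 { t[$1","$2","$3] = $4; key[$2","$3] = 1 }
END { for (k in key) printf "%s: %.2fx\n", k, t["Octave,"k] / t["R,"k] }' /tmp/sample_results.csv
```

With the sample rows above, this reports a roughly 2.6x speedup for the linear model and 3.0x for the full model at 100,000 permutations.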

<p align="center">
<img src="./tests/parglmVS_benchmark/benchmark_comparison_all.png" alt="Comparison plot" width="48%"/>
<img src="./tests/parglmVS_benchmark/benchmark_comparison_logscale.png" alt="Log-scale comparison plot" width="48%"/>
</p>

<p align="center"><i>
Comparison of execution times between R and Octave for the <code>parglmVS</code> function. The left plot shows the raw execution times across different models and numbers of permutations, while the right plot displays the same results on a log-log scale for better visualization of performance differences at large scales.
</i></p>

<p align="center">
<img src="./tests/parglmVS_benchmark/benchmark_linear.png" alt="Linear model benchmark" width="32%"/>
<img src="./tests/parglmVS_benchmark/benchmark_interaction.png" alt="Interaction model benchmark" width="32%"/>
<img src="./tests/parglmVS_benchmark/benchmark_full.png" alt="Full model benchmark" width="32%"/>
</p>

<p align="center"><i>
Execution time comparisons for the <code>parglmVS</code> function between R and Octave, separated by model type. Each plot shows how performance varies with the number of permutations for the linear, interaction, and full models, respectively.
</i></p>

<p align="center">
<img src="./tests/parglmVS_benchmark/benchmark_models_by_permutations.png" alt="Bar chart by model and permutations" width="75%"/>
</p>

<p align="center"><i>
Bar chart summarizing the execution times of the <code>parglmVS</code> function across different models and permutation counts, grouped by language (R vs. Octave). This visualization helps highlight relative performance differences depending on model complexity and computational load.
</i></p>

As the benchmark plots show, the R implementation significantly outperforms the Octave version when computing the test structures for permutation testing, across all three model types, once the number of permutations is moderately large. The gap widens as the number of permutations grows, highlighting the efficiency of the R-based approach under heavier computational loads.

### 8.2. `vasca` Performance

The script `tests/vasca_benchmark.sh` automates the benchmarking process for the `vasca` function in both R and MATLAB/Octave. It evaluates performance across multiple datasets and two significance levels (`0.01` and `0.05`), measuring execution times for each configuration. The results are compiled into a CSV file and visualized through a variety of comparative plots, automatically generated with R and saved in the `vasca_benchmark/` directory. *The datasets used for this benchmark are four `.json` files located under `../datasets/tests_datasets/`, named `parglmVS_1.json` to `parglmVS_4.json`.*

<p align="center">
<img src="./tests/vasca_benchmark/vasca_language_comparison.png" width="49%">
<img src="./tests/vasca_benchmark/vasca_heatmap.png" width="49%">
</p>

<p align="center">
<em>The first image compares the execution time of the <code>vasca</code> function between R and Octave across different datasets and significance levels. Each point represents an individual execution time, with lines connecting the results for each language. The second image shows a heatmap visualizing the execution times of the <code>vasca</code> function, categorized by language and dataset, at various significance levels.</em>
</p>

<p align="center">
<img src="./tests/vasca_benchmark/vasca_comparison_all.png" width="75%">
</p>

<p align="center">
<em>Comparison of the execution time of the <code>vasca</code> function in R and Octave across four datasets, with bars grouped by significance level (0.01 and 0.05).</em>
</p>

Although Octave/MATLAB performs better in this case, the `vasca` function requires far less computational time than `parglmVS`, so the performance gap between the two languages matters less here. When considering the entire pipeline, however, R offers better overall performance: `parglmVS`, the most computationally intensive part of the workflow, is much better optimized in R, so the complete pipeline runs faster in R.


167 changes: 167 additions & 0 deletions tests/parglmVS_benchmark.sh
#!/bin/bash
# Parameter configuration
MODELS=("linear" "interaction" "full")
PERMS=(100 500 1000 2000 5000 10000 20000 50000 100000)
BENCHMARK_DIR="parglmVS_benchmark"
OUTPUT_FILE="$BENCHMARK_DIR/benchmark_results.csv"
R_SCRIPT="./parglmVS_runners/parglmVS_run.R"
OCTAVE_SCRIPT="./parglmVS_runners/parglmVS_run.m"
X_DATA="../datasets/tests_datasets/X_test.csv"
F_DATA="../datasets/tests_datasets/F_test.csv"

# Create benchmark directory if it doesn't exist
mkdir -p "$BENCHMARK_DIR"

# Verify that files exist
if [ ! -f "$R_SCRIPT" ]; then
    echo "Error: $R_SCRIPT not found"
    exit 1
fi

if [ ! -f "$OCTAVE_SCRIPT" ]; then
    echo "Error: $OCTAVE_SCRIPT not found"
    exit 1
fi

if [ ! -f "$X_DATA" ]; then
    echo "Error: $X_DATA not found"
    exit 1
fi

if [ ! -f "$F_DATA" ]; then
    echo "Error: $F_DATA not found"
    exit 1
fi

# Create results file with header
echo "Language,Model,Permutations,Time(s)" > "$OUTPUT_FILE"

# Benchmark for R
for model in "${MODELS[@]}"; do
    for perm in "${PERMS[@]}"; do
        echo "Running R with Model=$model, Permutations=$perm..."
        # Time the run, extract the `real` line (e.g. 1m23.456s) and convert
        # it to seconds with bc (1m23.456 -> 1*60+23.456).
        TIME=$( { time Rscript "$R_SCRIPT" "$X_DATA" "$F_DATA" Model "$model" Permutations "$perm" >/dev/null 2>&1; } 2>&1 | grep real | awk '{print $2}' | sed 's/m/*60+/g' | sed 's/s//g' | bc)
        echo "R,$model,$perm,$TIME" >> "$OUTPUT_FILE"
        # Remove any CSV files generated by the R script
        find . -name "parglmVS_*.csv" -type f -delete
    done
done

# Benchmark for Octave (MATLAB compatible)
for model in "${MODELS[@]}"; do
    for perm in "${PERMS[@]}"; do
        echo "Running Octave with Model=$model, Permutations=$perm..."
        # Parse the `real` line from time output and convert it to seconds with bc.
        TIME=$( { time octave --no-gui -q "$OCTAVE_SCRIPT" "$X_DATA" "$F_DATA" Model "$model" Permutations "$perm" >/dev/null 2>&1; } 2>&1 | grep real | awk '{print $2}' | sed 's/m/*60+/g' | sed 's/s//g' | bc)
        echo "Octave,$model,$perm,$TIME" >> "$OUTPUT_FILE"
        # Remove any CSV files generated by the Octave script
        find . -name "parglmVS_*.csv" -type f -delete
    done
done

echo "Benchmark completed. Results in $OUTPUT_FILE"

# Generate comparative plots with R
cat > "$BENCHMARK_DIR/plot_benchmarks.R" << EOF
library(ggplot2)

# Read the data
data <- read.csv("$OUTPUT_FILE")

# Convert time to numeric if not already
data\$Time <- as.numeric(data\$Time.s.)

# Define a white background theme with grid lines
white_theme <- theme_bw() +
    theme(
        panel.background = element_rect(fill = "white"),
        plot.background = element_rect(fill = "white"),
        legend.background = element_rect(fill = "white"),
        legend.key = element_rect(fill = "white"),
        panel.grid.major = element_line(color = "grey90"),
        panel.grid.minor = element_line(color = "grey95"),
        axis.line = element_line(color = "black"),
        text = element_text(color = "black"),
        axis.text = element_text(color = "black"),
        plot.title = element_text(face = "bold", size = 14),
        legend.position = "right"
    )

# Create a combined plot for all models
p1 <- ggplot(data, aes(x=Permutations, y=Time, color=Language, shape=Model)) +
    geom_point(size=3) +
    geom_line(aes(linetype=Model)) +
    labs(title="Execution Time Comparison between R and Octave",
         x="Number of permutations",
         y="Execution time (seconds)") +
    scale_color_brewer(palette="Set1") +
    scale_x_log10(breaks = unique(data\$Permutations),
                  labels = scales::comma(unique(data\$Permutations))) +
    white_theme

# Save the combined plot
ggsave("$BENCHMARK_DIR/benchmark_comparison_all.png", p1, width=12, height=8, bg="white")

# Create separate plots for each model
for (model_name in unique(data\$Model)) {
    subset_data <- data[data\$Model == model_name,]
    p2 <- ggplot(subset_data, aes(x=Permutations, y=Time, color=Language)) +
        geom_point(size=3) +
        geom_line(linewidth=1) +
        labs(title=paste("Model:", model_name),
             x="Number of permutations",
             y="Execution time (seconds)") +
        scale_color_brewer(palette="Set1") +
        scale_x_log10(breaks = unique(subset_data\$Permutations),
                      labels = scales::comma(unique(subset_data\$Permutations))) +
        white_theme

    ggsave(paste0("$BENCHMARK_DIR/benchmark_", model_name, ".png"), p2, width=10, height=6, bg="white")
}

# Create bar chart to compare models grouped by language
p3 <- ggplot(data, aes(x=Model, y=Time, fill=Language)) +
    geom_bar(stat="identity", position="dodge") +
    facet_wrap(~Permutations, scales="free_y") +
    labs(title="Model comparison by number of permutations",
         x="Model",
         y="Execution time (seconds)") +
    scale_fill_brewer(palette="Set1") +
    theme_bw() +
    theme(
        panel.background = element_rect(fill = "white"),
        plot.background = element_rect(fill = "white"),
        legend.background = element_rect(fill = "white"),
        strip.background = element_rect(fill = "lightgrey"),
        axis.text.x = element_text(angle = 45, hjust = 1),
        plot.title = element_text(face = "bold", size = 14)
    )

ggsave("$BENCHMARK_DIR/benchmark_models_by_permutations.png", p3, width=14, height=10, bg="white")

# Create a log-scale plot to better visualize performance across all permutation ranges
p4 <- ggplot(data, aes(x=Permutations, y=Time, color=Language, shape=Model)) +
    geom_point(size=3) +
    geom_line(aes(linetype=Model)) +
    labs(title="Execution Time Comparison (Log-Log Scale)",
         x="Number of permutations (log scale)",
         y="Execution time (seconds, log scale)") +
    scale_color_brewer(palette="Set1") +
    scale_x_log10(breaks = unique(data\$Permutations),
                  labels = scales::comma(unique(data\$Permutations))) +
    scale_y_log10() +
    white_theme

ggsave("$BENCHMARK_DIR/benchmark_comparison_logscale.png", p4, width=12, height=8, bg="white")
EOF

# Run R script to generate plots
echo "Generating comparative plots..."
Rscript "$BENCHMARK_DIR/plot_benchmarks.R"

echo "Analysis complete. The following plots have been generated in $BENCHMARK_DIR:"
echo "- benchmark_comparison_all.png (General comparison)"
echo "- benchmark_comparison_logscale.png (Log-scale comparison)"
echo "- benchmark_linear.png (Linear model)"
echo "- benchmark_interaction.png (Interaction model)"
echo "- benchmark_full.png (Full model)"
echo "- benchmark_models_by_permutations.png (Comparison by permutations)"
Binary file added tests/parglmVS_benchmark/benchmark_full.png
Binary file added tests/parglmVS_benchmark/benchmark_linear.png
55 changes: 55 additions & 0 deletions tests/parglmVS_benchmark/benchmark_results.csv
Language,Model,Permutations,Time(s)
R,linear,100,.653
R,linear,500,.846
R,linear,1000,.990
R,linear,2000,1.319
R,linear,5000,2.398
R,linear,10000,4.096
R,linear,20000,7.695
R,linear,50000,18.094
R,linear,100000,35.774
R,interaction,100,.622
R,interaction,500,.826
R,interaction,1000,1.060
R,interaction,2000,1.449
R,interaction,5000,2.737
R,interaction,10000,4.784
R,interaction,20000,9.156
R,interaction,50000,21.228
R,interaction,100000,41.743
R,full,100,.615
R,full,500,.821
R,full,1000,1.025
R,full,2000,1.470
R,full,5000,2.731
R,full,10000,4.787
R,full,20000,8.833
R,full,50000,21.174
R,full,100000,41.118
Octave,linear,100,.335
Octave,linear,500,.682
Octave,linear,1000,1.131
Octave,linear,2000,2.025
Octave,linear,5000,4.687
Octave,linear,10000,9.057
Octave,linear,20000,18.545
Octave,linear,50000,47.469
Octave,linear,100000,93.257
Octave,interaction,100,.360
Octave,interaction,500,.842
Octave,interaction,1000,1.458
Octave,interaction,2000,2.668
Octave,interaction,5000,6.297
Octave,interaction,10000,12.339
Octave,interaction,20000,24.857
Octave,interaction,50000,61.538
Octave,interaction,100000,122.224
Octave,full,100,.357
Octave,full,500,.849
Octave,full,1000,1.453
Octave,full,2000,2.726
Octave,full,5000,6.406
Octave,full,10000,12.743
Octave,full,20000,23.733
Octave,full,50000,60.246
Octave,full,100000,123.488