Download pre-generated tutorial data instead of generating it by bendichter · Pull Request #1057 · NeurodataWithoutBorders/nwb-guide

bendichter · 2026-02-13T05:15:23Z

Problem

Tutorial test data generation runs on every E2E CI run, spending ~2-3 minutes fitting PCA across 50 units × 385 channels. Regenerating it each time is unnecessary because the data is deterministic (seeded).
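Determinism is what makes a hosted copy interchangeable with a fresh run. Here is a minimal sketch of the idea using only the standard library. `fake_generate` is a hypothetical stand-in for the tutorial generator, not the nwb-guide code:

```python
import hashlib
import random


def fake_generate(seed: int, n_bytes: int = 4096) -> bytes:
    """Hypothetical stand-in for seeded tutorial-data generation."""
    rng = random.Random(seed)  # private RNG: output depends only on the seed
    return rng.randbytes(n_bytes)  # Random.randbytes requires Python 3.9+


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


# Same seed gives the same bytes, so a cached archive is equivalent to regenerating.
same = sha256(fake_generate(seed=42)) == sha256(fake_generate(seed=42))
# Changing the seed (or the generation code) changes the output.
different = sha256(fake_generate(seed=42)) != sha256(fake_generate(seed=43))
print(same, different)
```

The second comparison is why the release tag has to be bumped whenever the generation code changes.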

Solution

Pre-generated tutorial data is now hosted as GitHub release assets and downloaded instead of generated:

Single-session data (SpikeGLX + Phy): 36MB compressed
Multi-session dataset (2 subjects × 2 sessions): 113MB compressed
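GitHub serves release assets from a `/releases/download/<tag>/<asset>` path, so fetching one needs only the standard library. The sketch below assumes the archives are `.tar.gz` files (the PR says only "compressed") and uses a placeholder asset name. It is an illustration, not the PR's `download_test_data()`:

```python
import shutil
import tarfile
import tempfile
import urllib.request
from pathlib import Path

REPO = "NeurodataWithoutBorders/nwb-guide"
TAG = "tutorial-test-data-v1"  # release tag named in this PR


def release_asset_url(asset: str, repo: str = REPO, tag: str = TAG) -> str:
    """Build the public download URL for a GitHub release asset."""
    return f"https://github.com/{repo}/releases/download/{tag}/{asset}"


def fetch_and_extract(url: str, dest: Path, timeout: float = 60.0) -> Path:
    """Download a .tar.gz archive from `url` and unpack it into `dest` (assumed format)."""
    dest.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        archive = Path(tmp) / "data.tar.gz"
        # Stream to disk instead of holding a 36-113MB archive in memory.
        with urllib.request.urlopen(url, timeout=timeout) as response, open(archive, "wb") as fh:
            shutil.copyfileobj(response, fh)
        with tarfile.open(archive, "r:gz") as tar:
            if hasattr(tarfile, "data_filter"):
                # Extraction filters exist in Python 3.12+ and recent patch releases;
                # "data" rejects absolute paths and members escaping `dest`.
                tar.extractall(dest, filter="data")
            else:
                tar.extractall(dest)
    return dest
```

For example, `fetch_and_extract(release_asset_url("single_session.tar.gz"), Path("tutorial_data"))`, where `single_session.tar.gz` is a hypothetical asset name.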

Changes

Backend: New download_test_data() and download_test_dataset() functions + /data/download and /data/download/dataset API endpoints
Frontend: App tries downloading first, falls back to generation if offline or download fails
CI: ExampleDataCache workflow caches tutorial data from GitHub release; E2E workflow restores it before tests
Release: tutorial-test-data-v1 hosts the compressed archives
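The download-first, generate-on-failure order can be sketched as follows. The PR implements this order in the frontend; this Python version is only an illustration, and `download` and `generate` are hypothetical callables standing in for the `/data/download` request and the existing generation path:

```python
import logging
import tarfile
from pathlib import Path
from typing import Callable, Tuple


def get_tutorial_data(
    dest: Path,
    download: Callable[[Path], Path],
    generate: Callable[[Path], Path],
) -> Tuple[Path, str]:
    """Try the pre-built download first; fall back to local generation on failure."""
    try:
        return download(dest), "downloaded"
    # urllib's URLError/HTTPError and socket timeouts are OSError subclasses;
    # TarError covers a truncated or corrupt archive.
    except (OSError, tarfile.TarError) as err:
        logging.warning("download failed (%s); generating locally", err)
        return generate(dest), "generated"
```

Because generation is seeded, both branches leave the same data on disk; the fallback only costs time, which keeps offline use working without a breaking change.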

Benefits

E2E tests skip data generation entirely (data already present)
App users get tutorial data in seconds instead of minutes
Generation code still works as fallback (no breaking change)
Data is versioned via release tags — bump tag when generation code changes
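One way to tie a local copy to a release tag, so that bumping the tag invalidates old data, is a small marker file. This is a hypothetical design sketch; the marker file name and the `ensure_data` helper are assumptions, not part of the PR:

```python
from pathlib import Path
from typing import Callable

DATA_TAG = "tutorial-test-data-v1"  # bump (e.g. to -v2) when the generation code changes
MARKER = ".tutorial-data-version"   # hypothetical marker file name


def ensure_data(dest: Path, fetch: Callable[[Path], None], tag: str = DATA_TAG) -> bool:
    """Fetch into `dest` unless it already holds data for `tag`. Returns True if fetched."""
    marker = dest / MARKER
    if marker.is_file() and marker.read_text().strip() == tag:
        return False  # local copy matches the current release tag; reuse it
    dest.mkdir(parents=True, exist_ok=True)
    fetch(dest)
    marker.write_text(tag)  # written last, so an interrupted fetch is retried next run
    return True
```

Writing the marker only after `fetch` returns means a partial download is never mistaken for a complete one.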

SpikeGLX recording data is generated locally (fast, just binary writes). Phy sorting data is downloaded from a GitHub release asset (17MB), avoiding ~2 min of PCA fitting per CI run.

- Split generate_test_data into _generate_spikeglx_data (fast) + Phy (slow)
- Add download_test_data: generates SpikeGLX + downloads pre-built Phy
- App tries download first, falls back to full generation if offline
- CI caches Phy data; E2E restores it before tests

Instead of downloading everything or generating everything:

- SpikeGLX recording data is generated locally (fast, ~10s)
- Phy sorting data is downloaded from GitHub release (17MB, avoids ~2 min PCA)
- Falls back to full generation if download fails (offline support)
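The hybrid strategy in these commits (write the recording locally, download only the Phy output, regenerate everything if offline) might look roughly like this. It is a sketch only: the file name, array sizes, and the `download_phy`/`generate_all` callables are hypothetical, and no SpikeGLX `.meta` file is written:

```python
import array
import random
import sys
from pathlib import Path
from typing import Callable


def write_interleaved_int16(path: Path, n_samples: int, n_channels: int, seed: int = 0) -> None:
    """Write seeded int16 samples, interleaved by channel as in a SpikeGLX .bin file."""
    rng = random.Random(seed)
    # Sample-major order: all channels for t=0, then all channels for t=1, ...
    samples = array.array("h", (rng.randint(-500, 500) for _ in range(n_samples * n_channels)))
    if sys.byteorder == "big":
        samples.byteswap()  # SpikeGLX binaries are little-endian
    path.write_bytes(samples.tobytes())


def prepare_tutorial_data(
    dest: Path,
    download_phy: Callable[[Path], None],
    generate_all: Callable[[Path], None],
    n_samples: int = 100,
    n_channels: int = 385,
) -> str:
    """Write the fast part locally, download the slow part, or regenerate everything."""
    dest.mkdir(parents=True, exist_ok=True)
    # Fast step: a plain binary write, no model fitting involved.
    write_interleaved_int16(dest / "recording.ap.bin", n_samples, n_channels)
    try:
        download_phy(dest / "phy")  # slow-to-generate sorting output, fetched pre-built
        return "hybrid"
    except OSError:
        generate_all(dest)  # offline or download failed: full local generation
        return "generated"
```

Splitting along the fast/slow boundary keeps the download small (17MB instead of the full archives) while still skipping the PCA fitting.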


rly · 2026-03-06T01:15:26Z

Tutorial generation now takes 13 seconds on my Mac M1. Do we still want to cache the tutorial data and use that in tests?

bendichter force-pushed the cache-tutorial-data branch 2 times, most recently from 43f9cdf to dbb7ebe on February 13, 2026 at 13:06

bendichter force-pushed the cache-tutorial-data branch from 3b7e94b to ceaa846 on February 13, 2026 at 13:09

bendichter force-pushed the cache-tutorial-data branch from 70d5e03 to f370f92 on February 13, 2026 at 13:10

[pre-commit.ci] auto fixes from pre-commit.com hooks

9fe5e93

for more information, see https://pre-commit.ci

Merge branch 'main' into cache-tutorial-data

c1e1ca9

bendichter wants to merge 4 commits into main from cache-tutorial-data