CNN Visualizer - Testing and Acceptance Criteria

1. Purpose

This document defines the quality strategy for the repository as it exists today:

a working browser inference baseline,
an improved model and training/export pipeline,
an upcoming Cloudflare deployment milestone.

2. Current Testing Reality

The current repo relies mainly on:

manual validation,
build verification,
inspection of generated training artifacts.

There is not yet a committed automated test suite or lint pipeline in the root project scripts, so the acceptance criteria below are written to match the actual stage of the codebase.

3. Scope Under Test

Current Implemented Scope

drawing input and reset behavior
preprocessing pipeline (280x280 -> 28x28)
model loading and warmup
browser prediction flow
top-class and confidence-bar rendering
training artifact generation
TF.js artifact regeneration

Upcoming Scope

Cloudflare Pages deployment through GitHub Actions

Future Scope

advanced visualization modules
intermediate activation UI
playback controls

4. Manual Runtime Checks

Minimum checks for the current browser app:

Open the app.
Confirm the model reaches the ready state.
Draw each of the digits 0, 1, 7, 8, and 9 clearly, one at a time.
Click Predict after each drawing.
After each prediction, confirm:
- the preview grid updates,
- the top class updates,
- the confidence bars update.
Click Clear.
Confirm canvas, preview, and prediction state reset.

5. Preprocessing Validation

The current preprocessing pipeline should be checked for:

deterministic output for repeated identical input,
resilience to thin strokes,
resilience to small gaps in a stroke,
sensible centering inside the 28x28 result,
non-crashing behavior for an empty canvas.

Note on current behavior:

an empty canvas produces an all-zero matrix rather than a dedicated no-input state.
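
Some of the checks above (determinism, output shape, empty-canvas behavior) could later be expressed as a small harness. The sketch below is illustrative only: the `Preprocess` signature, `checkPreprocess`, and the `averagePool` stand-in are assumptions, not the repository's actual preprocessing code, which also handles centering.

```typescript
// Hypothetical signature: a flat grayscale canvas (values in [0, 1]) in, a 28x28 matrix out.
type Preprocess = (pixels: Float32Array, size: number) => number[][];

// Stand-in for the real pipeline (an assumption): plain block averaging.
// Assumes `size` is a multiple of 28, e.g. 280 gives 10x10 blocks.
function averagePool(pixels: Float32Array, size: number): number[][] {
  const out = 28;
  const block = size / out;
  const result: number[][] = [];
  for (let r = 0; r < out; r++) {
    const row: number[] = [];
    for (let c = 0; c < out; c++) {
      let sum = 0;
      for (let y = 0; y < block; y++) {
        for (let x = 0; x < block; x++) {
          sum += pixels[(r * block + y) * size + (c * block + x)];
        }
      }
      row.push(sum / (block * block));
    }
    result.push(row);
  }
  return result;
}

// Runs the shape, determinism, and empty-canvas checks; returns failure messages.
function checkPreprocess(preprocess: Preprocess): string[] {
  const failures: string[] = [];
  const size = 280;

  // A thin vertical stroke, 4 pixels wide, as a repeatable test input.
  const stroke = new Float32Array(size * size);
  for (let y = 40; y < 240; y++) {
    for (let x = 138; x < 142; x++) {
      stroke[y * size + x] = 1;
    }
  }

  const first = preprocess(stroke, size);
  const second = preprocess(stroke, size);
  if (first.length !== 28 || first.some((row) => row.length !== 28)) {
    failures.push("output is not 28x28");
  }
  if (JSON.stringify(first) !== JSON.stringify(second)) {
    failures.push("output is not deterministic");
  }

  // Empty canvas: must not throw, and currently maps to an all-zero matrix.
  try {
    const empty = preprocess(new Float32Array(size * size), size);
    if (empty.some((row) => row.some((v) => v !== 0))) {
      failures.push("empty canvas is not all-zero");
    }
  } catch {
    failures.push("empty canvas threw");
  }
  return failures;
}
```

Calling `checkPreprocess` with the app's real preprocessing function, adapted to this signature, would return an empty array when all three checks hold.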

6. Model Integration Validation

The current model integration passes when:

loadModel() succeeds from /model/model.json,
warmup finishes without user-visible errors,
prediction returns exactly 10 confidences,
the top class matches the highest-confidence output,
repeated predictions do not visibly slow down or destabilize the app.
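
The two output criteria (exactly 10 confidences, top class equal to the highest-confidence entry) can be checked mechanically. The following is a minimal sketch; the `PredictionResult` shape and the `validatePrediction` name are hypothetical and may not match the app's actual prediction object.

```typescript
// Hypothetical shape of a prediction result (an assumption for illustration).
interface PredictionResult {
  topClass: number;
  confidences: number[];
}

// Returns a list of problems; an empty list means both criteria hold.
function validatePrediction(result: PredictionResult): string[] {
  const problems: string[] = [];
  const { topClass, confidences } = result;

  if (confidences.length !== 10) {
    problems.push(`expected 10 confidences, got ${confidences.length}`);
  }
  if (confidences.some((v) => !Number.isFinite(v))) {
    problems.push("confidences contain a non-finite value");
  }
  // Tie-tolerant: any index holding the maximum value is accepted as top class.
  if (confidences.length > 0 && confidences[topClass] !== Math.max(...confidences)) {
    problems.push("top class does not hold the highest confidence");
  }
  return problems;
}
```

Comparing against the maximum value rather than a single argmax index avoids false failures when two classes tie.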

7. Training Pipeline Validation

The model-improvement phase passes when:

training/python/train_cnn.py runs from the documented uv workflow,
training/python/artifacts/training-summary.json is generated,
training/python/artifacts/cnn-weights.json is generated,
training/export-python-model.js regenerates public/model/*,
the regenerated browser artifacts still load in the frontend.

8. Recorded Quality Indicators

The current committed training summary reports approximately:

best validation accuracy: 0.9905
test accuracy: 0.9910

These values are not a substitute for browser validation, but they are part of the acceptance evidence for the Phase 2 model-improvement work.

9. Build Validation

Before any deploy work is considered ready:

run npm ci
run npm run build
confirm dist/model/model.json exists
confirm dist/model/group1-shard1of1.bin exists
open the local production preview if needed
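
The two file-existence checks above could be scripted. This is a sketch under stated assumptions: `findMissingAssets` is a hypothetical helper, and the existence check is injected as a predicate so the logic stays independent of any particular file-system API.

```typescript
// Asset paths taken from the build checks above, relative to the build output directory.
const REQUIRED_MODEL_ASSETS = ["model/model.json", "model/group1-shard1of1.bin"];

// Returns the full paths of required assets for which `exists` reports false.
function findMissingAssets(
  distDir: string,
  exists: (path: string) => boolean
): string[] {
  return REQUIRED_MODEL_ASSETS
    .map((asset) => `${distDir}/${asset}`)
    .filter((path) => !exists(path));
}
```

In a Node script, `existsSync` from `node:fs` could be passed as the predicate, with a non-empty result treated as a failed build check.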

10. Cloudflare Deploy Acceptance

For the upcoming deployment phase, acceptance requires:

GitHub Actions workflow exists,
workflow completes successfully on main,
Cloudflare serves the app without 404s for model assets,
draw -> predict works in production,
rerunning the workflow produces a safe redeploy.

11. Recommended Future Automated Coverage

When automated tests are added, prioritize:

grayscale conversion and normalization math,
bounding-box and centering logic,
tensor shape construction,
probability ranking logic,
model asset existence checks in production builds.
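
As one illustration of what a bounding-box and centering unit test might target, here is a minimal sketch. `boundingBox` and `centerOffset` are hypothetical helpers written for this example; the repository's actual functions, thresholds, and rounding behavior may differ.

```typescript
interface Box {
  top: number;
  left: number;
  bottom: number;
  right: number;
}

// Smallest box containing every pixel above `threshold`.
// Returns null when no pixel qualifies (for example, an empty canvas).
function boundingBox(matrix: number[][], threshold = 0): Box | null {
  let box: Box | null = null;
  for (let r = 0; r < matrix.length; r++) {
    for (let c = 0; c < matrix[r].length; c++) {
      if (matrix[r][c] <= threshold) continue;
      if (box === null) {
        box = { top: r, left: c, bottom: r, right: c };
      } else {
        box.top = Math.min(box.top, r);
        box.left = Math.min(box.left, c);
        box.bottom = Math.max(box.bottom, r);
        box.right = Math.max(box.right, c);
      }
    }
  }
  return box;
}

// Whole-pixel shift that moves the box center onto the center of a size x size grid.
function centerOffset(box: Box, size: number): { rows: number; cols: number } {
  const gridCenter = (size - 1) / 2;
  return {
    rows: Math.round(gridCenter - (box.top + box.bottom) / 2),
    cols: Math.round(gridCenter - (box.left + box.right) / 2),
  };
}
```

A test would build a small matrix with known ink positions and assert on the returned box and offset, including the null result for an all-zero input.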

12. Defect Severity Policy

P0: app unusable or production deploy broken
P1: core prediction flow incorrect or unstable
P2: secondary issue with workaround
P3: cosmetic or low-impact UX issue

Release rule:

P0 and P1 must be resolved before production rollout.
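
The release rule can be stated as a simple filter over open defects. The sketch below is illustrative; the `Defect` shape and `releaseBlockers` name are assumptions, not part of any existing tooling in the repo.

```typescript
type Severity = "P0" | "P1" | "P2" | "P3";

// Hypothetical defect record (an assumption for illustration).
interface Defect {
  id: string;
  severity: Severity;
  resolved: boolean;
}

// Unresolved P0 and P1 defects block a production rollout; P2 and P3 do not.
function releaseBlockers(defects: Defect[]): Defect[] {
  return defects.filter(
    (d) => !d.resolved && (d.severity === "P0" || d.severity === "P1")
  );
}
```

A rollout is allowed under this rule only when `releaseBlockers` returns an empty array.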

13. Current Release Gates

The current repo is ready for the next milestone only when:

browser baseline checks pass,
model artifacts are valid and load correctly,
training/export flow remains reproducible,
deployment docs and implementation agree on Cloudflare Pages.

14. Future Testing Boundary

When advanced visualization work begins, a separate acceptance layer should be added for:

activation extraction,
stage synchronization,
visualization correctness,
playback controls.

Those checks are intentionally not treated as current baseline acceptance criteria, because those features are not implemented yet.
