Planned. Extract the entire demo generation pipeline (TTS, Manim, VHS, ffmpeg composition, validation, GitHub Pages publishing) from `docs/demos/` into a standalone, pip-installable Python library at `documentation-generator`. The library provides both a Python API and a `docgen` CLI. Advanced validation catches visual errors (OCR), layout issues (bounding-box analysis), audio-visual desynchronization, and narration artifacts — all before assets are committed.
Build a reusable Python library (docgen) that any project can consume to:
- Generate narrated demo videos from Markdown narration scripts, Manim animations, and VHS terminal tapes.
- Compose and concatenate segments with ffmpeg.
- Validate output quality automatically — OCR for visible errors, layout overlap detection, audio-visual sync, and narration linting.
- Publish to GitHub Pages — generate `index.html`, the deploy workflow, and LFS tracking rules from config.
- Replace all ad-hoc shell and Python scripts currently in `tekton-dag/docs/demos/`.
Success = pip install docgen from the Git URL, create a docgen.yaml, and run docgen generate-all to produce validated, published demo videos.
| Capability | Current location | Replaced by |
|---|---|---|
| TTS narration (OpenAI gpt-4o-mini-tts) | docs/demos/generate-narration.py | `docgen tts` / `docgen.tts.TTSGenerator` |
| Whisper timestamp extraction | docs/demos/extract-timestamps.py | `docgen.timestamps.TimestampExtractor` |
| ffmpeg composition (audio + video) | docs/demos/compose.sh | `docgen compose` / `docgen.compose.Composer` |
| Master pipeline (generate-all) | docs/demos/generate-all.sh | `docgen generate-all` / `docgen.pipeline.Pipeline` |
| A/V drift validation | docs/demos/validate-composed-demos.sh | `docgen validate` / `docgen.validate.Validator` |
| Audio rebuild helper | docs/demos/rebuild-after-audio.sh | `docgen rebuild-after-audio` |
| M12.2 segment regeneration | docs/demos/regenerate-m12-2.sh | `docgen generate-all --segments 12,13,14` |
| GitHub Pages index | docs/index.html (hand-maintained) | `docgen pages` (auto-generated) |
| Pages deploy workflow | .github/workflows/pages.yml (hand-maintained) | `docgen pages` (auto-generated) |
| Manim scene definitions | docs/demos/animations/scenes.py | Stays in consuming repo; invoked by `docgen manim` |
| VHS tape files | docs/demos/terminal/*.tape | Stay in consuming repo; invoked by `docgen vhs` |
| Narration scripts | docs/demos/narration/*.md | Stay in consuming repo; consumed by `docgen tts` |
```
/home/ubuntu/github/jmjava/documentation-generator/
├── pyproject.toml            # packaging, deps, [project.scripts] docgen CLI
├── README.md
├── src/
│   └── docgen/
│       ├── __init__.py
│       ├── cli.py            # CLI dispatcher (docgen tts, compose, vhs, validate, pages, ...)
│       ├── config.py         # docgen.yaml loader, path resolution, schema validation
│       ├── tts.py            # OpenAI TTS: markdown stripping, streaming, per-segment
│       ├── timestamps.py     # Whisper timestamp extraction
│       ├── narration_lint.py # pre-TTS text lint + post-TTS transcript lint
│       ├── vhs.py            # VHS runner + stderr error scanner
│       ├── manim_runner.py   # Manim scene invocation
│       ├── compose.py        # ffmpeg composition (Python subprocess)
│       ├── ocr.py            # OpenCV + pytesseract frame OCR for Manim/VHS MP4s
│       ├── manim_layout.py   # bounding-box overlap / spacing / edge checks
│       ├── av_sync.py        # audio-visual sync validation (Whisper + OCR)
│       ├── validate.py       # unified validator: all checks + --pre-push
│       ├── concat.py         # full-demo concatenation
│       ├── pages.py          # generate index.html, pages.yml, .gitattributes, .gitignore
│       ├── wizard.py         # narration setup wizard (Flask web GUI)
│       ├── templates/        # Jinja2 HTML templates for wizard
│       │   └── wizard.html
│       ├── static/           # CSS + JS for wizard frontend
│       │   ├── wizard.css
│       │   └── wizard.js
│       └── pipeline.py       # orchestrate: tts → manim → vhs → compose → validate → concat → pages
├── tests/
│   ├── test_config.py
│   ├── test_tts.py
│   ├── test_narration_lint.py
│   ├── test_vhs.py
│   ├── test_compose.py
│   ├── test_ocr.py
│   ├── test_manim_layout.py
│   ├── test_av_sync.py
│   ├── test_validate.py
│   ├── test_pages.py
│   ├── test_wizard.py
│   └── test_pipeline.py
└── examples/
    ├── README.md
    └── minimal-bundle/
        ├── narration/
        ├── terminal/
        └── docgen.yaml
```
```
docs/demos/
├── docgen.yaml               # project config (segments, visual_map, TTS, validation, pages)
├── narration/*.md            # narration scripts (unchanged)
├── terminal/*.tape           # VHS tapes (unchanged)
├── animations/scenes.py      # Manim scenes (unchanged, project-specific)
├── animations/timing.json    # generated by docgen
├── audio/*.mp3               # generated by docgen tts
├── terminal/rendered/*.mp4   # generated by docgen vhs
├── recordings/*.mp4          # generated by docgen compose + concat
└── requirements.txt          # adds: docgen @ git+https://github.com/jmjava/documentation-generator.git
```
Deleted after migration: compose.sh, generate-all.sh, generate-narration.py, extract-timestamps.py, validate-composed-demos.sh, rebuild-after-audio.sh, regenerate-m12-2.sh. Hand-maintained docs/index.html and .github/workflows/pages.yml replaced by auto-generated versions.
Loads docgen.yaml, resolves relative paths against the yaml location, validates required keys, and provides typed defaults. All other modules receive a Config object rather than reading env vars or paths directly.
Auto-discovers docgen.yaml by walking up from cwd.
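The walk-up discovery can be sketched as below. This is a minimal illustration, not the library's actual code; the function name `find_config` is a hypothetical placeholder.

```python
from pathlib import Path

def find_config(start=None, name="docgen.yaml"):
    """Walk up from `start` (default: cwd) until a docgen.yaml is found.

    Hypothetical sketch of the auto-discovery behavior described above.
    """
    current = Path(start or Path.cwd()).resolve()
    for candidate_dir in [current, *current.parents]:
        candidate = candidate_dir / name
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"No {name} found from {current} upward")
```

Because relative paths in the config resolve against the yaml's own location, the discovered path (not cwd) becomes the anchor for everything else.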
Ports generate-narration.py into a `TTSGenerator` class:
- `markdown_to_tts_plain()` strips headings, emphasis markers, links, code fences, and metadata lines
- Calls OpenAI `gpt-4o-mini-tts` with configurable voice and instructions
- Streams audio to MP3 files under `audio/`
- Supports `--segment 01` (single segment) and `--dry-run` (show stripped text without calling the API)
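The stripping step might look like the following. This is a minimal sketch assuming simple regex-based cleanup; the actual heuristics (and the metadata prefixes recognized) are the library's to define.

```python
import re

def markdown_to_tts_plain(text):
    """Reduce markdown narration to plain prose suitable for TTS.

    Sketch only: drops fenced code, headings, section markers, and
    metadata lines; unwraps links; removes emphasis markers.
    """
    # Fenced code blocks should never be narrated.
    text = re.sub(r"```.*?```", "", text, flags=re.DOTALL)
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#") or stripped in ("---", "***"):
            continue  # headings and section markers
        if stripped.lower().startswith(("visual:", "target duration")):
            continue  # metadata that must not reach the TTS call
        line = re.sub(r"\[([^\]]+)\]\([^)]*\)", r"\1", line)  # [text](url) -> text
        line = line.replace("**", "").replace("`", "")
        out.append(line)
    return "\n".join(l for l in out if l.strip())
```

`--dry-run` would simply print this stripped text instead of sending it to OpenAI.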
Ports extract-timestamps.py into a `TimestampExtractor` class:
- Transcribes MP3 with the OpenAI Whisper API
- Extracts word-level and segment-level timestamps
- Writes `animations/timing.json` for Manim scene synchronization
- Used by `av_sync.py` for validation
Two-stage validation to catch generation metadata that leaks into audio:
Stage 1: pre-TTS (text lint) — runs before calling OpenAI:
- Scans stripped narration text for leaked metadata artifacts: `target duration`, `intended length`, `approximately N minutes`, `visual:`, `edit for voice`, stage directions in `*(...)*`
- Markdown syntax: `#` headings, `**bold**`, backticks, `[links](url)`
- Section markers: `---`, `***`, `## Script`
- Blocks the TTS call if matches are found and reports the offending lines
- Configurable deny-patterns in `docgen.yaml`
Stage 2: post-TTS (audio transcript lint) — runs after MP3 is generated:
- Transcribes the generated MP3 with Whisper
- Scans transcript for spoken artifact phrases:
- "target duration", "intended length", "approximately three minutes"
- "visual colon", "narration segment", "script section", "edit for voice"
- "markdown", "heading", "backtick"
- Catches TTS model hallucinations and metadata the text stripper missed
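Both stages reduce to scanning text against a deny list. A minimal sketch, assuming the pattern list comes from `docgen.yaml` (the hard-coded list below is illustrative only):

```python
import re

# Illustrative deny patterns; the real list is read from docgen.yaml.
DENY_PATTERNS = [
    r"target duration",
    r"intended length",
    r"approximately \d+ minutes",
    r"visual:",
    r"edit for voice",
]

def lint_text(text, patterns=DENY_PATTERNS):
    """Return (line_number, line, pattern) for every deny-pattern hit.

    Used on the stripped narration text pre-TTS, and on the Whisper
    transcript post-TTS.
    """
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pat in patterns:
            if re.search(pat, line, flags=re.IGNORECASE):
                hits.append((lineno, line, pat))
    return hits
```

An empty result allows the TTS call to proceed; any hit blocks it (when `block_tts_on_pre_lint` is true) and the offending lines are reported.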
`VHSRunner` class:
- Renders `.tape` files by invoking a `vhs` subprocess
- Captures stdout and stderr
- After render, scans output for error patterns: `command not found`, `No such file`, `error:`, `Permission denied`, `syntax error`
- `--strict` mode: any unrecognized shell output producing unexpected stderr fails the build
- Returns structured results per tape (success/fail, errors found, output path)
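The output scan is a straightforward substring check. A sketch, with `scan_vhs_output` as a hypothetical helper name:

```python
# Illustrative pattern list; the real one is configurable.
ERROR_PATTERNS = (
    "command not found",
    "No such file",
    "error:",
    "Permission denied",
    "syntax error",
)

def scan_vhs_output(output, patterns=ERROR_PATTERNS):
    """Collect lines of captured VHS stdout/stderr matching any
    known shell-error pattern (case-insensitive)."""
    errors = []
    for line in output.splitlines():
        if any(p.lower() in line.lower() for p in patterns):
            errors.append(line)
    return errors
```

A non-empty result marks the tape as failed in the structured per-tape report; `--strict` would additionally fail on any unexpected stderr at all.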
`ManimRunner` class:
- Reads scene list and quality setting from config
- Invokes a `manim` subprocess on each scene in the consuming repo's `animations/scenes.py`
- Validates that the output MP4 exists after render
- Supports `--scene StackDAGScene` for single-scene rendering
`Composer` class rewriting compose.sh in Python:
- `compose_simple(segment)` — single visual + audio (Manim-only or VHS-only segments)
- `compose_mixed(segment)` — concatenate multiple visual sources, then add audio (e.g., segment 03: Manim + VHS)
- `compose_still(segment)` — loop a still image to match audio duration
- `compose_image(segment)` — use a static image as the visual
- Reads `visual_map` from config to determine which method to use per segment
- Proper error handling with exceptions and structured logging
- All ffmpeg calls via `subprocess.run` with a timeout
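For the simple case, the ffmpeg invocation is a mux of one video stream and one audio stream. A hedged sketch of how `compose_simple` might build its argv (`build_compose_cmd` is a hypothetical helper, not the library's API):

```python
def build_compose_cmd(video, audio, output):
    """Build an ffmpeg argv muxing one visual MP4 with one narration MP3.

    Sketch only; the real Composer would run this via
    subprocess.run(cmd, timeout=...) and raise on failure.
    """
    return [
        "ffmpeg", "-y",
        "-i", str(video),   # input 0: visual track
        "-i", str(audio),   # input 1: narration track
        "-map", "0:v:0",    # video from the first input
        "-map", "1:a:0",    # audio from the second input
        "-c:v", "copy",     # don't re-encode the visual
        "-c:a", "aac",
        "-shortest",        # stop at the shorter stream
        str(output),
    ]
```

`compose_mixed` would first concatenate the visual sources, then feed the result through the same mux step; `compose_still` swaps the video input for a looped image.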
Scans rendered MP4 frames using OpenCV + pytesseract to detect visible errors:
- Samples frames at configurable intervals (default: every 2s, plus first/last frame)
- Converts to grayscale and applies thresholding for clean OCR
- Runs `pytesseract.image_to_string` + `image_to_data` (bounding boxes + confidence scores)
- Aggregates detected text across sampled frames
VHS checks:
- Scan for shell error patterns: `command not found`, `No such file or directory`, `bash:`, `syntax error`, `Permission denied`
- Report frame timestamp and matched text
Manim checks:
- Detect garbled/garbage text (low-confidence OCR regions below configurable threshold)
- Detect text too close to frame edges
- Cross-reference extracted text against expected labels (optional allowlist in config)
Static analysis of rendered Manim frames for element spacing:
- Sample key frames from the Manim MP4 (first, mid, last, plus configured beat timestamps)
- Run `pytesseract.image_to_data` to get bounding boxes of all text regions
- Overlap detection: flag any pair of text regions with IoU > 0
- Spacing check: flag pairs closer than `min_spacing` pixels (default: 10px)
- Edge clipping: flag text within `edge_margin` pixels of the frame border (default: 15px)
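The geometry behind these checks can be sketched on plain `(x, y, w, h)` boxes (the format `pytesseract.image_to_data` reports). `check_layout` is an illustrative name, not the module's confirmed API:

```python
def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ix = max(0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    if inter == 0:
        return 0.0
    return inter / (a[2] * a[3] + b[2] * b[3] - inter)

def check_layout(boxes, frame_w, frame_h, min_spacing=10, edge_margin=15):
    """Flag edge clipping, overlaps, and too-close pairs among text boxes."""
    issues = []
    for i, (x, y, w, h) in enumerate(boxes):
        if (x < edge_margin or y < edge_margin
                or x + w > frame_w - edge_margin
                or y + h > frame_h - edge_margin):
            issues.append(("edge", i))
        for j in range(i + 1, len(boxes)):
            if iou(boxes[i], boxes[j]) > 0:
                issues.append(("overlap", i, j))
            else:
                # Grow box i by min_spacing; contact then means "too close".
                grown = (x - min_spacing, y - min_spacing,
                         w + 2 * min_spacing, h + 2 * min_spacing)
                if iou(grown, boxes[j]) > 0:
                    issues.append(("spacing", i, j))
    return issues
```

Low-confidence OCR regions would be filtered out before this step so garbage detections don't trigger false overlap reports.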
Validates that visual content matches narration timing:
- Transcribe audio with Whisper to get word/segment-level timestamps
- Extract anchor keywords from transcript — important nouns, technical terms, transition phrases
- Sample video frames at anchor timestamps from the composed MP4
- OCR each sampled frame
- Verify the anchor keyword (or synonym) appears in the OCR text within a configurable tolerance window (default: +/- 3s)
What it catches:
- Manim scene transitions that happen too early/late relative to narration
- VHS terminal segments where commands haven't appeared when the narrator describes them
- Composed videos where audio and video got swapped or badly muxed
- Scenes rebuilt with different timing after audio changed but compose wasn't re-run
Auto-extracts anchors from Whisper transcript if not configured manually.
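The per-anchor check reduces to a tolerance-window lookup over OCR samples. A sketch with a hypothetical `check_anchor` helper:

```python
def check_anchor(anchor_word, spoken_at, ocr_samples, tolerance=3.0):
    """True if the anchor keyword appears in any OCR'd frame within
    +/- tolerance seconds of the moment it is spoken.

    ocr_samples: list of (timestamp_sec, ocr_text) from sampled frames.
    """
    for ts, text in ocr_samples:
        if abs(ts - spoken_at) <= tolerance and anchor_word.lower() in text.lower():
            return True
    return False
```

The spoken timestamp comes from the Whisper word-level output; a segment passes when at least `min_anchors_per_segment` anchors resolve to True.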
Combines all validation checks into a single Validator class:
- A/V stream presence — both audio and video streams exist in composed MP4
- A/V drift — audio and video durations within configurable tolerance (default: 2.75s)
- Narration lint (pre-TTS) — stripped text has no leaked metadata
- Narration lint (post-TTS) — Whisper transcript of generated MP3 has no spoken artifacts
- VHS stderr scan — no shell errors in VHS build output
- OCR content scan (VHS) — no `command not found` or shell errors visible in pixels
- OCR content scan (Manim) — no garbled text or low-confidence garbage regions
- Layout overlap (Manim) — no overlapping text, proper edge margins, minimum spacing
- Audio-visual sync — narration keywords visible on screen when spoken
- `--pre-push` mode runs everything and exits non-zero on any failure (CI/pre-push hook integration)
Reads concat map from config and builds concatenated MP4 files using ffmpeg concat demuxer. Supports multiple concat targets (e.g., full-demo.mp4 for segments 01-11, full-demo-with-m12-2.mp4 for 01-14).
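The concat demuxer takes a text file listing inputs. A sketch of how the list file might be produced (the helper name and the exact recordings layout are assumptions):

```python
from pathlib import Path

def write_concat_list(segments, recordings_dir, list_path):
    """Write an ffmpeg concat-demuxer input file for the given segment IDs.

    The demuxer is then driven with something like:
        ffmpeg -f concat -safe 0 -i <list_path> -c copy full-demo.mp4
    """
    lines = []
    for seg in segments:
        path = Path(recordings_dir) / (seg + ".mp4")
        lines.append("file '%s'" % path)
    Path(list_path).write_text("\n".join(lines) + "\n")
    return list_path
```

`-c copy` keeps concatenation fast and lossless, which only works because every segment is composed with the same codecs and parameters.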
Generates and maintains all assets needed for GitHub Pages deployment:
docs/index.html — responsive video gallery:
- Reads segment config: titles, descriptions, visual type
- Probes each
recordings/*.mp4with ffprobe for actual duration - Generates video cards with
<video>tags, durations, anchor IDs (#seg-01...#seg-14) - Generates concat demo cards from the
concatconfig - Footer with repo link and extra links from config
- Modern CSS (gradient background, grid layout, hover effects, mobile responsive)
- Idempotent: re-running updates durations and adds/removes segments to match config
.github/workflows/pages.yml — deploy workflow:
- Triggers on push to `main` when `docs/**` changes (plus `workflow_dispatch`)
- Checks out with LFS
- Cache-busts video `src` URLs with commit SHA query strings
- `actions/upload-pages-artifact` from `docs/`
- `actions/deploy-pages`
.gitattributes — LFS tracking rules:
- `docs/demos/recordings/*.mp4 filter=lfs diff=lfs merge=lfs -text`
- `docs/demos/audio/*.mp3 filter=lfs diff=lfs merge=lfs -text`
- Merges with existing `.gitattributes` if present (does not clobber)
docs/demos/.gitignore — intermediate object exclusions:
Generated from docgen.yaml paths so the ignore rules always match the configured directory layout:
- `.venv/` — Python virtual environment
- `animations/media/partial_movie_files/` — Manim intermediate renders
- `animations/media/videos/` — Manim output (configurable quality subdirs)
- `terminal/rendered/` — VHS rendered output
- `audio/*.mp3` — TTS output (regenerable with `OPENAI_API_KEY`)
- `slides/node_modules/`, `slides/dist/` — Slidev build artifacts
- `__pycache__/`, `*.pyc` — Python bytecode
Recordings (recordings/*.mp4) are intentionally not ignored — they are committed (tracked via LFS) so README and Pages links work. The generated .gitignore includes a comment explaining this.
Merges with any existing .gitignore entries the consuming repo already has (does not clobber custom rules).
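The no-clobber merge can be sketched as an append of only the missing rules. This is an assumption about the merge strategy, not the library's confirmed behavior:

```python
def merge_ignore_rules(existing_text, generated_rules):
    """Append generated ignore rules to an existing .gitignore,
    keeping custom entries and skipping rules already present."""
    have = {
        line.strip()
        for line in existing_text.splitlines()
        if line.strip() and not line.startswith("#")
    }
    additions = [r for r in generated_rules if r not in have]
    if not additions:
        return existing_text
    out = existing_text.rstrip("\n")
    header = "# generated by docgen pages\n"
    out = (out + "\n\n" + header) if out else header
    return out + "\n".join(additions) + "\n"
```

The same approach would apply to `.gitattributes`: generated LFS rules are appended only when an equivalent line is not already present.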
Pipeline class that runs the full generation workflow:
tts → manim → vhs → compose → validate → concat → pages
- Each stage is skippable via flags (`--skip-tts`, `--skip-manim`, `--skip-vhs`)
- Stages report structured results (success/fail, warnings, timings)
- Fails fast on validation errors unless `--continue-on-error` is set
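The orchestration logic above can be sketched as a fixed stage order with skip flags and fail-fast semantics. A minimal illustration; the real `Pipeline` would dispatch to the actual module runners:

```python
class Pipeline:
    """Sketch of stage ordering with skip flags and fail-fast."""

    STAGES = ["tts", "manim", "vhs", "compose", "validate", "concat", "pages"]

    def __init__(self, runners):
        # runners: stage name -> callable returning True on success
        self.runners = runners

    def run(self, skip=(), continue_on_error=False):
        results = {}
        for stage in self.STAGES:
            if stage in skip:
                results[stage] = "skipped"
                continue
            ok = self.runners[stage]()
            results[stage] = "ok" if ok else "failed"
            if not ok and not continue_on_error:
                break  # fail fast: later stages never run
        return results
```

`--skip-tts` and friends map directly onto the `skip` set; the returned dict is the structured per-stage report.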
A local web application that serves as the full production cockpit for demo video creation. Launched via docgen wizard and opens in the browser. Covers the complete lifecycle: bootstrapping narration from project docs, generating assets, proofreading results, and iterating on individual segments until each is approved.
The GUI has two main views: Setup (initial narration bootstrapping) and Production (per-segment review/redo loop).
- Scan — walks the consuming repo root and discovers all `.md` files (respects `.gitignore`). Builds a directory tree structure.
- Present — renders:
  - A collapsible file tree with checkboxes to include/exclude each `.md` file as source material for narration. Directories are collapsible nodes; checking a directory checks all children. Files show a preview snippet (first ~3 lines) on hover or expand.
  - A guidance panel with a multi-line text input where the user provides general advice on video creation: what to emphasize, target audience, tone, technical depth, which features matter most, what to skip, etc.
  - A segment mapping section where the user can drag selected files into named segments (e.g., "01-architecture", "03-bootstrap-dataflow") or let the wizard auto-group by directory/topic.
  - A generate button that sends the selected content + guidance to the LLM to produce narration drafts.
- Generate — for each segment, the wizard:
  - Concatenates the selected `.md` file contents in order
  - Sends them to an LLM (OpenAI GPT-4o or configurable) along with the user's guidance text and a system prompt that instructs the model to produce plain spoken narration (conversational prose, no markdown, written to be read aloud by TTS)
  - Writes the output to `narration/<segment>.md`
- Review — the GUI shows a preview of each generated narration draft with an inline editor for manual tweaks before saving.
After initial narration is created (via Setup or manually), the Production view provides a segment-by-segment review workflow. The user works on one segment at a time and iterates until satisfied.
Segment list panel (left sidebar):
- All segments from `docgen.yaml` listed with status badges:
  - Draft — narration exists but no video yet
  - Generated — video composed, awaiting review
  - Needs work — user flagged issues during proofreading
  - Approved — user marked as done
- Click a segment to load it in the main review area
- Progress bar showing how many segments are approved vs total
Main review area (per segment):
- Narration tab:
  - Full text of `narration/<segment>.md` in an editable text area
  - "Regenerate narration" button — re-sends to the LLM with the original source docs + user guidance + optional revision notes (e.g., "make the opening shorter", "emphasize the header propagation more")
  - Revision notes text input for targeted LLM feedback
  - Diff view showing changes from the previous version (if regenerated)
- Audio tab:
  - Play button for the generated `audio/<segment>.mp3`
  - Waveform visualization (lightweight, e.g., wavesurfer.js or a simple canvas)
  - "Redo TTS" button — regenerates audio from the current narration text
  - Narration lint results shown inline (pre-TTS warnings flagged before generation, post-TTS transcript issues flagged after)
- Video tab:
  - Embedded `<video>` player for `recordings/<segment>.mp4`
  - "Redo compose" button — recomposes video from the current audio + visuals
  - Side-by-side: video player on the left, validation results on the right
  - Validation results panel:
    - OCR scan results (any errors detected in frames, with timestamps and screenshots)
    - Layout check results (overlap/spacing issues with frame thumbnails)
    - A/V sync check results (anchor keywords with pass/fail per timestamp)
    - A/V drift (audio vs video duration)
  - "Redo VHS" / "Redo Manim" buttons if the segment's visual source needs re-rendering
- Actions bar (bottom):
  - Redo all — re-run the full pipeline for this segment only (TTS → render → compose → validate)
  - Mark as approved — sets status to Approved, moves to the next unapproved segment
  - Flag for rework — sets status to Needs Work with a note field
  - Next / Previous segment navigation
Workflow:
For each segment:
1. Review narration text → edit or regenerate with revision notes
2. Listen to audio → redo TTS if voice/pacing is off
3. Watch composed video → check validation results
4. If issues: redo the specific step (narration, TTS, VHS/Manim, compose)
5. Repeat until satisfied → mark as Approved
6. Move to next segment
State persistence: Segment statuses (Draft / Generated / Needs Work / Approved) and revision notes are stored in a .docgen-state.json file in the demos directory. This file survives server restarts so the user can close the wizard and resume later.
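The state file is small enough for whole-file JSON reads and writes. A sketch under the assumption that statuses and notes are keyed by segment ID (the exact schema is not specified in this plan):

```python
import json
from pathlib import Path

STATE_FILE = ".docgen-state.json"

def load_state(demos_dir):
    """Load wizard review state, or an empty default if none exists yet."""
    path = Path(demos_dir) / STATE_FILE
    if path.exists():
        return json.loads(path.read_text())
    return {"segments": {}}

def set_status(demos_dir, segment, status, note=""):
    """Persist one segment's status + note; survives server restarts."""
    state = load_state(demos_dir)
    state["segments"][segment] = {"status": status, "note": note}
    (Path(demos_dir) / STATE_FILE).write_text(json.dumps(state, indent=2))
    return state
```

Each Flask endpoint that changes a status would call `set_status`, so closing the browser or restarting the wizard loses nothing.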
- Backend: Flask with API endpoints for each operation (scan, generate narration, run TTS, compose, validate, get status)
- Frontend: Single-page app with vanilla JS or Alpine.js/htmx. Responsive layout: sidebar segment list + tabbed main area. Video player, audio player, text editors, validation result cards.
- LLM integration: uses `OPENAI_API_KEY`. Configurable model in `docgen.yaml`:

```yaml
wizard:
  llm_model: gpt-4o
  system_prompt: >-
    You are a technical writer creating narration scripts for demo videos.
    Write in plain spoken English suitable for text-to-speech. No markdown
    formatting, no headings, no bullet points. Conversational but professional
    tone, like a senior engineer presenting at a conference.
  default_guidance: ""
  exclude_patterns:
    - '**/node_modules/**'
    - '**/.pytest_cache/**'
    - '**/archive/**'
```

What it produces:
- `narration/*.md` — narration scripts (created or updated per segment)
- `audio/*.mp3` — TTS audio (regenerated on redo)
- `recordings/*.mp4` — composed videos (regenerated on redo)
- `.docgen-state.json` — segment review statuses and revision notes
- Optionally updates the `docgen.yaml` segments list and visual_map if new segments are created
Click or argparse dispatcher providing all subcommands:
| Command | Description |
|---|---|
| `docgen tts [--segment 01] [--dry-run]` | Generate TTS audio from narration markdown |
| `docgen manim [--scene StackDAGScene]` | Render Manim animations |
| `docgen vhs [--tape 02-quickstart.tape] [--strict]` | Render VHS terminal recordings |
| `docgen compose [01 02 03]` | Compose segments (audio + video) |
| `docgen validate [--max-drift 2.75] [--pre-push]` | Run all validation checks |
| `docgen concat [--config full-demo-with-m12-2]` | Concatenate full demo files |
| `docgen pages [--force]` | Generate index.html, pages.yml, .gitattributes, .gitignore |
| `docgen wizard [--port 8501]` | Launch narration setup wizard (local web GUI) |
| `docgen generate-all [--skip-tts] [--skip-manim] [--skip-vhs]` | Run full pipeline |
| `docgen rebuild-after-audio` | Recompose + validate + concat (skip TTS/Manim/VHS) |
All commands auto-discover docgen.yaml by walking up from cwd.
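With stdlib argparse (one of the two options named above), the dispatcher skeleton might look like this. A sketch covering three of the subcommands; flag spellings follow the table:

```python
import argparse

def build_parser():
    """Hypothetical sketch of the docgen subcommand parser."""
    parser = argparse.ArgumentParser(prog="docgen")
    sub = parser.add_subparsers(dest="command", required=True)

    tts = sub.add_parser("tts", help="Generate TTS audio from narration markdown")
    tts.add_argument("--segment")
    tts.add_argument("--dry-run", action="store_true")

    validate = sub.add_parser("validate", help="Run all validation checks")
    validate.add_argument("--max-drift", type=float, default=2.75)
    validate.add_argument("--pre-push", action="store_true")

    gen = sub.add_parser("generate-all", help="Run full pipeline")
    for flag in ("--skip-tts", "--skip-manim", "--skip-vhs"):
        gen.add_argument(flag, action="store_true")
    return parser
```

Each subcommand would then hand its parsed namespace to the corresponding module (`tts.py`, `validate.py`, `pipeline.py`, ...), with the config resolved once up front.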
```yaml
segments:
  default: ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11']
  all: ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12', '13', '14']

visual_map:
  '01': { type: manim, source: StackDAGScene.mp4 }
  '02': { type: vhs, source: 02-quickstart.mp4 }
  '03': { type: mixed, sources: [HeaderPropagationScene.mp4, 03-bootstrap.mp4] }
  '04': { type: vhs, source: 04-pr-pipeline.mp4 }
  '05': { type: manim, source: InterceptRoutingScene.mp4 }
  '06': { type: manim, source: LocalDebugScene.mp4 }
  '07': { type: vhs, source: 07-orchestrator-api.mp4 }
  '08': { type: manim, source: MultiTeamScene.mp4 }
  '09': { type: vhs, source: 09-results-db.mp4 }
  '10': { type: vhs, source: 10-newman.mp4 }
  '11': { type: mixed, sources: [BlastRadiusScene.mp4, 11-graph-tests.mp4] }
  '12': { type: vhs, source: 12-regression-suite.mp4 }
  '13': { type: vhs, source: 13-management-gui-architecture.mp4 }
  '14': { type: vhs, source: 14-gui-tekton-extension.mp4 }

concat:
  full-demo.mp4: ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11']
  full-demo-with-m12-2.mp4:
    ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12', '13', '14']

tts:
  model: gpt-4o-mini-tts
  voice: coral
  instructions: >-
    You are narrating a technical demo video about a CI/CD pipeline system.
    Speak in a calm, professional tone like a senior engineer giving a
    conference talk. Moderate pace. Pause briefly between sentences.
    Pronounce technical terms clearly: Tekton, Kubernetes, mirrord,
    Neo4j, Newman, Kaniko, ArgoCD, Helm.

manim:
  scenes:
    - StackDAGScene
    - HeaderPropagationScene
    - InterceptRoutingScene
    - LocalDebugScene
    - MultiTeamScene
    - BlastRadiusScene
  quality: 720p30

validation:
  max_drift_sec: 2.75
  ocr:
    sample_interval_sec: 2
    error_patterns:
      - 'command not found'
      - 'No such file'
      - 'syntax error'
      - 'Permission denied'
      - 'bash:'
    min_confidence: 40
  layout:
    min_spacing_px: 10
    edge_margin_px: 15
    check_overlap: true
  av_sync:
    enabled: true
    tolerance_sec: 3.0
    min_anchors_per_segment: 2
  narration_lint:
    pre_tts_deny_patterns:
      - 'target duration'
      - 'intended length'
      - 'visual:'
      - 'edit for voice'
      - 'approximately \d+ minutes'
    post_tts_deny_patterns:
      - 'target duration'
      - 'narration segment'
      - 'script section'
      - 'edit for voice'
    block_tts_on_pre_lint: true
    whisper_check: true

pages:
  title: "tekton-dag Demo Videos"
  subtitle: "Stack-aware CI/CD with traffic interception"
  repo_url: "https://github.com/jmjava/tekton-dag"
  docs_dir: docs
  demos_subdir: demos
  extra_links:
    - { label: "Milestone 8", href: "milestones/milestone-8.md" }
    - { label: "Milestone 12.2", href: "milestones/milestone-12.2.md" }
  segments:
    '01': { title: "Architecture overview", description: "System architecture, DAG model, polyglot support, pipelines, and extensibility features." }
    '02': { title: "Quick start", description: "Kind, Tekton, images, tasks (VHS terminal demo)." }
    '03': { title: "Bootstrap dataflow", description: "Stack bootstrap PipelineRun walkthrough." }
    '04': { title: "PR pipeline", description: "PR flow, intercepts, tests (VHS)." }
    '05': { title: "Intercept routing", description: "PR vs normal traffic routing, header-based interception." }
    '06': { title: "Local debugging", description: "mirrord integration, IDE breakpoints, live cluster debugging." }
    '07': { title: "Orchestrator API", description: "REST API, stacks, test plan, graph (VHS)." }
    '08': { title: "Multi-team Helm", description: "Helm chart deployment, team isolation, custom hooks." }
    '09': { title: "Tekton Results", description: "Results API and persisted history (VHS)." }
    '10': { title: "Newman / regression", description: "API tests and local test tiers (VHS)." }
    '11': { title: "Test-trace graph", description: "Blast radius, graph query, focused tests (mixed)." }
    '12': { title: "Regression suite (M12.2)", description: "Full regression story + agent workflows." }
    '13': { title: "Management GUI architecture (M12.2)", description: "Vue, Flask, orchestrator relationship." }
    '14': { title: "GUI Tekton extension (M12.2)", description: "Extending the GUI for Tekton." }

env_file: ../../.env
```

```python
from docgen import Pipeline, Config

config = Config.from_yaml("docgen.yaml")
pipeline = Pipeline(config)
pipeline.run(skip_tts=True, skip_manim=True)  # just VHS + compose + validate + concat + pages

# Individual steps:
from docgen.tts import TTSGenerator
gen = TTSGenerator(config)
gen.generate(segment="01", dry_run=False)

from docgen.vhs import VHSRunner
runner = VHSRunner(config)
results = runner.render_all(strict=True)  # raises on "command not found"

from docgen.validate import Validator
v = Validator(config)
report = v.run_all()   # returns structured report with pass/fail per check
v.run_pre_push()       # exits non-zero on any failure

from docgen.pages import PagesGenerator
pg = PagesGenerator(config)
pg.generate_index_html()
pg.generate_pages_workflow()
pg.generate_gitattributes()
```

After the library is built:
- Add `docgen @ git+https://github.com/jmjava/documentation-generator.git` to `docs/demos/requirements.txt`
- Create `docs/demos/docgen.yaml` with the full config shown above
- Delete replaced scripts: compose.sh, generate-all.sh, generate-narration.py, extract-timestamps.py, validate-composed-demos.sh, rebuild-after-audio.sh, regenerate-m12-2.sh
- Delete the hand-maintained `docs/index.html` — `docgen pages` now generates it
- Replace `.github/workflows/pages.yml` with the auto-generated version from `docgen pages`
- Ensure `.gitattributes` has LFS rules (generated by `docgen pages`)
- Replace the hand-maintained `docs/demos/.gitignore` with the auto-generated version from `docgen pages` (preserves the same intermediate-object exclusions, derived from `docgen.yaml` paths)
- Remove redundant demo-related ignore rules from the root `.gitignore` (e.g., `docs/demos/animations/media/`, `docs/demos/terminal/rendered/`) — now covered by the generated `docs/demos/.gitignore`
- Keep unchanged: `narration/`, `terminal/`, `animations/`, `audio/`, `recordings/`, symlinks
- Update `AGENTS.md` and `docs/demos/README.md` to reference the `docgen` CLI instead of shell scripts
- Delete `tekton-dag/documentation-generator/` (superseded by the external repo)
- `pyproject.toml` — packaging with a `[project.scripts]` entry point for the `docgen` CLI
- `.gitignore` — library repo ignores: `.venv/`, `__pycache__/`, `*.egg-info/`, `dist/`, `build/`, test artifacts
- `src/docgen/config.py` — yaml loader, path resolver, schema validation
- `src/docgen/tts.py` — OpenAI TTS generator with markdown stripping
- `src/docgen/timestamps.py` — Whisper timestamp extractor
- `src/docgen/narration_lint.py` — pre-TTS text lint + post-TTS transcript lint
- `src/docgen/vhs.py` — VHS runner + stderr error scanner
- `src/docgen/manim_runner.py` — Manim scene invocation
- `src/docgen/compose.py` — ffmpeg composition in Python
- `src/docgen/ocr.py` — OpenCV + pytesseract frame OCR (Manim + VHS)
- `src/docgen/manim_layout.py` — bounding-box overlap / spacing / edge checks
- `src/docgen/av_sync.py` — audio-visual synchronization validation
- `src/docgen/validate.py` — unified validator with `--pre-push`
- `src/docgen/concat.py` — full-demo concatenation
- `src/docgen/pages.py` — generate index.html, pages.yml, .gitattributes, .gitignore
- `src/docgen/pipeline.py` — full pipeline orchestrator
- `src/docgen/wizard.py` — narration setup wizard (Flask web GUI, file tree scanner, LLM narration generation)
- `src/docgen/templates/` — Jinja2 HTML templates for the wizard UI (tree view, guidance panel, segment mapper, preview/edit)
- `src/docgen/static/` — CSS and JS for the wizard frontend (tree checkbox controls, drag-and-drop, inline editor)
- `src/docgen/cli.py` — CLI dispatcher for all subcommands (including `wizard`)
- `tests/` — unit tests for all modules (including wizard file scanning and tree building)
- `examples/minimal-bundle/` — example docgen.yaml and content
- `docs/demos/docgen.yaml` — project config
- `docs/demos/requirements.txt` updated with the `docgen` dependency
- Old scripts deleted (7 files)
- Hand-maintained `docs/index.html` deleted (replaced by the auto-generated version)
- `.github/workflows/pages.yml` replaced with the auto-generated version
- `.gitattributes` LFS rules generated
- `docs/demos/.gitignore` replaced with the auto-generated version (intermediate objects)
- Root `.gitignore` cleaned of redundant demo-specific rules
- `AGENTS.md` and `docs/demos/README.md` updated
- `tekton-dag/documentation-generator/` stale directory deleted
- `pip install "docgen @ git+https://github.com/jmjava/documentation-generator.git"` installs cleanly
- `docgen generate-all` produces all 14 segment videos + 2 concat demos from a clean state in `tekton-dag/docs/demos/`
- `docgen validate --pre-push` passes on all produced assets with zero errors
- OCR validation detects a planted `command not found` in a test VHS recording
- Layout validation detects planted overlapping text in a test Manim frame
- A/V sync validation detects a deliberately mis-timed compose (wrong audio on wrong video)
- Narration lint blocks TTS when metadata leaks into the stripped text
- Narration lint flags spoken artifacts in a post-TTS transcript check
- `docgen pages` generates a valid `index.html` with correct video durations from ffprobe
- `docgen pages` generates a working `.github/workflows/pages.yml`
- `docgen pages` generates a `.gitignore` that excludes intermediate objects but not committed recordings
- No intermediate objects (Manim partials, VHS rendered output, audio MP3s) slip into git after migration
- All library unit tests pass
- `docgen wizard` launches a local web GUI, scans the repo, and presents `.md` files in a tree with checkboxes
- Wizard Setup view generates narration drafts from selected docs + user guidance via LLM, matching the existing narration style
- Wizard Production view displays per-segment review with narration editor, audio player, video player, and validation results
- User can redo individual steps (narration, TTS, VHS/Manim, compose) for a single segment from the Production view
- Segment statuses (Draft / Generated / Needs Work / Approved) persist across wizard restarts via `.docgen-state.json`
- `tekton-dag/docs/demos/` uses only the `docgen` CLI — no remaining shell scripts for generation
```bash
# ffmpeg (composition + probing)
sudo apt install ffmpeg

# VHS (terminal recording)
brew install charmbracelet/tap/vhs   # or: go install github.com/charmbracelet/vhs@latest

# Tesseract OCR (frame text extraction)
sudo apt install tesseract-ocr

# Manim system deps (cairo, pango, ffmpeg)
sudo apt install libcairo2-dev libpango1.0-dev
```
```
manim
openai
opencv-python
pytesseract
pyyaml
click
flask
jinja2
```
| Variable | Required | Description |
|---|---|---|
| `OPENAI_API_KEY` | For TTS + Whisper | OpenAI API key |
- charmbracelet/vhs — terminal recorder
- ManimCommunity/manim — animation engine
- OpenAI TTS API — gpt-4o-mini-tts
- OpenAI Whisper API — audio transcription
- pytesseract — Python OCR wrapper
- OpenCV — computer vision / frame extraction
- Milestone 8 — original demo asset milestone (partially superseded)
- docs/demos/README.md — current demo generation docs