Releases: ActiveInferenceInstitute/Journal-Utilities
v0.1.0: Hardened, Documented, and Feature-Complete
We are thrilled to announce Journal-Utilities v0.1.0, the first major release of the Active Inference Institute's video processing pipeline. This release brings a fully modular, config-driven system for downloading, transcribing, enriching, and exploring the Institute's massive video library.
✨ Key Features
📥 Universal Download Pipeline
- Cookie Authentication: Bypasses YouTube rate limits and 403 errors using browser cookies.
- Resumable: Smartly skips existing files to save bandwidth.
- Multi-Format: Downloads Video (MP4), Audio (MP3/M4A), and Subtitles (VTT/TXT).
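The resumable behaviour can be pictured as a check for existing output files before any network request is made. The following is a minimal sketch under stated assumptions, not the project's actual code: the function names, the `<video_id>.<ext>` file layout, and the `FORMATS` tuple are all hypothetical.

```python
from pathlib import Path

# Hypothetical set of extensions, loosely mirroring the formats listed above.
FORMATS = ("mp4", "m4a", "vtt")


def missing_outputs(video_id: str, out_dir: Path, formats=FORMATS) -> list[Path]:
    """Return the target files for video_id that are absent or empty."""
    targets = [out_dir / f"{video_id}.{ext}" for ext in formats]
    # An empty file is treated as a failed earlier attempt (an assumption).
    return [p for p in targets if not p.exists() or p.stat().st_size == 0]


def plan_downloads(video_ids, out_dir: Path) -> dict[str, list[Path]]:
    """Map each video ID to the files that still need downloading."""
    plan = {}
    for vid in video_ids:
        todo = missing_outputs(vid, out_dir)
        if todo:  # videos with every file present are skipped entirely
            plan[vid] = todo
    return plan
```

Calling `plan_downloads(["abc", "def"], Path("data/output"))` returns only the IDs that still have work left, so a second run over an already downloaded library would plan no downloads at all.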
🎙️ Dual-Engine Transcription
- Apple Silicon: Native mlx-whisper support for blazing-fast on-device transcription.
- GPU Cluster: WhisperX integration with speaker diarization and word-level alignment.
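One way to choose between the two engines is to inspect the host platform. This is an illustrative sketch only: the release notes do not say whether the pipeline auto-detects hardware or relies purely on the config setting. The function name `pick_engine` and the engine string `"whisperx"` are assumptions.

```python
import platform
from typing import Optional


def pick_engine(system: Optional[str] = None, machine: Optional[str] = None) -> str:
    """Suggest a transcription engine name for the given (or current) host."""
    system = system or platform.system()
    machine = machine or platform.machine()
    # A native Python build on Apple Silicon reports Darwin / arm64.
    if system == "Darwin" and machine == "arm64":
        return "mlx-whisper"
    # Everything else falls back to the GPU-oriented engine (assumed name).
    return "whisperx"
```

Passing the platform values as arguments keeps the function testable on any machine, for example `pick_engine("Linux", "x86_64")`.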
🧠 RAG & Knowledge Graph
- Entity Extraction: Uses Cohere AI to extract people, concepts, and relationships.
- Graph Database: Stores connected knowledge in SurrealDB for complex querying.
- Chat Engine: Ollama-powered RAG (default gemma3:4b) for chatting with the video library.
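To make the retrieve-then-generate idea behind RAG concrete, here is a toy sketch of the retrieval and prompt-building steps. It deliberately swaps in bag-of-words cosine similarity for whatever retrieval the project actually uses, and it does not call Ollama, Cohere, or SurrealDB. Every function name here is hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k transcript chunks most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(
        chunks, key=lambda c: cosine(q, Counter(tokenize(c))), reverse=True
    )
    return ranked[:k]


def build_prompt(question: str, chunks: list[str], k: int = 2) -> str:
    """Assemble the text that would be sent to the chat model."""
    context = "\n".join(f"- {c}" for c in retrieve(question, chunks, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

In a real deployment the string returned by `build_prompt` would be passed to the configured chat model, and the chunks would come from stored transcripts rather than an in-memory list.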
📄 Rich Export
- 5 Formats: Markdown, JSON, HTML, PDF, Plaintext.
- Enriched Metadata: All exports now include Title, Category, Series, Episode, Speakers, Duration, URL, and View Count in headers/frontmatter.
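As an illustration of what an enriched Markdown export might look like, the sketch below writes the listed metadata as frontmatter ahead of the transcript. The key names (`title`, `view_count`, and so on) and the function `render_markdown` are assumptions for illustration; the project's real field names and layout may differ.

```python
import json

# Hypothetical key names for the metadata fields listed above.
FIELDS = (
    "title", "category", "series", "episode",
    "speakers", "duration", "url", "view_count",
)


def render_markdown(meta: dict, transcript: str) -> str:
    """Render a transcript as Markdown with a frontmatter header."""
    lines = ["---"]
    for key in FIELDS:
        if key in meta:
            # JSON-encode each value so strings containing ": " stay quoted.
            lines.append(f"{key}: {json.dumps(meta[key], ensure_ascii=False)}")
    lines += ["---", "", f"# {meta.get('title', 'Untitled')}", "", transcript.strip(), ""]
    return "\n".join(lines)
```

Fields missing from `meta` are simply omitted, so partially enriched videos would still export cleanly.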
🌐 Web Interface
- Video Library: Searchable, filterable, paginated view of all content.
- Video Detail: Embedded player, full transcript viewer, and metadata side-panel.
- Interactive Chat: Built-in RAG chat interface with streaming responses.
🏗️ Architecture
The system is built on a config-driven philosophy. A single config.ini controls the entire pipeline:
[general]
data_dir = data/output
[download]
transcripts = true
audio = true
cookies_from_browser = chrome
[transcribe]
engine = mlx-whisper
model = mlx-community/whisper-large-v3-turbo
[export]
markdown = true
pdf = true
json = true
🚀 Quick Start
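For readers who want to inspect a config.ini like the one in the Architecture section from their own scripts, Python's standard `configparser` module can read it directly. This is a sketch under stated assumptions: `load_settings` and the keys of the returned dictionary are hypothetical and are not the pipeline's own loader.

```python
import configparser

# Trimmed sample using section and key names from the Architecture example.
SAMPLE = """
[general]
data_dir = data/output
[download]
audio = true
[transcribe]
engine = mlx-whisper
[export]
markdown = true
pdf = true
json = true
"""


def load_settings(text: str) -> dict:
    """Parse INI text into a small dictionary of typed settings."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        "data_dir": parser.get("general", "data_dir"),
        "engine": parser.get("transcribe", "engine"),
        # getboolean understands true/false, yes/no, on/off and 1/0.
        "download_audio": parser.getboolean("download", "audio", fallback=False),
        "export_formats": sorted(
            name for name in parser["export"] if parser.getboolean("export", name)
        ),
    }
```

To read a file on disk instead of a string, `parser.read("config.ini")` could replace the `read_string` call.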
# Clone and install
git clone https://github.com/ActiveInferenceInstitute/Journal-Utilities.git
cd Journal-Utilities
uv venv && source .venv/bin/activate
uv pip install -e ".[dev,interface,export]"
# Run the full pipeline
python run.py
📚 Documentation
- README - New visual entry point
- Configuration Guide
- Web Interface
- Export Guide
- Transcription Guide
Contributors: @ActiveInferenceInstitute, @hollygrimm, @DaveDouglass
License: MIT
Release 1
First release of Journal Utilities