
Releases: ActiveInferenceInstitute/Journal-Utilities

v0.1.0: Hardened, Documented, and Feature-Complete

18 Feb 22:51


We are thrilled to announce Journal-Utilities v0.1.0, the first major release of the Active Inference Institute's video processing pipeline. This release brings a fully modular, config-driven system for downloading, transcribing, enriching, and exploring the Institute's massive video library.

✨ Key Features

📥 Universal Download Pipeline

  • Cookie Authentication: Bypasses YouTube rate limits and 403 errors using browser cookies.
  • Resumable: Skips files that already exist on disk, saving bandwidth on re-runs.
  • Multi-Format: Downloads Video (MP4), Audio (MP3/M4A), and Subtitles (VTT/TXT).
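As an illustrative sketch only (not the project's actual code), the three download features above map naturally onto a yt-dlp options dictionary. The option keys below are real yt-dlp options, but the `build_ydl_opts` helper is hypothetical:

```python
# Hypothetical helper illustrating the download features above;
# the yt-dlp option keys are real, the wrapper is a sketch only.
def build_ydl_opts(browser: str = "chrome") -> dict:
    return {
        # Cookie authentication: reuse the browser session to avoid
        # 403 errors and rate limits.
        "cookiesfrombrowser": (browser,),
        # Resumable: do not overwrite files that already exist.
        "overwrites": False,
        # Multi-format: MP4 video plus M4A audio, with VTT subtitles.
        "format": "bestvideo[ext=mp4]+bestaudio[ext=m4a]/mp4",
        "writesubtitles": True,
        "subtitlesformat": "vtt",
    }

opts = build_ydl_opts()
# To actually download (requires yt-dlp installed):
# import yt_dlp
# with yt_dlp.YoutubeDL(opts) as ydl:
#     ydl.download(["https://www.youtube.com/watch?v=..."])
```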

🎙️ Dual-Engine Transcription

  • Apple Silicon: Native mlx-whisper support for blazing fast on-device transcription.
  • GPU Cluster: WhisperX integration with speaker diarization and word-level alignment.

🧠 RAG & Knowledge Graph

  • Entity Extraction: Uses Cohere AI to extract people, concepts, and relationships.
  • Graph Database: Stores connected knowledge in SurrealDB for complex querying.
  • Chat Engine: Ollama-powered RAG (default gemma3:4b) for chatting with the video library.
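A hypothetical sketch of the prompt-assembly step in a RAG chat loop: retrieved transcript chunks are stitched into a prompt for the model. The function and prompt format are illustrative assumptions; in the actual system, retrieval is backed by SurrealDB and generation runs through Ollama:

```python
# Hypothetical RAG prompt assembly; not the project's actual code.
def build_rag_prompt(question: str, chunks: list[str]) -> str:
    # Number each retrieved transcript excerpt for easy citation.
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer using only the transcript excerpts below.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )

prompt = build_rag_prompt(
    "What is active inference?",
    ["Active inference is a framework...", "Free energy minimization..."],
)
# The prompt could then be sent to a local Ollama server, e.g.:
# requests.post("http://localhost:11434/api/generate",
#               json={"model": "gemma3:4b", "prompt": prompt})
```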

📄 Rich Export

  • 5 Formats: Markdown, JSON, HTML, PDF, Plaintext.
  • Enriched Metadata: All exports now include Title, Category, Series, Episode, Speakers, Duration, URL, and View Count in headers/frontmatter.
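As an illustration of the enriched headers, the sketch below renders the metadata fields listed above as YAML-style frontmatter for the Markdown export. The helper and field names are assumptions drawn from the list above, not the exporter's actual implementation:

```python
# Illustrative only: renders the enriched metadata fields as
# YAML-style frontmatter for a Markdown export.
def render_frontmatter(meta: dict) -> str:
    fields = ["title", "category", "series", "episode",
              "speakers", "duration", "url", "view_count"]
    lines = [f"{k}: {meta[k]}" for k in fields if k in meta]
    return "---\n" + "\n".join(lines) + "\n---\n"

header = render_frontmatter({
    "title": "Example Livestream",
    "category": "Livestream",
    "duration": "01:02:03",
})
```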

🌐 Web Interface

  • Video Library: Searchable, filterable, paginated view of all content.
  • Video Detail: Embedded player, full transcript viewer, and metadata side-panel.
  • Interactive Chat: Built-in RAG chat interface with streaming responses.

🏗️ Architecture

The system is built on a config-driven philosophy. A single config.ini controls the entire pipeline:

[general]
data_dir = data/output

[download]
transcripts = true
audio = true
cookies_from_browser = chrome

[transcribe]
engine = mlx-whisper
model = mlx-community/whisper-large-v3-turbo

[export]
markdown = true
pdf = true
json = true
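A config file in this shape can be read with Python's standard-library `configparser`; the sketch below parses the sections shown above (embedded here as a string for self-containment):

```python
import configparser

# Parse a config.ini like the one above (stdlib configparser; sketch only).
raw = """
[general]
data_dir = data/output

[download]
transcripts = true
audio = true
cookies_from_browser = chrome

[transcribe]
engine = mlx-whisper
model = mlx-community/whisper-large-v3-turbo
"""

cfg = configparser.ConfigParser()
cfg.read_string(raw)

data_dir = cfg.get("general", "data_dir")          # "data/output"
want_audio = cfg.getboolean("download", "audio")   # True
engine = cfg.get("transcribe", "engine")           # "mlx-whisper"
```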

🚀 Quick Start

# Clone and install
git clone https://github.com/ActiveInferenceInstitute/Journal-Utilities.git
cd Journal-Utilities
uv venv && source .venv/bin/activate
uv pip install -e ".[dev,interface,export]"

# Run the full pipeline
python run.py

Contributors: @ActiveInferenceInstitute, @hollygrimm, @DaveDouglass
License: MIT

Release 1

20 Mar 13:18


First release of Journal Utilities