This project is a Node.js pipeline that reads a Markdown file containing a list of YouTube videos and generates a single Markdown file with full subtitle transcriptions, in the original language of each video, whenever available.
It is designed to be:
- resilient to failures
- restartable
- parallelized
- compatible with modern YouTube subtitle quirks
- Reads a .md file with YouTube links
- Downloads subtitles using yt-dlp
- Automatically selects the best available subtitle language
- Supports manual subtitles, auto-generated subtitles, and *-orig tracks
- Cleans VTT files (removes timestamps, tags, and formatting)
- Writes a clean, readable transcription per video
- Parallel processing (configurable)
- Retry mechanism with failure tracking
- Persistent progress (can resume after crashes or network loss)
- Detailed logging to file
- Modular architecture for easy maintenance and testing
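
The VTT cleaning step listed above can be illustrated with a minimal sketch. The function name `cleanVttSketch` and the exact rules (dropping the header, `Kind:`/`Language:` metadata, cue timing lines, inline tags, and consecutive duplicate lines) are assumptions made for illustration; the project's own `cleanVtt` in src/utils/vtt.js may behave differently.

```javascript
// Hypothetical sketch of a VTT cleaner, not the project's actual implementation.
function cleanVttSketch(vtt) {
  const kept = [];
  for (const rawLine of vtt.split(/\r?\n/)) {
    // Strip inline tags such as <c> or <00:00:01.500>, then surrounding whitespace.
    const line = rawLine.replace(/<[^>]+>/g, '').trim();

    if (line === '') continue;                    // blank separators between cues
    if (line.startsWith('WEBVTT')) continue;      // file header
    if (/^(Kind|Language):/.test(line)) continue; // header metadata lines
    if (line.includes('-->')) continue;           // cue timing lines

    // Auto-generated captions often repeat the previous line as text scrolls;
    // keep only the first copy of consecutive duplicates.
    if (kept[kept.length - 1] === line) continue;
    kept.push(line);
  }
  return kept.join(' ');
}

const sample = [
  'WEBVTT',
  'Kind: captions',
  '',
  '00:00:01.000 --> 00:00:03.000 align:start',
  'Hello <c>world</c>',
  '',
  '00:00:03.000 --> 00:00:05.000',
  'Hello world',
  'again',
].join('\n');

console.log(cleanVttSketch(sample)); // "Hello world again"
```

Joining with a single space produces one flowing paragraph per video; joining with newlines instead would preserve the original cue boundaries.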
.
├── src/
│   ├── config/
│   │   └── constants.js          # All configuration constants
│   ├── utils/
│   │   ├── file.js               # File operations
│   │   ├── logger.js             # Logging system
│   │   └── vtt.js                # VTT cleaning utilities
│   ├── services/
│   │   ├── markdown.js           # Input MD parsing
│   │   ├── ytdlp.js              # yt-dlp integration
│   │   └── progress.js           # Progress tracking
│   ├── workers/
│   │   └── videoProcessor.js     # Video processing logic
│   └── index.js                  # Main entry point
├── video-list.md                 # Input file (list of videos)
├── video-list-transcription.md   # Output file (generated)
├── progress.json                 # Progress tracking
├── tmp_subs/                     # Temporary subtitle downloads
├── logs/
│   └── app.log                   # Detailed execution logs
├── package.json
└── README.md
The project follows a modular architecture with clear separation of concerns:
src/config/: Centralized configuration and constants
- Paths, timeouts, parallel limits, log levels

src/utils/: Reusable utility functions
- logger.js: Multi-level logging system
- file.js: File and directory operations
- vtt.js: VTT subtitle cleaning

src/services/: Business logic and external integrations
- markdown.js: Parse input Markdown file
- ytdlp.js: All yt-dlp interactions (download, list subs, etc.)
- progress.js: Save/load progress tracking

src/workers/: Processing and orchestration
- videoProcessor.js: Individual video processing and parallel execution

src/index.js: Main entry point that orchestrates the entire pipeline
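
To make the "parallel execution" part of the worker layer more concrete, here is a minimal sketch of a bounded worker pool with per-item retries. The names `withRetry` and `runPool` are hypothetical, and the sketch assumes the retry setting counts total attempts per video; the real videoProcessor.js may be organized differently.

```javascript
// Hypothetical sketch: try `task` up to `maxAttempts` times, rethrowing the last error.
async function withRetry(task, maxAttempts) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await task(attempt);
    } catch (err) {
      lastError = err; // remember the failure and try again
    }
  }
  throw lastError;
}

// Hypothetical sketch: process `items` with at most `limit` workers in flight.
// Each result is { status: 'done', value } or { status: 'failed', error }.
async function runPool(items, limit, worker, maxAttempts = 3) {
  const results = new Array(items.length);
  let next = 0; // index of the next unclaimed item, shared by all lanes

  async function lane() {
    while (next < items.length) {
      const index = next++; // claim an item; safe because JS runs one callback at a time
      try {
        const value = await withRetry(() => worker(items[index]), maxAttempts);
        results[index] = { status: 'done', value };
      } catch (error) {
        results[index] = { status: 'failed', error };
      }
    }
  }

  // Start `limit` lanes (or fewer if there are not that many items).
  const lanes = Array.from({ length: Math.min(limit, items.length) }, () => lane());
  await Promise.all(lanes);
  return results;
}
```

A call such as `runPool(videos, 6, processVideo, 3)` would mirror the `MAX_PARALLEL` and `MAX_RETRIES` values shown in the configuration section, with `processVideo` standing in for whatever function handles a single video. Because a failing item is recorded rather than thrown, one broken video does not stop the rest of the batch.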
The input file must be a Markdown file with links in this format:
[Video Title](https://www.youtube.com/watch?v=VIDEO_ID)

Example:
[How to Build a CLI Tool](https://www.youtube.com/watch?v=abc123)
[Node.js Best Practices](https://www.youtube.com/watch?v=xyz789)

The output file uses this format:

## Video Title
https://www.youtube.com/watch?v=VIDEO_ID
Full transcription text goes here...
## Another Video Title
https://www.youtube.com/watch?v=ANOTHER_ID
Another full transcription...

Subtitle tracks are chosen in this order of priority:

- *-orig subtitles (original language track)
- Single available manual subtitle
- Auto-generated subtitle (fallback)
- Fail only if no subtitles exist
This ensures you always get the highest quality subtitle available.
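
The priority order above can be sketched as a small pure function. The name `pickSubtitleTrack`, the input shape (two arrays of language codes), and the last-resort rule for videos with several manual tracks but no auto-generated one are all assumptions for illustration; the README does not say how ytdlp.js represents the available tracks.

```javascript
// Hypothetical sketch of the selection order described above.
// `manual` and `automatic` are assumed to be arrays of language codes,
// e.g. ['en'] and ['pt-orig', 'pt', 'en'].
function pickSubtitleTrack({ manual = [], automatic = [] }) {
  // 1. Prefer an *-orig track (original language), wherever it is listed.
  const orig = [...manual, ...automatic].find((lang) => lang.endsWith('-orig'));
  if (orig) return { lang: orig, reason: 'orig' };

  // 2. A single manual subtitle is unambiguous, so use it.
  if (manual.length === 1) return { lang: manual[0], reason: 'manual' };

  // 3. Fall back to an auto-generated subtitle.
  if (automatic.length > 0) return { lang: automatic[0], reason: 'auto' };

  // Assumption: with several manual tracks and nothing else, take the first
  // one rather than failing, since failure is reserved for "no subtitles".
  if (manual.length > 0) return { lang: manual[0], reason: 'manual' };

  // 4. No subtitles of any kind.
  return null;
}

console.log(pickSubtitleTrack({ manual: ['en'], automatic: ['pt-orig', 'en'] }));
// { lang: 'pt-orig', reason: 'orig' }
```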
# Using npm scripts
npm start
# Or directly with Node.js
node src/index.js
# Development mode with auto-reload
npm run dev

You can stop and restart at any time. Progress is saved automatically in progress.json.
Edit src/config/constants.js to customize:
export const SETTINGS = {
  MAX_PARALLEL: 6,      // Number of concurrent downloads
  MAX_RETRIES: 3,       // Retry attempts per video
  TIMEOUT_MS: 30_000,   // Timeout for each operation
};

- Node.js 18+ (ES modules support)
- yt-dlp executable (must be in project root or PATH)
- Recommended: ffmpeg, Node.js runtime for yt-dlp
# Windows
# Download from https://github.com/yt-dlp/yt-dlp/releases
# macOS/Linux
pip install yt-dlp
# or
brew install yt-dlp

Where to put new code:

- New utilities → src/utils/
- New services → src/services/
- New processing logic → src/workers/
- Configuration changes → src/config/constants.js
The modular structure makes it easy to test individual components:
// Example: Testing VTT cleaning
import { cleanVtt } from './src/utils/vtt.js';
const dirty = 'WEBVTT\n\n00:00:01.000 --> 00:00:05.000\nHello world';
const clean = cleanVtt(dirty);
console.log(clean); // "Hello world"

Logs are written to logs/app.log with timestamps and severity levels:
- DEBUG: Detailed execution info
- INFO: General progress updates
- WARN: Non-critical issues
- ERROR: Failures with retry info
- FATAL: Critical errors that stop execution
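
As an illustration of how the five levels could be ordered and filtered, here is a minimal sketch. The function names and the line layout (`[timestamp] [LEVEL] message`) are assumptions; the actual format written by logger.js is not shown in this README.

```javascript
// Severity levels from least to most severe, as listed above.
const LEVELS = ['DEBUG', 'INFO', 'WARN', 'ERROR', 'FATAL'];

// Hypothetical helper: should a message at `level` be written when the
// configured minimum is `minLevel`?
function shouldLog(level, minLevel) {
  const levelRank = LEVELS.indexOf(level);
  const minRank = LEVELS.indexOf(minLevel);
  if (levelRank === -1 || minRank === -1) {
    throw new Error(`Unknown log level: ${level} / ${minLevel}`);
  }
  return levelRank >= minRank;
}

// Hypothetical helper: build one log line with an ISO 8601 timestamp.
function formatLogLine(level, message, now = new Date()) {
  return `[${now.toISOString()}] [${level}] ${message}`;
}

if (shouldLog('WARN', 'INFO')) {
  console.log(formatLogLine('WARN', 'No manual subtitles, using auto-generated'));
}
```

A line built this way could then be appended to logs/app.log, for example with Node's `fs.appendFileSync`.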
The progress.json file tracks:
- "done": Successfully processed
- "failed": Failed after all retries
Delete this file to reprocess all videos.
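
The resume behavior can be sketched with two pure helpers. Everything about the file's shape here is an assumption: the sketch treats progress.json as a map from video URL to "done" or "failed", and it skips failed videos on restart unless asked otherwise. The real progress.js may store and interpret the data differently.

```javascript
// Hypothetical sketch. Assumed file shape: { "<video url>": "done" | "failed" }.

// Parse the contents of progress.json, tolerating a missing or corrupt file.
function parseProgress(jsonText) {
  if (!jsonText) return {};
  try {
    const data = JSON.parse(jsonText);
    const isPlainObject = data !== null && typeof data === 'object' && !Array.isArray(data);
    return isPlainObject ? data : {};
  } catch {
    return {}; // truncated or invalid JSON: start from scratch
  }
}

// Return the videos that still need processing.
function pendingVideos(videos, progress, { retryFailed = false } = {}) {
  return videos.filter((video) => {
    const status = progress[video.url];
    if (status === 'done') return false;          // already transcribed
    if (status === 'failed') return retryFailed;  // retry only when requested
    return true;                                  // never attempted
  });
}

const videos = [
  { title: 'A', url: 'https://www.youtube.com/watch?v=abc123' },
  { title: 'B', url: 'https://www.youtube.com/watch?v=xyz789' },
];
const progress = parseProgress('{"https://www.youtube.com/watch?v=abc123":"done"}');
console.log(pendingVideos(videos, progress).map((v) => v.title)); // [ 'B' ]
```

With an empty progress object every video is pending, which matches the effect of deleting the file.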
- Check if the video has subtitles enabled
- Try running yt-dlp --list-subs VIDEO_URL manually
- Increase TIMEOUT_MS in src/config/constants.js
- Check your internet connection
- Ensure yt-dlp.exe is in the project root
- Or update PATHS.YT_DLP in src/config/constants.js
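
When debugging, it can help to see what a subtitle-only yt-dlp invocation looks like. The sketch below only builds argument lists; the helper names and the output template are assumptions, and the flags the project actually passes in ytdlp.js may differ. The options used (`--list-subs`, `--skip-download`, `--write-subs`, `--write-auto-subs`, `--sub-langs`, `--sub-format`, `-o`) are standard yt-dlp options.

```javascript
// Hypothetical helper: arguments for listing the subtitle tracks of a video.
function buildListSubsArgs(url) {
  return ['--list-subs', url];
}

// Hypothetical helper: arguments for downloading one subtitle track as VTT
// without downloading the video itself.
function buildSubtitleArgs(url, lang, outDir = 'tmp_subs') {
  return [
    '--skip-download',        // subtitles only, no media
    '--write-subs',           // manual subtitles, if any
    '--write-auto-subs',      // auto-generated subtitles, if any
    '--sub-langs', lang,      // e.g. 'en' or 'pt-orig'
    '--sub-format', 'vtt',
    '-o', `${outDir}/%(id)s.%(ext)s`, // assumed output template
    url,
  ];
}

console.log(buildSubtitleArgs('https://www.youtube.com/watch?v=VIDEO_ID', 'en').join(' '));
```

Passing the arguments as an array to Node's `child_process.execFile`, together with its `timeout` option, avoids shell quoting problems with URLs and gives each call a time limit.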
This project is released under the MIT license. See the LICENSE file for details.
Made by Renan Oliveira