vector17002/video-transcoding

Video Thumbnail Processing POC

Overview

This project is a Proof of Concept (POC) for a video processing pipeline. Users authenticate and upload videos directly to AWS S3 using pre-signed URLs. Once an upload completes, asynchronous backend workers transcode the video, generate HLS (HTTP Live Streaming) streams, extract thumbnails, transcribe audio, and generate AI summaries.

This README gives AI assistants and human developers the context they need to understand the architecture and to maintain and extend the project.

Technology Stack

  • Frontend: Next.js 16 (App Router), React 19
  • Styling: Tailwind CSS 4
  • Backend API: Node.js, Express.js (v5)
  • Database: PostgreSQL (provided locally via Neon Docker image)
  • ORM: Drizzle ORM
  • Task Queue: Redis + BullMQ
  • Video Processing: FFmpeg (via fluent-ffmpeg wrapper)
  • Speech-to-Text: faster-whisper (CTranslate2 Whisper running on CPU)
  • AI Summarization: Gemini 2.5 Flash via deepagents SDK
  • Storage: AWS S3
  • Containerization: Docker & Docker Compose

Architecture & Data Flow

  1. Authentication:
    • Users register/login via /api/auth endpoints.
    • The backend issues a JWT, stored client-side (cookies/localStorage) for subsequent requests.
  2. S3 Direct Upload:
    • Frontend calls /api/s3/upload to request a secure pre-signed URL.
    • A direct PUT request is made from the browser to S3, eliminating the need to proxy massive video files through the Node.js server.
  3. Trigger Processing:
    • Once the S3 upload finishes, the frontend signals the backend via /api/s3/process (passing the fileId).
  4. Background Workers (BullMQ):
    • The /api/s3/process endpoint enqueues jobs onto Redis.
    • Separate worker processes pick up these jobs to execute heavy commands asynchronously.
      • Transcode Worker: Processes/compresses the original video.
      • HLS Worker: Converts the video into partitioned HLS streams for adaptive bitrate streaming.
      • Thumbnail Worker: Extracts static thumbnails.
      • Transcribe Worker: (AI Worker) Downloads video, extracts audio, performs speech-to-text via faster-whisper, uploads the transcript file to S3, and enqueues the transcript's plain text onto the summary queue.
      • Summary Worker: (AI Worker) Takes the plain text transcript, invokes the Gemini API via deepagents (gemini-2.5-flash), saves the generated summary directly in the database (summary column), and updates the summary status.
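
Steps 2 and 3 can be sketched from the client's side as follows. This is a minimal sketch, not the repository's actual client code: only the two endpoint paths come from this README, while the request bodies, the `uploadUrl` and `fileId` response fields, the POST methods, and the Bearer header are assumptions. `fetchFn` is injected so the flow can be exercised without a network.

```typescript
// Hypothetical response shape for /api/s3/upload; the real field names may differ.
interface PresignResponse {
  uploadUrl: string;
  fileId: string;
}

// The only file properties this sketch needs; a browser File satisfies this.
interface UploadableFile {
  name: string;
  type: string;
}

// Minimal subset of the Fetch API, so a fake can be passed in for testing.
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: unknown },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

async function uploadAndProcess(
  file: UploadableFile,
  jwt: string,
  fetchFn: FetchLike,
): Promise<string> {
  // Assumed auth scheme: the JWT is sent as a Bearer token.
  const apiHeaders = {
    Authorization: `Bearer ${jwt}`,
    "Content-Type": "application/json",
  };

  // 1. Ask the backend for a pre-signed URL (request body is an assumption).
  const presignRes = await fetchFn("/api/s3/upload", {
    method: "POST",
    headers: apiHeaders,
    body: JSON.stringify({ fileName: file.name, contentType: file.type }),
  });
  if (!presignRes.ok) throw new Error(`presign failed: ${presignRes.status}`);
  const { uploadUrl, fileId } = (await presignRes.json()) as PresignResponse;

  // 2. PUT the file straight to S3. The signature travels in the URL itself,
  //    so no Authorization header is sent; the video never passes through Node.js.
  const putRes = await fetchFn(uploadUrl, {
    method: "PUT",
    headers: { "Content-Type": file.type },
    body: file,
  });
  if (!putRes.ok) throw new Error(`S3 upload failed: ${putRes.status}`);

  // 3. Tell the backend the object is in place so it can enqueue the jobs.
  const processRes = await fetchFn("/api/s3/process", {
    method: "POST",
    headers: apiHeaders,
    body: JSON.stringify({ fileId }),
  });
  if (!processRes.ok) throw new Error(`process failed: ${processRes.status}`);

  return fileId;
}
```

In a browser, `fetchFn` could be a thin wrapper around the global `fetch`. Keeping the three calls sequential matters: `/api/s3/process` should only be called after the PUT has succeeded, otherwise the workers would look for an object that is not in the bucket yet.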

Directory Structure

videoThumbnailProcessingPOC/
├── web/                  # Next.js 16 frontend app
│   ├── src/
│   │   ├── app/          # Next.js App Router components and layouts
│   │   │   └── components/ # Reusable UI components
│   │   └── lib/          # Central API client and utility library
│   ├── package.json      # Frontend npm package manifest
│   └── tsconfig.json     # Frontend TypeScript settings
├── server/               # Express 5 backend server & workers
│   ├── src/
│   │   ├── app.ts        # Express app configuration & middleware
│   │   ├── server.ts     # REST API server entry point
│   │   ├── worker.ts     # BullMQ transcoder, HLS, and thumbnail worker entry point
│   │   ├── ai-worker.ts  # BullMQ faster-whisper transcription & Gemini summary worker entry point
│   │   ├── routes/       # Express route controllers
│   │   ├── controllers/  # API action handlers
│   │   ├── middleware/   # Authentication and validation middlewares
│   │   ├── models/       # Drizzle schema definitions
│   │   ├── services/     # S3, Transcode, transcription, and summarization services
│   │   └── workers/      # BullMQ queue & worker specifications
│   ├── drizzle/          # Migration SQL files generated by Drizzle
│   ├── Dockerfile        # Multi-stage Docker container build instructions
│   ├── package.json      # Server dependencies & launch scripts
│   └── .env.example      # Template for server environment configurations
└── docker-compose.yml    # Orchestrates Redis, local DB connection proxy, API, Worker, and AI Worker

Local Development Setup

  1. Environment Variables: Navigate to the server directory, copy the example environment file, and fill in your AWS credentials and PostgreSQL URI:

    cd server
    cp .env.example .env
    # Edit .env with your AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION, S3_BUCKET_NAME etc.
  2. Docker Compose: From the root of the project, spin up the entire infrastructure:

    docker-compose up --build

    This command starts:

    • redis: Used by BullMQ.
    • neon-local: Local Neon PostgreSQL instance.
    • api: Node server running on port 8080.
    • worker: Node process dedicated to the transcode, HLS, and thumbnail queues.
    • ai-worker: Node process dedicated to running the audio transcription and transcript summarization workers (requires a GEMINI_API_KEY).
  3. Database Push (only if running the server natively rather than in Docker):

    cd server
    npm install
    npm run db:push
  4. Accessing the UI: Run the Next.js frontend development server:

    cd web
    npm install
    npm run dev

    Now open http://localhost:3000 in your web browser to view the application interface.
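
A missing variable in `.env` tends to surface late, for example when a worker first touches S3. The sketch below shows one way to fail fast at startup. It is an illustration under assumptions, not code from the repository: `missingEnv` and `REQUIRED_ENV` are hypothetical names, and the list contains only the variables named in this README (`.env.example` may define more).

```typescript
// Variable names taken from this README; .env.example may list others.
const REQUIRED_ENV = [
  "AWS_ACCESS_KEY_ID",
  "AWS_SECRET_ACCESS_KEY",
  "AWS_REGION",
  "S3_BUCKET_NAME",
] as const;

// Returns the names of required variables that are unset or blank.
function missingEnv(
  env: Record<string, string | undefined>,
  required: readonly string[] = REQUIRED_ENV,
): string[] {
  return required.filter((name) => (env[name] ?? "").trim() === "");
}
```

At startup, the API and worker entry points could call `missingEnv(process.env)` and exit with a clear message if the result is non-empty. The AI worker could pass its own list that also includes `GEMINI_API_KEY`.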

AI Assistant Guide for Maintenance & Expansion

If you are an AI attempting to update or debug this repository, keep the following principles in mind:

  • Adding New Features to Upload Flow:

    • The upload bypasses the server body parsers for efficiency. Any metadata should be saved in the database before generating the pre-signed URL or during the /api/s3/process callback.
  • Expanding the Worker System:

    • To add a new background task (e.g., watermark injection), create a new queue and worker pair in server/src/workers/.
    • Be sure to instantiate and attach the new worker in server/src/worker.ts (or server/src/ai-worker.ts for AI tasks) so the corresponding Docker container actually listens to the queue.
    • Workers use fluent-ffmpeg. Ensure the worker environment always has FFmpeg installed (handled automatically in the current Dockerfile).
  • Database Tweaks:

    • Modify server/src/models/*.ts.
    • To apply changes, either run npm run db:generate followed by npm run db:migrate, or run npm run db:push.
  • Frontend Development:

    • The frontend is built on Next.js 16 (App Router) and React 19. All user components reside in web/src/app/components/ and route pages are in web/src/app/. Keep style definitions standardized under Tailwind CSS 4 utility classes.
  • Monitoring & Scale Constraints:

    • docker-compose.yml limits the worker container to 2GB of memory. High-resolution transcoding may require increasing this allocation.
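
As an illustration of the "Expanding the Worker System" guidance, here is a minimal sketch of the processing logic for a hypothetical watermark worker. None of these names exist in the repository: `WatermarkJobData`, `buildWatermarkArgs`, and `processWatermarkJob` are made up for the example. It builds raw FFmpeg command-line arguments rather than fluent-ffmpeg calls, and it takes the FFmpeg runner as a parameter so the logic can be tested without FFmpeg or Redis.

```typescript
// Hypothetical payload for a watermark job; not part of the repository.
interface WatermarkJobData {
  inputPath: string;
  watermarkPath: string;
  outputPath: string;
  marginPx?: number;
}

// Minimal stand-in for the part of a BullMQ job this processor reads.
interface JobLike<T> {
  data: T;
}

// Builds FFmpeg CLI arguments that overlay the watermark in the bottom-right corner.
function buildWatermarkArgs(data: WatermarkJobData): string[] {
  const margin = data.marginPx ?? 10;
  return [
    "-i", data.inputPath,
    "-i", data.watermarkPath,
    // In the overlay filter, W/H are the main video size and w/h the overlay size.
    "-filter_complex", `overlay=W-w-${margin}:H-h-${margin}`,
    // Audio is copied unchanged; only the video stream is re-encoded.
    "-c:a", "copy",
    data.outputPath,
  ];
}

// Job processor. With BullMQ it could be wired up roughly as:
//   new Worker("watermark", (job) => processWatermarkJob(job, runFfmpeg), { connection })
// where the queue name "watermark" and runFfmpeg are assumptions of this sketch.
async function processWatermarkJob(
  job: JobLike<WatermarkJobData>,
  runFfmpeg: (args: string[]) => Promise<void>,
): Promise<{ outputPath: string }> {
  await runFfmpeg(buildWatermarkArgs(job.data));
  return { outputPath: job.data.outputPath };
}
```

Separating argument construction from execution keeps the FFmpeg-specific part easy to unit test, while the worker file in server/src/workers/ stays a thin wrapper that downloads the input, runs the processor, and uploads the result.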
