An application that leverages AI to generate engaging podcasts from user-provided topics and contexts. The system handles the entire process from content planning to audio generation, creating natural-sounding conversations between AI-generated participants.
My main goal was to create a non-trivial application with enterprise-grade coding patterns, using a 100% AI-driven coding approach. The idea was to build an application that actually does something useful, with coding standards and best practices that would be acceptable to a tech enterprise. It also includes some typical enterprise-level requirements, such as using an external IdP (Identity Provider) and MFA.
I wanted to prove that AI-driven coding can be used in an enterprise context, going far beyond the nice but way-too-simple demos circulating on YouTube and the like.
The way I think about AI-driven coding is that I work with a virtual team consisting of various AI models and tools. I used the following setup:
- Perplexity - my trusted academic researcher
- o1 and Claude Sonnet - my solution architects
- Claude 3.5 Sonnet (2024-10-22) via API - my senior coder
- Claude 3.5 Haiku (2024-10-22) via API - my tireless assistant that creates the diffs and merges all changes
- aider.chat - my coding buddy who ties everything together. Aider is an amazing coding agent that fits right into your SDLC. It ties together your prompts, model responses, codebase and IDE in a natural AI-driven coding workflow. Highly recommended. Aider needs some instructions to prompt the models in the right way, of course:
- Coding best practices - see CONVENTIONS.md. Generated with o1, with manual edits.
- Techstack definition - see techstack.md. Manual work.
- Architecture guidelines - see specification-architecture.md. Mix of GPT-4o-generated content and manual content.
- Specification of user flow - see specification-manual. Manual work.
I used aider in architect mode, which lets you validate the code produced by the AI before adding it to the codebase. Recommended.
It's been an amazing experience. AI wrote >99% of the code in this repo!
My tasks mainly consisted of guiding the AI:
- providing the spec
- orchestrating the implementation in phases (data model first, then services, then REST controllers, etc.)
- prompting in the right way
- checking the proposed solution and providing guidance
- testing the application
- pointing out bugs
- writing some code on very rare occasions
It's clear: the AI models still make a lot of (sometimes foolish) mistakes. But they can correct the bugs themselves with some guidance, often simply by feeding back output from tests or build errors. And in general they crank out high-quality code at breathtaking speed. I was able to create this complete application in a couple of weekends of prompting, a fraction of the time it would have taken to code things manually.
- Generate multi-speaker podcasts from text descriptions
- Context-aware discussion generation based on external sources like websites or uploaded documents
- Custom voice generation for participants
- AI-driven content generation and conversation flow
- High-quality audio synthesis
- Real-time generation progress tracking
- Java 17 or higher
- Node.js 18 or higher
- PostgreSQL 15 or higher
- Maven 3.8+
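To quickly confirm these tools are available on your machine, a small shell check like the following can help. Note it only verifies the tools are on the PATH; versions still need to meet the minimums above (Java 17+, Node.js 18+, PostgreSQL 15+, Maven 3.8+):

```shell
# Check that the required CLI tools are on the PATH.
for tool in java node psql mvn; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: MISSING"
  fi
done
```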
```shell
# Log in as the postgres user
sudo -u postgres psql
```

Then, in the psql shell, create the database and user:

```sql
CREATE DATABASE podcast_db;
CREATE USER podcastadmin WITH ENCRYPTED PASSWORD 'your_password';
GRANT ALL PRIVILEGES ON DATABASE podcast_db TO podcastadmin;
-- needed to create schemas starting from PostgreSQL v15
GRANT ALL ON SCHEMA public TO podcastadmin;
```

This application uses Zitadel for OAuth2 authentication. Follow these steps to set up your Zitadel instance:
1. Create a Zitadel account at console.zitadel.ch or set up your own instance.
2. Create a new Project in Zitadel:
   - Go to Projects → New
   - Give your project a name (e.g., "Podcast Generator")
3. Create an OAuth2 Application:
   - In your project, go to Applications → New
   - Choose "Web Application"
   - Set the following:
     - Name: Podcast Generator
     - Redirect URLs:
       - `http://localhost:8080/login/oauth2/code/zitadel` (development)
       - `https://your-domain/login/oauth2/code/zitadel` (production)
     - Post Logout URLs:
       - `http://localhost:8080` (development)
       - `https://your-domain` (production)
   - Enable PKCE (Proof Key for Code Exchange)
4. Note down the following values for your `.env` file:
   - ZITADEL_DOMAIN (e.g., `my-instance.zitadel.cloud`)
   - ZITADEL_CLIENT_ID (from your application settings)
   - ZITADEL_ORG_ID (your organization ID)
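For reference, these Zitadel values typically end up in Spring Security's standard OAuth2 client configuration. The snippet below is only a sketch of what that mapping usually looks like in application.yaml; the registration id ("zitadel"), scopes, and property layout are assumptions, and the actual configuration in this repo may differ:

```yaml
# Sketch only: maps the .env values onto Spring Security's
# standard spring.security.oauth2.client properties.
spring:
  security:
    oauth2:
      client:
        registration:
          zitadel:
            client-id: ${ZITADEL_CLIENT_ID}
            authorization-grant-type: authorization_code
            redirect-uri: "{baseUrl}/login/oauth2/code/zitadel"
            scope: openid, profile, email
        provider:
          zitadel:
            issuer-uri: https://${ZITADEL_DOMAIN}
```

The registration id in the YAML must match the path segment used in the redirect URLs configured above.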
Create a `.env` file in the root directory with:

```shell
PODCASTGEN_DB_HOST=localhost
PODCASTGEN_DB_PORT=5432
PODCASTGEN_DB_NAME=podcast_db
PODCASTGEN_DB_USERNAME=podcastadmin
PODCASTGEN_DB_PASSWORD=your_password
OPENAI_API_KEY=your_openai_key
ELEVENLABS_API_KEY=your_elevenlabs_key
ZITADEL_DOMAIN=your_zitadel_domain
ZITADEL_CLIENT_ID=your_client_id
ZITADEL_ORG_ID=your_org_id
```

1. Clone this repo:

```shell
git clone https://github.com/rhofkens/podcast-generator
```

2. Build and run the application:

```shell
mvn spring-boot:run
```

3. Open the browser and navigate to http://localhost:8080. Log in with your Zitadel account and click "New podcast" to create your first podcast.
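Once Maven reports the application has started, a quick reachability check (assuming the default port 8080) looks like this:

```shell
# Prints the HTTP status code served at the root URL,
# or a hint if the server is not reachable yet.
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:8080 \
  || echo "server not reachable yet"
```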
1. Create Podcast
   - Enter basic podcast metadata (title, description, length)
   - Provide optional context URLs or descriptions
2. Define Participants
   - Add and configure podcast participants
   - Customize voice characteristics
   - Generate or select synthetic voices
3. Review Transcript
   - Review the AI-generated conversation
   - Adjust content and flow
   - Fine-tune participant interactions
4. Generate Podcast
   - Monitor real-time generation progress
   - Preview generated audio
   - Download the final podcast
- React with TypeScript
- TailwindCSS for styling
- WebSocket integration for real-time updates
- Component-based architecture with wizard pattern
- Spring Boot application
- OAuth2 authentication with Zitadel
- WebSocket support for generation progress
- JPA/Hibernate for data persistence
- OpenAI GPT-4 for content generation
- ElevenLabs for voice synthesis
- Custom prompt engineering for natural conversations
- User input → React frontend
- REST API endpoints → Spring backend
- Content generation → OpenAI
- Voice synthesis → ElevenLabs
- Real-time updates → WebSocket
- Audio delivery → Frontend player
There are several known issues with the podcast editing feature, depending on the state of the podcast. Background processing doesn't work as expected.
- Implement caching for generated audio segments
- Add background audio mixing capabilities
- Implement batch processing for large podcasts
- Add audio post-processing options
- Implement voice cloning capabilities
- Smarter, iterative prompting for transcript creation
- Add collaborative editing features
- Implement podcast templates
- Add support for music integration
- Add export options for different platforms
- Improve podcast editing & draft saving
- Improve error handling and recovery
- Add unit tests for services layer
- Improve test coverage for REST API layer
- Add integration tests
- Add load tests
- Add security testing in build pipeline - SAST, SCA, DAST
- Add soak testing
- and more...
This project is licensed under the MIT License - see the LICENSE file for details.