
# 🎙️ Talk2Scene

Audio-driven intelligent animation generation — from dialogue to visual storytelling.

Python 3.11+ · Apache-2.0 · uv · Hydra · GPT-4o


Talk2Scene is an audio-driven intelligent animation tool that automatically parses voice dialogue files, recognizes text content and timestamps, and uses AI to recommend matching character stances (STA), expressions (EXP), actions (ACT), and backgrounds (BG), inserting CG illustrations at the right moments. It produces structured scene event data and composes preview videos that show AI characters performing dynamically across scenes.

Designed for content creators, educators, virtual streamers, and AI enthusiasts — Talk2Scene turns audio into engaging visual narratives for interview videos, AI interactive demos, educational presentations, and more.

## 💡 Why Talk2Scene

Manually composing visual scenes for dialogue-driven content is tedious and error-prone. Talk2Scene automates the entire workflow: feed in audio or a transcript, and the pipeline produces time-synced scene events — ready for browser playback or video export — without touching a single frame by hand.
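
For a concrete picture, a single scene event in the output JSONL might look like the line below. The field names are an illustrative guess rather than the tool's documented schema; the asset codes come from the sample table later in this README:

```json
{"start": 12.4, "end": 15.1, "text": "Welcome back to the lab!", "STA": "STA_Stand_Front", "EXP": "EXP_Smile_EyesClosed", "ACT": "ACT_WaveGreeting", "BG": "BG_Lab_Modern", "CG": null}
```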

## 🏗️ Architecture

```mermaid
flowchart LR
    A[Audio] --> B[Transcription\nWhisper / OpenAI API]
    T[Text JSONL] --> C
    B --> C[Scene Generation\nLLM]
    C --> D[JSONL Events]
    D --> E[Browser Viewer]
    D --> F[Static PNG Render]
    D --> G[Video Export\nffmpeg]
```

Scenes are composed from five layer types. Four of them stack bottom-up:

```mermaid
flowchart LR
    BG --> STA --> ACT --> EXP
```

The fifth type, a CG illustration, replaces the entire layered scene while active.

## 🖼️ Example Output

### Example Video

Example output video

### Rendered Scenes

Left: Basic scene (Lab + Stand Front + Neutral) · Center: Cafe scene (Cafe + Stand Front + Thinking) · Right: CG mode (Pandora's Tech)

### Asset Layers

Each scene is composed by stacking transparent asset layers on a background. Below is one sample from each category:

| Layer | Sample Code | Description |
| --- | --- | --- |
| 🌅 BG | `BG_Lab_Modern` | Background (opaque) |
| 🧍 STA | `STA_Stand_Front` | Stance / pose (transparent) |
| 🎭 EXP | `EXP_Smile_EyesClosed` | Expression overlay (transparent) |
| 🤚 ACT | `ACT_WaveGreeting` | Action overlay (transparent) |
| CG | `CG_PandorasTech` | Full-scene illustration (replaces all layers) |

## 📦 Install

> [!IMPORTANT]
> Requires Python 3.11+, uv, and FFmpeg.
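
If uv or FFmpeg are missing, they can be installed with the usual commands for your platform, for example:

```bash
# Install uv via its official installer script
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install FFmpeg (choose the line matching your OS)
sudo apt install ffmpeg   # Debian/Ubuntu
brew install ffmpeg       # macOS
```

Then sync the project environment: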

```bash
uv sync
```

Set your OpenAI API key:

```bash
export OPENAI_API_KEY="your-key"
```

## 🚀 Usage

```bash
uv run talk2scene --help
```

### 📝 Text Mode

Generate scenes from a pre-transcribed JSONL file:

```bash
uv run talk2scene mode=text io.input.text_file=path/to/transcript.jsonl
```
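
A transcript file holds one JSON object per line. The exact field names are not documented here, so treat the sketch below as an assumed shape (start/end in seconds plus the spoken text), not the authoritative schema:

```json
{"start": 0.0, "end": 2.4, "text": "Hello, and welcome to the show."}
{"start": 2.4, "end": 5.8, "text": "Today we are visiting the lab."}
```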

### 🎧 Batch Mode

Process an audio file end-to-end (place audio in input/):

```bash
uv run talk2scene mode=batch
```

### 🎬 Video Mode

Render a completed session into video:

```bash
uv run talk2scene mode=video session_id=SESSION_ID
```

### 📡 Stream Mode

Consume audio or pre-transcribed text from Redis in real time:

```bash
uv run talk2scene mode=stream
```
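
As a rough sketch of the producer side, pre-transcribed text could be pushed into Redis with redis-cli. The channel name and payload shape here are hypothetical; check the project documentation for the actual contract:

```bash
# Hypothetical channel and payload -- for illustration only
redis-cli PUBLISH talk2scene:transcript '{"start": 0.0, "end": 2.4, "text": "Hello!"}'
```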

## 📚 Documentation

Full documentation (English & Chinese) is available at discover304.top/talk2scene.

## 📬 Contact

## 📄 License

Licensed under the Apache License 2.0.
