tiktok-whisper: tiktok-whisper-video-to-text-go

Translate to: 简体中文

About tiktok-whisper-video-to-text-go

Batch-convert videos to text using the OpenAI Whisper API or a local whisper.cpp build with Core ML acceleration.

tiktok-whisper batch-converts videos to text using either OpenAI's cloud-based Whisper API or a local whisper.cpp build with Core ML. It can export transcripts to Excel, save conversion results to SQLite or PostgreSQL, report video duration statistics, and locate videos by keyword search. Compared with the original Whisper, it aims to improve macOS compatibility and transcription speed.

Features

Input Xiaoyuzhou podcast links for batch audio downloading
Batch recognize audio or video, outputting text with timestamps
Save recognition results to SQLite or PostgreSQL
Use whisper_cpp + coreML for local transcription on macOS
Export historical recognition results

Quick Start

macOS

tiktok-whisper supports multiple transcription providers, which can be switched from the command line:

Local whisper.cpp with CoreML acceleration
OpenAI Whisper API
ElevenLabs Speech-to-Text
HTTP-based whisper server
SSH remote whisper.cpp
Custom HTTP transcription services

Provider configuration is managed through providers.yaml. See Provider Switching Documentation for details.
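
The idea of switching providers by name can be sketched in Go as a small registry. This is a minimal illustration, not the project's actual code: the Transcriber interface, stubProvider, and Registry are all hypothetical names, and the stub only returns a placeholder string instead of calling a real backend.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// Transcriber is a hypothetical interface; the real project may shape it differently.
type Transcriber interface {
	Transcribe(inputPath string) (string, error)
}

// stubProvider stands in for a real backend such as whisper.cpp or the OpenAI API.
type stubProvider struct{ name string }

func (s stubProvider) Transcribe(inputPath string) (string, error) {
	return fmt.Sprintf("[%s] transcript of %s", s.name, inputPath), nil
}

// Registry maps provider names to implementations and remembers the default.
type Registry struct {
	defaultName string
	providers   map[string]Transcriber
}

func NewRegistry(defaultName string) *Registry {
	return &Registry{defaultName: defaultName, providers: map[string]Transcriber{}}
}

// Register adds or replaces a provider under the given name.
func (r *Registry) Register(name string, t Transcriber) { r.providers[name] = t }

// Select returns the named provider, falling back to the default when name is empty.
func (r *Registry) Select(name string) (Transcriber, error) {
	if name == "" {
		name = r.defaultName
	}
	t, ok := r.providers[name]
	if !ok {
		return nil, errors.New("unknown provider: " + name)
	}
	return t, nil
}

// Names lists the registered provider names in sorted order.
func (r *Registry) Names() []string {
	names := make([]string, 0, len(r.providers))
	for n := range r.providers {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

func main() {
	reg := NewRegistry("whisper_cpp")
	reg.Register("whisper_cpp", stubProvider{"whisper_cpp"})
	reg.Register("openai", stubProvider{"openai"})

	// An empty name mimics running without a --provider flag.
	p, err := reg.Select("")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	text, _ := p.Transcribe("audio.mp3")
	fmt.Println(text)
	fmt.Println("available:", reg.Names())
}
```

Keeping the default in the registry means callers only pass a name when they want to override it, which matches how default_provider and an optional command-line override would interact.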

Generate the Core ML model:

mkdir -p ~/workspace/cpp/ && cd ~/workspace/cpp/
git clone git@github.com:ggerganov/whisper.cpp.git
cd whisper.cpp
bash ./models/download-ggml-model.sh large-v2
conda create -n whisper-cpp python=3.10 -y
conda activate whisper-cpp
pip install -U ane_transformers openai-whisper coremltools
bash ./models/generate-coreml-model.sh large-v2
make clean
WHISPER_COREML=1 make -j

Create a providers.yaml configuration file:

default_provider: "whisper_cpp"

providers:
  whisper_cpp:
    type: "whisper_cpp"
    enabled: true
    settings:
      binary_path: "~/workspace/cpp/whisper.cpp/main"
      model_path: "~/workspace/cpp/whisper.cpp/models/ggml-large-v2.bin"
      language: "en"
      prompt: ""
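
The paths in this configuration start with "~", which a shell expands but Go's os/exec does not. The sketch below shows one way such settings could be turned into a whisper.cpp command line. It is an assumption about how a provider might work, not the project's implementation: Settings, expandHome, and whisperArgs are hypothetical names. The options -m, -f, -l, --prompt, and -otxt are whisper.cpp command-line options.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// Settings mirrors the keys under providers.whisper_cpp.settings (hypothetical struct).
type Settings struct {
	BinaryPath string
	ModelPath  string
	Language   string
	Prompt     string
}

// expandHome replaces a leading "~" with home.
// Only "~" and "~/..." are handled; "~user/..." is returned unchanged.
func expandHome(path, home string) string {
	if path == "~" {
		return home
	}
	if strings.HasPrefix(path, "~/") {
		return filepath.Join(home, path[2:])
	}
	return path
}

// whisperArgs builds the argument list for the whisper.cpp binary.
// Language and prompt are only passed when they are non-empty.
func whisperArgs(s Settings, home, inputPath string) []string {
	args := []string{"-m", expandHome(s.ModelPath, home), "-f", inputPath, "-otxt"}
	if s.Language != "" {
		args = append(args, "-l", s.Language)
	}
	if s.Prompt != "" {
		args = append(args, "--prompt", s.Prompt)
	}
	return args
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		fmt.Println("cannot determine home directory:", err)
		return
	}
	s := Settings{
		BinaryPath: "~/workspace/cpp/whisper.cpp/main",
		ModelPath:  "~/workspace/cpp/whisper.cpp/models/ggml-large-v2.bin",
		Language:   "en",
	}
	// Print the command instead of running it, so the sketch works without whisper.cpp installed.
	fmt.Println(expandHome(s.BinaryPath, home), strings.Join(whisperArgs(s, home, "input.wav"), " "))
}
```

To actually run the command, the expanded binary path and the argument slice could be passed to exec.Command from the standard library.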

Generate the wire dependency-injection code:

cd ./internal/app
go install github.com/google/wire/cmd/wire@latest
wire

Compile tiktok-whisper with CGO enabled:

cd tiktok-whisper
CGO_ENABLED=1 go build -o v2t ./cmd/v2t/main.go
./v2t help

Windows

The procedure is similar to macOS.

cd tiktok-whisper
go build -o v2t.exe .\cmd\v2t\main.go
.\v2t.exe help

Usage

Download audio from Xiaoyuzhou or video from TikTok

# Download Xiaoyuzhou audio using a single episode URL
./v2t download xiaoyuzhou -e "https://www.xiaoyuzhoufm.com/episode/6398c6ae3a2b7eba5ceb462f"

# Or using multiple episode URLs
./v2t download xiaoyuzhou -e "https://www.xiaoyuzhoufm.com/episode/6398c6ae3a2b7eba5ceb462f,https://www.xiaoyuzhoufm.com/episode/6445559d420fc63f0b9e5747"

# Download all episodes from a Xiaoyuzhou podcast URL
./v2t download xiaoyuzhou -p "https://www.xiaoyuzhoufm.com/podcast/61e389402454b42a2b06177c"
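
The -e flag above accepts several episode URLs separated by commas. A rough Go sketch of how such a value could be split and validated follows. The function name episodeIDs and the rule that the ID is whatever follows /episode/ in the path are assumptions for illustration, not taken from the project's source.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// episodeIDs splits a comma-separated list of episode URLs and returns
// the path segment after "/episode/" for each one (hypothetical helper).
func episodeIDs(list string) ([]string, error) {
	var ids []string
	for _, raw := range strings.Split(list, ",") {
		raw = strings.TrimSpace(raw)
		if raw == "" {
			continue // tolerate stray commas
		}
		u, err := url.Parse(raw)
		if err != nil {
			return nil, fmt.Errorf("invalid URL %q: %w", raw, err)
		}
		if !strings.HasPrefix(u.Path, "/episode/") {
			return nil, fmt.Errorf("not an episode URL: %q", raw)
		}
		id := strings.Trim(strings.TrimPrefix(u.Path, "/episode/"), "/")
		if id == "" {
			return nil, fmt.Errorf("missing episode ID: %q", raw)
		}
		ids = append(ids, id)
	}
	return ids, nil
}

func main() {
	ids, err := episodeIDs("https://www.xiaoyuzhoufm.com/episode/6398c6ae3a2b7eba5ceb462f,https://www.xiaoyuzhoufm.com/episode/6445559d420fc63f0b9e5747")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, id := range ids {
		fmt.Println(id)
	}
}
```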

After downloading, you can find the files in the data directory:

$ tree data/
data/
└── xiaoyuzhou
    └── 硬地骇客
        └── EP21 程序员的职场晋升究竟与什么有关？漂亮的代码？.mp3

Use yt-dlp to download YouTube videos

To download only audio without video, use the following command:

yt-dlp --extract-audio --audio-format mp3 "https://www.youtube.com/watch?v=tWmNN87VvcE"

Convert videos/audios to text

On macOS, you can use whisper.cpp for transcription. Make sure binaryPath and modelPath are set correctly in wire.go:

# Convert an audio file
./v2t convert -a --input ./test/data/test.mp3

# Convert all files in a directory with a specified file extension
./v2t convert -a --directory ./test/data --type m4a

# Convert all mp4 files in a specified directory to text, -n specifies the maximum number of files to convert, default n=1
./v2t convert --video --directory "./test/data/mp4" --userNickname "testUser" -n 100

# Export all recognition history of a specified user as excel
./v2t export --userNickname "testUser" --outputFilePath ./data/testUser.xlsx

Provider Selection

You can switch between providers using the --provider flag:

# Use local whisper.cpp (default)
./v2t convert -a -i audio.mp3

# Use OpenAI Whisper API
./v2t convert -a -i audio.mp3 --provider openai

# Use a specific provider
./v2t convert -a -i audio.mp3 --provider whisper_server

# List all available providers
./v2t providers list

# Check provider configuration
./v2t providers config

To use OpenAI, add it to your providers.yaml:

providers:
  openai:
    type: "openai"
    enabled: true
    auth:
      api_key: "sk-your-api-key-here"
    settings:
      model: "whisper-1"
      language: "en"

Using Python scripts for faster-whisper

If you are on Windows and have a dedicated GPU, you can use Python's faster-whisper for CUDA processing. There are two Python scripts for batch audio transcription:

whisperToText.py: Transcribes a single file or all files in a single directory.
whisperToTextParallel.py: Transcribes files in multiple subdirectories in parallel.

Before running the scripts, install the required Python packages:

pip install -r requirements.txt

For the exact commands for single-file, single-directory, and parallel multi-directory transcription, see the project documentation.

Recent Features ✨

Dual Embedding System: OpenAI (1536D) + Gemini (768D) embedding support
pgvector Integration: Vector similarity search with PostgreSQL
User-Specific Processing: Generate embeddings for specific users with targeted batch processing
3D Visualization: Interactive 3D clustering visualization with Three.js
Natural Trackpad Gestures: Jony Ive-level touch interaction system
Real-time Search: Vector-based similarity search with live results
Batch Embedding Generation: CLI tools for large-scale embedding processing

Embedding & Vector Search

Generate embeddings and perform similarity search:

# Generate embeddings for all transcriptions
./v2t embed generate

# Generate embeddings for a specific user
./v2t embed generate --user "username" --provider gemini

# Check embedding status and user distribution
./v2t embed status

# Search for similar content
./v2t embed search --text "your search query" --limit 10

# Calculate similarity between transcriptions
./v2t embed similarity --id1 123 --id2 456

# Find potential duplicates for a specific user
./v2t embed duplicates --user "username" --threshold 0.95

# Start 3D visualization server
go run web-main.go
# Visit http://localhost:8080 for interactive clustering visualization
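
To make the similarity and duplicate commands more concrete, here is a small Go sketch that compares embedding vectors with cosine similarity and reports pairs at or above a threshold. Cosine similarity is an assumption here: the source does not say which metric v2t or its pgvector queries use. The names cosine, pair, and duplicates are hypothetical. The sketch rejects vectors of different lengths, which is why 1536-dimensional and 768-dimensional embeddings cannot be compared with each other directly.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// cosine returns the cosine similarity of a and b.
// It fails on empty, mismatched, or zero-length (all-zero) vectors.
func cosine(a, b []float64) (float64, error) {
	if len(a) == 0 || len(a) != len(b) {
		return 0, errors.New("vectors must be non-empty and of equal dimension")
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0, errors.New("zero vector has no direction")
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb)), nil
}

// pair records two vector indices and their similarity score.
type pair struct {
	I, J  int
	Score float64
}

// duplicates returns every index pair (i < j) whose similarity is at least threshold.
// This brute-force scan is quadratic and only meant for small inputs.
func duplicates(vecs [][]float64, threshold float64) ([]pair, error) {
	var out []pair
	for i := 0; i < len(vecs); i++ {
		for j := i + 1; j < len(vecs); j++ {
			s, err := cosine(vecs[i], vecs[j])
			if err != nil {
				return nil, err
			}
			if s >= threshold {
				out = append(out, pair{i, j, s})
			}
		}
	}
	return out, nil
}

func main() {
	// Toy 2-dimensional vectors; real embeddings have hundreds of dimensions.
	vecs := [][]float64{{1, 0}, {2, 0}, {0, 1}}
	pairs, err := duplicates(vecs, 0.95)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, p := range pairs {
		fmt.Printf("%d and %d: %.3f\n", p.I, p.J, p.Score)
	}
}
```

For large collections, a database-side index such as the one pgvector provides would avoid the quadratic scan used in this sketch.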

TODO

Video duration statistics
Use pgvector for vectorized search
3D visualization with clustering
Natural trackpad gesture support
Keyword search to locate videos
Original video jump link
Like, share, and comment statistics
