USE_AGENTIC_AI=1 python -m lyrics_transcriber.cli.cli_main your-song.mp3 \
  --artist "Artist Name" --title "Song Title"

python scripts/analyze_annotations.py
# Output: CORRECTION_ANALYSIS.md

python scripts/generate_few_shot_examples.py
# Output: lyrics_transcriber/correction/agentic/prompts/examples.yaml

- `~/lyrics-transcriber-cache/correction_annotations.jsonl` - All human annotations (back this up!)
- `~/lyrics-transcriber-cache/llm_response_cache.json` - Cached LLM responses (speeds up re-runs)
- `lyrics_transcriber/correction/agentic/prompts/examples.yaml` - Few-shot examples
Note: The default cache directory is `~/lyrics-transcriber-cache/`. Customize it with the `LYRICS_TRANSCRIBER_CACHE_DIR` env var.
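The same resolution order (env var override, then the home-directory default) can be expressed in a few lines. This is an illustrative sketch, not the library's actual helper; `resolve_cache_dir` is a hypothetical name:

```python
import os
from pathlib import Path

def resolve_cache_dir() -> Path:
    """Return the cache directory, honoring the LYRICS_TRANSCRIBER_CACHE_DIR override."""
    override = os.environ.get("LYRICS_TRANSCRIBER_CACHE_DIR")
    return Path(override) if override else Path.home() / "lyrics-transcriber-cache"

# Example: point the cache at a project-local directory for one session
os.environ["LYRICS_TRANSCRIBER_CACHE_DIR"] = "./my-cache"
print(resolve_cache_dir())  # my-cache
```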
- `scripts/analyze_annotations.py` - Generate reports
- `scripts/generate_few_shot_examples.py` - Update classifier
- `IMPLEMENTATION_COMPLETE.md` - Complete overview
- `HUMAN_FEEDBACK_LOOP.md` - Detailed feedback loop guide
- `QUICK_START_AGENTIC.md` - Testing and troubleshooting
- `AGENTIC_IMPLEMENTATION_STATUS.md` - Technical details
# Enable
export USE_AGENTIC_AI=1
# Disable
unset USE_AGENTIC_AI

Default: Enabled (saves time and compute!)
Caches LLM responses to avoid redundant calls when re-running the same song:
# Cache is enabled by default
# Responses stored in: ~/lyrics-transcriber-cache/llm_response_cache.json
# Disable caching (force fresh LLM calls)
export DISABLE_LLM_CACHE=1
# Clear the cache (easiest way)
python scripts/manage_llm_cache.py clear

Benefits:
- Re-run the same song instantly (no ~30-second wait per gap)
- Iterate on frontend/UI changes without LLM calls
- Save GPU power and API costs
- Cache persists across runs
When cache is used:
- Same song, same prompts → instant (cached)
- Same song, updated prompts → fresh LLM calls
- Different song → fresh LLM calls
- Different model → fresh LLM calls
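These invalidation rules fall out naturally if the cache key hashes everything that affects the response. A minimal sketch of that idea (not the project's actual implementation; the field names are illustrative):

```python
import hashlib
import json

def cache_key(song_id: str, prompt: str, model: str) -> str:
    """Derive a stable key: any change to song, prompt, or model yields a new key."""
    payload = json.dumps(
        {"song": song_id, "prompt": prompt, "model": model},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

k1 = cache_key("song.mp3", "classify this gap", "llama3")
k2 = cache_key("song.mp3", "classify this gap", "llama3")
k3 = cache_key("song.mp3", "classify this gap (v2)", "llama3")
assert k1 == k2  # same song + prompt + model → cache hit
assert k1 != k3  # updated prompt → fresh LLM call
```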
In UI: Stored in browser localStorage (`annotationsEnabled`)
- Default: Enabled
- Toggle will be added to UI settings in future update
export LANGFUSE_PUBLIC_KEY="your_key"
export LANGFUSE_SECRET_KEY="your_secret"
export LANGFUSE_HOST="https://cloud.langfuse.com"

- SOUND_ALIKE - Homophones (auto-corrects)
- BACKGROUND_VOCALS - Parentheses content (auto-deletes)
- EXTRA_WORDS - "And", "But" fillers (auto-deletes)
- PUNCTUATION_ONLY - Style differences (no action)
- NO_ERROR - Matches reference (no action)
- REPEATED_SECTION - Chorus repeats (flags for review)
- COMPLEX_MULTI_ERROR - Multiple issues (flags for review)
- AMBIGUOUS - Unclear (flags for review)
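Judging from the log output (e.g. `GapCategory.SOUND_ALIKE`), the categories live in an enum. A hypothetical sketch of its shape, with each category's default disposition as a comment (the real definition may differ):

```python
from enum import Enum

class GapCategory(Enum):
    SOUND_ALIKE = "sound_alike"                  # auto-corrects
    BACKGROUND_VOCALS = "background_vocals"      # auto-deletes
    EXTRA_WORDS = "extra_words"                  # auto-deletes
    PUNCTUATION_ONLY = "punctuation_only"        # no action
    NO_ERROR = "no_error"                        # no action
    REPEATED_SECTION = "repeated_section"        # flags for review
    COMPLEX_MULTI_ERROR = "complex_multi_error"  # flags for review
    AMBIGUOUS = "ambiguous"                      # flags for review

# Categories the agent acts on without human review
AUTO_HANDLED = {GapCategory.SOUND_ALIKE, GapCategory.BACKGROUND_VOCALS, GapCategory.EXTRA_WORDS}
print(GapCategory.SOUND_ALIKE in AUTO_HANDLED)  # True
```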
- Sound-alike errors with clear reference match
- Background vocals in parentheses
- Extra filler words at sentence start
- Repeated sections (chorus/verse)
- Complex gaps with multiple errors
- Ambiguous cases needing audio verification
- Any case where handler is uncertain
- Punctuation/style differences only
- Transcription matches at least one reference source
- All gaps are classified
- Handler decisions are logged
- Session grouped: `lyrics-correction-{uuid}`
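A session ID of that shape can be produced with the standard library alone (illustrative only; the project may build it elsewhere):

```python
import uuid

# One session ID per correction run, used to group all Langfuse traces
session_id = f"lyrics-correction-{uuid.uuid4()}"
print(session_id)  # e.g. lyrics-correction-3f2b8c1e-...
```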
# Successful classification
🤖 Classified gap gap_1 as GapCategory.SOUND_ALIKE (confidence: 0.95)
# Correction applied
Made correction: 'out' -> 'now' (confidence: 0.75, reason: ...)
# Flagged for review
🤖 Agent returned 1 proposals [action: Flag, requires_human_review: True]

# View raw annotations
cat cache/correction_annotations.jsonl | jq
# Count annotations
wc -l cache/correction_annotations.jsonl
# Get statistics
python -c "from lyrics_transcriber.correction.feedback.store import FeedbackStore; print(FeedbackStore('cache').get_statistics())"

Cause: LLM response parsing issue or invalid JSON
Fix:
- Check LLM is running (Ollama, OpenAI, etc.)
- Check API keys if using cloud provider
- Response parser will attempt JSON fixes automatically
- Falls back to FLAG proposal gracefully
Not an error! This means:
- Gap was classified as NO_ERROR or PUNCTUATION_ONLY
- Handler returned NoAction proposal
- Or gap was flagged for human review
Check:
- Are you in read-only mode? (Need live API connection)
- Did text actually change? (Modal only shows if original ≠ corrected)
- Check browser console for errors
- Try toggling annotations on/off in localStorage
Normal: Each gap requires 1-2 LLM calls
- Classification: ~5-30 seconds (depends on model)
- Handler logic: Usually instant (deterministic)
Speed up:
- Use faster models (GPT-4-turbo instead of local Ollama)
- Process fewer songs at once
- Consider batching gaps (future enhancement)
- `cache/correction_annotations.jsonl` - YOUR MOST VALUABLE DATA
- `lyrics_transcriber/correction/agentic/prompts/examples.yaml` - Your trained prompts
- `cache/*.json` - Cached anchor sequences and reference lyrics
# Daily backup script
cp cache/correction_annotations.jsonl backups/annotations_$(date +%Y%m%d).jsonl
# Weekly backup
tar -czf backups/cache_$(date +%Y%m%d).tar.gz cache/
# Version control
git add lyrics_transcriber/correction/agentic/prompts/examples.yaml
git commit -m "Update few-shot examples from human annotations"

1. Start with: `lyrics_transcriber/correction/agentic/agent.py` - see the `propose_for_gap()` method for the main workflow
2. Then read: `lyrics_transcriber/correction/agentic/handlers/base.py` - understand the handler interface
3. Pick a handler: `handlers/sound_alike.py` - see how each category is processed
4. Review the prompt: `prompts/classifier.py` - see how the LLM is guided to classify
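Before diving into `handlers/base.py`, it helps to picture the registry-of-handlers shape the files above suggest. This is a hypothetical sketch; the class names, method signatures, and proposal dicts are illustrative, not the actual API:

```python
from abc import ABC, abstractmethod

class GapHandler(ABC):
    """One handler per gap category; each turns a gap into zero or more proposals."""

    @abstractmethod
    def handle(self, gap: dict) -> list[dict]:
        """Return correction proposals for this gap."""

class SoundAlikeHandler(GapHandler):
    def handle(self, gap: dict) -> list[dict]:
        # Auto-correct when the reference offers a clear homophone match
        return [{"action": "Correct", "word": gap["reference_word"], "confidence": 0.9}]

class AmbiguousHandler(GapHandler):
    def handle(self, gap: dict) -> list[dict]:
        # Unclear cases are never auto-applied
        return [{"action": "Flag", "requires_human_review": True}]

REGISTRY = {"sound_alike": SoundAlikeHandler(), "ambiguous": AmbiguousHandler()}
print(REGISTRY["sound_alike"].handle({"reference_word": "now"})[0]["action"])  # Correct
```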
User processes song
↓
LyricsCorrector detects gaps
↓
AgenticCorrector.propose_for_gap()
↓
classify_gap() → LLM classifies
↓
HandlerRegistry.get_handler(category)
↓
Handler.handle() → generates proposals
↓
Proposals → Adapter → WordCorrections
↓
Applied to segments
↓
UI shows for review
↓
Human makes corrections
↓
Annotation modal appears
↓
Annotation saved to JSONL
↓
Periodic analysis → Update few-shot examples
↓
Classifier improves → Better classifications
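The top half of this flow (gap in, proposals out) can be sketched in a few lines. Everything below is illustrative pseudostructure to show the dispatch shape, not the project's actual signatures:

```python
def propose_for_gap(gap, classify, registry):
    """Classify a gap, dispatch to the matching handler, and return its proposals."""
    category = classify(gap)       # LLM classification (cached where possible)
    handler = registry[category]   # look up the handler for this category
    return handler(gap)            # deterministic handler logic → proposals

# Toy wiring: a fake classifier and two handlers, just to exercise the dispatch
registry = {
    "no_error": lambda gap: [],
    "sound_alike": lambda gap: [{"action": "Correct", "word": gap["best_match"]}],
}
classify = lambda gap: "sound_alike" if gap.get("best_match") else "no_error"

print(propose_for_gap({"best_match": "now"}, classify, registry))
print(propose_for_gap({}, classify, registry))
```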
Refer to the comprehensive guides:

- Usage: `HUMAN_FEEDBACK_LOOP.md`
- Technical: `AGENTIC_IMPLEMENTATION_STATUS.md`
- Testing: `QUICK_START_AGENTIC.md`
- Overview: `IMPLEMENTATION_COMPLETE.md`
Last Updated: 2025-10-27 Version: 1.0.0 Status: Production Ready ✅