Summary
Add a Haystack-compatible pipeline component (AudioTranscriber or DeepgramTranscriber) that integrates Deepgram STT into Haystack's pipeline architecture, enabling audio transcription as a step in Haystack RAG and document processing pipelines.
Problem it solves
Haystack is a leading open-source framework for building RAG (Retrieval-Augmented Generation) applications and document processing pipelines. Developers building audio-aware RAG systems — where meeting recordings, podcasts, or call center audio need to be transcribed and indexed — need a native Haystack component that fits into the pipeline abstraction. Without it, developers must write custom glue code to convert Deepgram transcriptions into Haystack Document objects. AssemblyAI has a Haystack integration (assemblyai-haystack); Deepgram does not, despite having superior STT accuracy.
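To make the glue code concrete, here is a minimal sketch of the conversion developers currently write by hand. It assumes the response is a dict shaped like Deepgram's pre-recorded JSON (`results.channels[].alternatives[].transcript` plus `metadata.duration`). `DocumentRecord` and `response_to_records` are hypothetical names; `DocumentRecord` stands in for Haystack's `Document` so the snippet runs without either SDK installed.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class DocumentRecord:
    """Hypothetical stand-in for haystack.Document: text content plus metadata."""
    content: str
    meta: Dict[str, Any] = field(default_factory=dict)


def response_to_records(response: Dict[str, Any], source: str) -> List[DocumentRecord]:
    """Build one record per audio channel from a Deepgram-style response dict."""
    metadata = response.get("metadata", {})
    channels = response.get("results", {}).get("channels", [])
    records: List[DocumentRecord] = []
    for index, channel in enumerate(channels):
        alternatives = channel.get("alternatives", [])
        if not alternatives:
            continue  # no transcript was returned for this channel
        best = alternatives[0]  # assumption: the first alternative is the preferred one
        records.append(
            DocumentRecord(
                content=best.get("transcript", ""),
                meta={
                    "source": source,
                    "channel": index,
                    "confidence": best.get("confidence"),
                    "duration": metadata.get("duration"),
                },
            )
        )
    return records


# Hand-written sample in the assumed response shape (not real API output).
sample = {
    "metadata": {"duration": 2.5},
    "results": {
        "channels": [
            {"alternatives": [{"transcript": "Hello team.", "confidence": 0.98}]}
        ]
    },
}
records = response_to_records(sample, "meeting.mp3")
print(records[0].content, records[0].meta)
```

In a real pipeline each record would become `haystack.Document(content=..., meta=...)`; this mapping is the step a native component would absorb.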
Proposed API
from haystack import Pipeline
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
from haystack.components.preprocessors import DocumentSplitter
from haystack.utils import Secret

from deepgram_haystack import DeepgramTranscriber  # proposed package, not yet published

# As a Haystack pipeline component
transcriber = DeepgramTranscriber(
    api_key=Secret.from_env_var("DEEPGRAM_API_KEY"),
    model="nova-3",
    smart_format=True,
    diarize=True,
)

# In a Haystack pipeline
pipeline = Pipeline()
pipeline.add_component("transcriber", transcriber)
pipeline.add_component("splitter", DocumentSplitter())
pipeline.add_component("embedder", SentenceTransformersDocumentEmbedder())
pipeline.connect("transcriber", "splitter")
pipeline.connect("splitter", "embedder")

# Transcribe audio and process
result = pipeline.run({"transcriber": {"sources": ["meeting.mp3"]}})
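Because the proposed API enables `diarize=True`, one open design question is how speaker labels would reach the downstream splitter. The sketch below shows one possible approach, not a decided behavior: merge consecutive words from the same speaker into turns, each of which could become its own document. It assumes word entries carry `word`, `punctuated_word`, `start`, `end`, and `speaker` fields as in Deepgram's diarized word lists; `group_speaker_turns` is a hypothetical helper.

```python
from typing import Any, Dict, List


def group_speaker_turns(words: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Merge consecutive words spoken by the same speaker into one turn."""
    turns: List[Dict[str, Any]] = []
    for word in words:
        speaker = word.get("speaker")
        # Prefer the formatted token when present, else fall back to the raw word.
        text = word.get("punctuated_word", word["word"])
        if turns and turns[-1]["speaker"] == speaker:
            turns[-1]["text"] += " " + text
            turns[-1]["end"] = word["end"]  # extend the current turn
        else:
            turns.append(
                {"speaker": speaker, "text": text, "start": word["start"], "end": word["end"]}
            )
    return turns


# Hand-written sample words in the assumed shape (not real API output).
words = [
    {"word": "hi", "punctuated_word": "Hi", "start": 0.0, "end": 0.3, "speaker": 0},
    {"word": "there", "punctuated_word": "there.", "start": 0.3, "end": 0.7, "speaker": 0},
    {"word": "hello", "punctuated_word": "Hello.", "start": 1.0, "end": 1.4, "speaker": 1},
]
turns = group_speaker_turns(words)
for turn in turns:
    print(turn)
```

Emitting one document per turn, with the speaker and timestamps in its metadata, would let later retrieval steps filter or cite by speaker. A single document per file is the simpler alternative.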
Acceptance criteria
- @component protocol with proper input/output types
- deepgram-haystack on PyPI

Raised by the DX intelligence system.