
feat(ai-groq): transcription #649

Open
joksas wants to merge 2 commits into TanStack:main from joksas:feat/groq-ai-transcription

Conversation

@joksas

@joksas joksas commented May 26, 2026

🎯 Changes

Adds Groq as a transcription provider. Groq's API is mostly OpenAI SDK-compatible, but its transcription endpoint additionally accepts HTTP URLs as input, so this is implemented as a custom integration rather than going through the SDK.
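
For readers unfamiliar with the difference, here is a minimal sketch of a request builder that accepts either a remote URL or local bytes. It is not the adapter code from this PR: `buildTranscriptionForm` and `AudioSource` are hypothetical names, and the sketch assumes the endpoint takes the remote location in a `url` form field and uploaded audio in a `file` part.

```typescript
// Hypothetical helper for illustration; not the adapter added by this PR.
type AudioSource =
  | { kind: 'url'; url: string }
  | { kind: 'blob'; blob: Blob; filename: string }

function buildTranscriptionForm(model: string, source: AudioSource): FormData {
  const form = new FormData()
  form.append('model', model)
  if (source.kind === 'url') {
    // Assumption: a remote audio location is passed as a plain `url` field.
    form.append('url', source.url)
  } else {
    // Local audio is uploaded as a multipart `file` part.
    form.append('file', source.blob, source.filename)
  }
  return form
}
```

A file-only client would always take the second branch, which is why URL input needs a request shape of its own.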

✅ Checklist

  • I have followed the steps in the Contributing guide.
  • I have tested this code locally with pnpm run test:pr.

🚀 Release Impact

  • This change affects published code, and I have generated a changeset.
  • This change is docs/CI/dev-only (no release).

Summary by CodeRabbit

Release Notes

New Features

  • Added Groq as a transcription provider with support for Whisper speech-to-text models.
  • Support for multiple audio input methods: HTTP URLs and local files (various formats supported).
  • Configurable transcription options including temperature settings and timestamp granularities for detailed timing information.

Review Change Stack

@coderabbitai
Contributor

coderabbitai Bot commented May 26, 2026

📝 Walkthrough


This PR adds Groq as a transcription provider to the TanStack AI library. It introduces a GroqTranscriptionAdapter that handles speech-to-text via Groq's Whisper models, supports HTTP URLs and multiple local audio formats, and provides factory functions for API key resolution. The feature is exported from the package index and integrated into E2E testing.

Changes

Groq Transcription Support

| Layer | File(s) | Summary |
|---|---|---|
| Transcription Data Contracts | packages/ai-groq/src/model-meta.ts, packages/ai-groq/src/audio/transcription-provider-options.ts, packages/ai-groq/src/adapters/transcription.ts (types/config) | Groq transcription models (whisper-large-v3-turbo, whisper-large-v3) and GroqTranscriptionProviderOptions (temperature, timestamp_granularities) are defined, along with response shape interfaces for verbose and non-verbose formats. |
| Transcription Adapter Implementation | packages/ai-groq/src/adapters/transcription.ts (methods, helpers) | GroqTranscriptionAdapter.transcribe() builds and sends multipart requests to Groq's /audio/transcriptions endpoint, handles response formats (verbose_json/text/JSON), maps segments and words with confidence scores, and includes error handling and request logging. Audio input normalizers convert HTTP URLs, Files, Blobs, ArrayBuffers, base64, and data URLs into the expected format. |
| Factory Functions and Public API | packages/ai-groq/src/adapters/transcription.ts (factories), packages/ai-groq/src/index.ts, .changeset/feat-groq-ai-transcription.md | createGroqTranscription accepts explicit API key; groqTranscription resolves GROQ_API_KEY from environment. The package index re-exports the adapter class, factories, configuration type, provider options, and transcription models alongside existing chat models. Changeset documents the custom integration due to HTTP URL input support. |
| E2E Testing Integration | testing/e2e/src/lib/feature-support.ts, testing/e2e/src/lib/media-providers.ts | Groq is added to the transcription provider support matrix. The test helper createTranscriptionAdapter registers a groq factory using createGroqTranscription with model whisper-large-v3-turbo, baseURL from openaiUrl(aimockPort), and standard test headers. |
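
The audio input normalization described in the walkthrough can be pictured with a small sketch. This is an illustration under assumptions, not the PR's implementation: `normalizeAudioInput` and `NormalizedAudio` are hypothetical names, and the sketch treats any string that is neither an HTTP(S) URL nor a data URL as raw base64.

```typescript
// Hypothetical normalizer; names and return shape are illustrative only.
type NormalizedAudio =
  | { kind: 'url'; url: string }
  | { kind: 'blob'; blob: Blob }

function normalizeAudioInput(input: string | Blob | ArrayBuffer): NormalizedAudio {
  if (typeof input !== 'string') {
    // Blobs (and Files, which extend Blob) pass through; raw buffers are wrapped.
    return { kind: 'blob', blob: input instanceof Blob ? input : new Blob([input]) }
  }
  if (/^https?:\/\//i.test(input)) {
    // Remote audio is forwarded by reference rather than downloaded here.
    return { kind: 'url', url: input }
  }
  // Data URL form: data:<mime>;base64,<payload>. Anything else is assumed to be bare base64.
  const match = /^data:([^;,]+);base64,(.+)$/.exec(input)
  const mime = match?.[1] ?? 'application/octet-stream'
  const base64 = match?.[2] ?? input
  const bytes = Uint8Array.from(atob(base64), (c) => c.charCodeAt(0))
  return { kind: 'blob', blob: new Blob([bytes], { type: mime }) }
}
```

Returning a tagged union keeps the URL case distinct from the byte-carrying cases, so the request builder can decide which form field to populate.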

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Whispers now flow through Groq's swift gate,
Audio transcribed, responses translate,
From bytes and URLs, words take their place,
Confidence scores show in verbose space,
A new provider joins the TanStack race!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'feat(ai-groq): transcription' clearly and concisely describes the main change: adding transcription support to the Groq AI provider. |
| Description check | ✅ Passed | The PR description follows the template structure, includes clear explanation of changes, has all checklist items addressed, and specifies release impact with changeset generation. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint install failed: lockfile failed supply-chain policy check. Run pnpm install locally to update the lockfile.



Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/ai-groq/src/adapters/transcription.ts`:
- Around line 161-170: The adapter currently forwards
TranscriptionOptions.responseFormat (effectiveFormat) including 'srt'/'vtt' to
form.append('response_format', effectiveFormat) and later calls response.json()
for non-verbose_json/text branches; change this to explicitly reject unsupported
formats for Groq: add a check after computing effectiveFormat (the variable
named effectiveFormat in transcription.ts) and if effectiveFormat === 'srt' ||
effectiveFormat === 'vtt' throw a clear Error like "responseFormat 'srt'/'vtt'
not supported by Groq" (or alternatively map them to a supported value such as
'text' or 'json' if you prefer), and ensure form.append('response_format', ...)
and the subsequent response handling (the branches that call response.text() or
response.json(), e.g., the block that currently uses response.json() and
generateId(this.name)) only run for supported formats
('json','text','verbose_json').

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f2e8fc62-7779-4e78-9f21-0e407ea9246f

📥 Commits

Reviewing files that changed from the base of the PR and between ef029a0 and a8ced68.

📒 Files selected for processing (7)
  • .changeset/feat-groq-ai-transcription.md
  • packages/ai-groq/src/adapters/transcription.ts
  • packages/ai-groq/src/audio/transcription-provider-options.ts
  • packages/ai-groq/src/index.ts
  • packages/ai-groq/src/model-meta.ts
  • testing/e2e/src/lib/feature-support.ts
  • testing/e2e/src/lib/media-providers.ts

Comment on lines +161 to +170
    } else if (effectiveFormat === 'text') {
      const text = await response.text()
      return {
        id: generateId(this.name),
        model,
        text,
        ...(language !== undefined && { language }),
      }
    } else {
      const data = (await response.json()) as GroqJsonTranscriptionResponse
Contributor


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Verify which transcription response formats are exposed in shared types
rg -nP --type=ts "interface\\s+TranscriptionOptions|responseFormat|verbose_json|srt|vtt|json|text" packages

# Compare how other transcription adapters branch on response formats
rg -nP --type=ts "responseFormat|effectiveFormat|await response\\.text\\(|await response\\.json\\(" packages/ai-*/src/adapters/*transcription*.ts

Repository: TanStack/ai

Length of output: 50368


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 1) Show the exact response format union/type and any mapping for transcription
rg -n "responseFormat" packages/ai/src/types.ts packages/ai/src/**/transcription* packages/ai/src/**/transcrib* || true

# 2) Inspect Groq transcription adapter around effectiveFormat + parsing
FILE="packages/ai-groq/src/adapters/transcription.ts"
if [ -f "$FILE" ]; then
  echo "---- $FILE (relevant section) ----"
  # Print the function/section around the line numbers mentioned in the original comment
  nl -ba "$FILE" | sed -n '130,230p'
else
  echo "Missing $FILE"
fi

# 3) Find any other adapters for transcription and how they handle srt/vtt/json/text
rg -n "effectiveFormat|responseFormat" packages/ai-*/src/adapters/*transcription*.ts || true

Repository: TanStack/ai

Length of output: 435


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Locate Groq transcription adapter and inspect around the reported section without using nl
GROQ_FILE="packages/ai-groq/src/adapters/transcription.ts"
echo "=== Groq adapter file exists? ==="
test -f "$GROQ_FILE" && echo "yes: $GROQ_FILE" || (echo "missing: $GROQ_FILE" && exit 0)

echo "=== Groq adapter around lines 145-210 ==="
cat -n "$GROQ_FILE" | sed -n '145,210p'

echo "=== Groq adapter response_format request parameter and effectiveFormat derivation ==="
rg -n "response_format|effectiveFormat|useVerbose|verbose_json|srt|vtt" "$GROQ_FILE"

echo "=== Groq adapter parsing branches (json/text) ==="
rg -n "await response\\.json\\(|await response\\.text\\(|if \\(effectiveFormat|else if \\(effectiveFormat|else \\{" "$GROQ_FILE"

echo "=== Groq adapter non-ok (error handling) branch ==="
rg -n "if \\(!response\\.ok\\)|response\\.ok|throw" "$GROQ_FILE"

echo "=== Find other transcription adapters to compare srt/vtt handling ==="
fd -t f -e ts -e tsx "*transcription*.ts" packages \
  | sed -n '1,200p'

# For each adapter found, quickly extract how it branches on responseFormat/effectiveFormat
fd -t f -e ts -e tsx "*transcription*.ts" packages \
  | xargs -I{} sh -c 'echo "---- {} ----"; rg -n "responseFormat|effectiveFormat|verbose_json|srt|vtt" "{}" || true'

Repository: TanStack/ai

Length of output: 4412


🏁 Script executed:

#!/bin/bash
set -euo pipefail

GROQ_FILE="packages/ai-groq/src/adapters/transcription.ts"

echo "=== Groq adapter occurrences of srt/vtt ==="
rg -n "srt|vtt" "$GROQ_FILE" || true

echo "=== Groq adapter full response_format send (near form.append) ==="
rg -n "form\\.append\\('response_format'|effectiveFormat|useVerbose" "$GROQ_FILE" -n

echo "=== Other transcription adapters: files matching *transcription*.ts ==="
FILES=$(fd -t f "*transcription*.ts" packages || true)
echo "$FILES" | head -n 50

echo "=== Other transcription adapters: look for srt/vtt/text/json handling ==="
echo "$FILES" | xargs -I{} sh -c 'echo "---- {} ----"; rg -n "responseFormat|effectiveFormat|verbose_json|srt|vtt|await response\\.json\\(|await response\\.text\\(" "{}" || true'

Repository: TanStack/ai

Length of output: 1002


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 1) Inspect the responseFormat type definition around the reported line
TFILE="packages/ai/src/types.ts"
echo "=== responseFormat type in $TFILE ==="
test -f "$TFILE" && sed -n '1680,1710p' "$TFILE" || echo "missing $TFILE"

# 2) See if Groq adapter validates/filters srt/vtt anywhere besides parsing branch
GROQ_FILE="packages/ai-groq/src/adapters/transcription.ts"
echo "=== Groq adapter: validate/map srt/vtt/response_format ==="
rg -n "responseFormat|effectiveFormat|srt|vtt" "$GROQ_FILE"

# 3) Check other transcription adapter files (fix fd usage with --glob)
echo "=== Other transcription adapter files ==="
FILES=$(fd -t f --glob '*transcription*.ts' packages)
echo "$FILES" | head -n 80

echo "=== Other transcription adapters: parsing branches ==="
echo "$FILES" | xargs -I{} sh -c '
  f="{}";
  echo "---- $f ----";
  rg -n "responseFormat|effectiveFormat|verbose_json|await response\.json\(|await response\.text\(|srt|vtt" "$f" || true
' | head -n 200

Repository: TanStack/ai

Length of output: 5060


🏁 Script executed:

#!/bin/bash
set -euo pipefail
GROQ_FILE="packages/ai-groq/src/adapters/transcription.ts"

echo "=== Groq adapter error handling branch (around !response.ok) ==="
cat -n "$GROQ_FILE" | sed -n '90,150p'

echo "=== Groq transcription provider options docs ==="
PROV_OPTS="packages/ai-groq/src/audio/transcription-provider-options.ts"
test -f "$PROV_OPTS" && cat -n "$PROV_OPTS" | sed -n '1,120p' || echo "missing $PROV_OPTS"

echo "=== Groq transcription tests mentioning responseFormat ==="
fd -t f --glob '*groq*transcription*test*.ts' packages | head -n 50
fd -t f --glob '*transcription*test*.ts' packages/ai-groq | xargs -I{} sh -c 'echo "---- {} ----"; rg -n "responseFormat|response_format|srt|vtt|verbose_json|text|json" "{}" || true'

Repository: TanStack/ai

Length of output: 3659


🌐 Web query:

Groq transcription API response_format supported values json text verbose_json srt vtt

💡 Result:

Supported Groq transcription API response_format values are: json, text, verbose_json.[1][2]

Notes:

  • response_format defaults to json.[2]
  • verbose_json includes timestamps for audio segments; timestamp_granularities requires response_format = verbose_json.[1][2]
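
The dependency between timestamp_granularities and verbose_json can be sketched as a small resolver. This is a hedged illustration, not the adapter's logic: `resolveResponseFormat` is a hypothetical name, and throwing on a conflicting explicit format (rather than silently upgrading it) is an assumed design choice.

```typescript
// Hypothetical resolver; the adapter in this PR may derive its format differently.
type GroqResponseFormat = 'json' | 'text' | 'verbose_json'
type TimestampGranularity = 'word' | 'segment'

function resolveResponseFormat(
  requested: GroqResponseFormat | undefined,
  granularities: TimestampGranularity[] = [],
): GroqResponseFormat {
  if (granularities.length > 0) {
    // Timestamps are only returned in verbose_json, so any other explicit choice conflicts.
    if (requested !== undefined && requested !== 'verbose_json') {
      throw new Error(
        `timestamp_granularities requires response_format 'verbose_json', got '${requested}'`,
      )
    }
    return 'verbose_json'
  }
  // With no granularities requested, fall back to the documented default.
  return requested ?? 'json'
}
```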



Fix Groq responseFormat handling for srt/vtt (don’t fall back to JSON parsing / don’t forward unsupported formats).

TranscriptionOptions.responseFormat includes 'srt' | 'vtt', but the Groq transcription endpoint only supports response_format values json, text, and verbose_json. The Groq adapter forwards effectiveFormat directly to form.append('response_format', effectiveFormat) and then uses response.json() for all non-verbose_json/non-text cases—so requesting srt/vtt should be rejected (or mapped to a supported format) instead of relying on a “text-like formats” parse fallback.

Update the adapter to explicitly handle 'srt'/'vtt' (e.g., throw a clear “unsupported for Groq” error, or map to a supported format), and drop the proposed change that parses srt/vtt responses via response.text(), since Groq won’t return those formats successfully.
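
One possible shape for the rejection path is sketched below, assuming the supported set is exactly json, text, and verbose_json as reported by the web query in this comment. `assertGroqResponseFormat` and `GROQ_SUPPORTED_FORMATS` are hypothetical names, not identifiers from the PR.

```typescript
// Hypothetical guard; would run before the format is sent as response_format.
const GROQ_SUPPORTED_FORMATS = ['json', 'text', 'verbose_json'] as const
type GroqSupportedFormat = (typeof GROQ_SUPPORTED_FORMATS)[number]

function assertGroqResponseFormat(format: string): GroqSupportedFormat {
  if ((GROQ_SUPPORTED_FORMATS as readonly string[]).includes(format)) {
    return format as GroqSupportedFormat
  }
  // 'srt' and 'vtt' land here, so they fail fast instead of reaching response.json().
  throw new Error(
    `responseFormat '${format}' is not supported by Groq; use one of: ${GROQ_SUPPORTED_FORMATS.join(', ')}`,
  )
}
```

Narrowing the return type means the later parsing branches only need to cover the three supported formats.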

