
feat: add OpenAI diarization support #651

Open
8times4 wants to merge 2 commits into TanStack:main from 8times4:feat/openai-transcription-diarization

8times4 commented May 27, 2026

🎯 Changes

This change adds diarization support for OpenAI's gpt-4o-transcribe-diarize model, based on https://developers.openai.com/api/docs/guides/speech-to-text?lang=javascript

✅ Checklist

  • [x] I have followed the steps in the Contributing guide.
  • [x] I have tested this code locally with pnpm run test:pr.

🚀 Release Impact

  • [x] This change affects published code, and I have generated a changeset.
  • [ ] This change is docs/CI/dev-only (no release).

Summary by CodeRabbit

  • New Features

    • Added OpenAI speaker diarization support (GPT-4o diarize) for multi-speaker audio
    • Added diarized_json response format with speaker-labeled segments
    • Added chunking strategy and diarization-related configuration options
  • Documentation

    • Updated transcription docs and adapter guides with diarization examples, model notes, and best practices
  • Tests

    • Added tests covering diarization request handling, parsing, and validation



coderabbitai Bot commented May 27, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 31ddcc92-9d71-4e4e-bb45-abed172e1ccf

📥 Commits

Reviewing files that changed from the base of the PR and between a59d368 and 05dfb53.

📒 Files selected for processing (2)
  • packages/ai-openai/src/adapters/transcription.ts
  • packages/ai-openai/tests/transcription-adapter.test.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • packages/ai-openai/src/adapters/transcription.ts
  • packages/ai-openai/tests/transcription-adapter.test.ts

📝 Walkthrough


This PR adds speaker diarization support to OpenAI transcription via the new gpt-4o-transcribe-diarize model. It extends the responseFormat type contracts in every affected package with the diarized_json format, adds adapter logic that detects the diarize model and handles its requests and responses, validates diarization-specific constraints, and adds tests and documentation for the new behavior.

Changes

OpenAI Transcription Diarization Feature

| Layer | File(s) | Summary |
|---|---|---|
| Response Format Type Contracts | packages/ai/src/types.ts, packages/ai/src/activities/generateTranscription/index.ts, packages/ai-client/src/generation-types.ts, packages/ai-openai/src/audio/transcription-provider-options.ts, docs/reference/interfaces/TranscriptionOptions.md | responseFormat union types are extended across core packages to include 'diarized_json' as a supported output format. OpenAITranscriptionProviderOptions adds an optional chunking_strategy field supporting 'auto', a VAD config, or null. |
| OpenAI Adapter Diarization Implementation | packages/ai-openai/src/adapters/transcription.ts | The adapter detects diarization-capable models, validates diarization options, maps requests to diarized_json when appropriate, sets chunking_strategy: 'auto' by default for the diarize model, parses diarized segments into TranscriptionSegment[] with speaker/start/end/text, and preserves the non-diarized handling paths. |
| Diarization Adapter Test Coverage | packages/ai-openai/tests/transcription-adapter.test.ts | The test suite verifies default diarization behavior (automatic diarized_json and chunking_strategy: 'auto'), forwarding of explicit diarization configuration (server VAD, known speakers), chunking_strategy: null passthrough, alternative response formats on the diarize model, and validation errors for unsupported options and speaker metadata constraints. |
| Documentation and Changeset | .changeset/openai-transcription-diarization.md, docs/media/transcription.md, docs/adapters/openai.md, docs/media/generation-hooks.md, docs/comparison/vercel-ai-sdk.md, docs/reference/interfaces/TranscriptionOptions.md, packages/ai/skills/ai-core/media-generation/SKILL.md | Minor version bump across packages. Documentation is expanded to cover gpt-4o-transcribe-diarize with examples, the diarized_json format, timestamp_granularities usage, and chunking_strategy guidance for diarization; Whisper examples are updated to use responseFormat: 'verbose_json'. |
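The adapter row describes defaulting the diarize model to diarized_json and chunking_strategy: 'auto', while forwarding an explicit null unchanged. A minimal TypeScript sketch of that request mapping follows. isDiarizeTranscriptionModel is named in the review, but its detection rule here (a model id containing "diarize") is an assumption; buildTranscriptionParams, the request shape, and the type names are hypothetical. The VAD fields mirror OpenAI's server_vad chunking option.

```typescript
// Hypothetical sketch: names and shapes are illustrative, not the PR's code.
type ResponseFormat =
  | 'json'
  | 'text'
  | 'srt'
  | 'verbose_json'
  | 'vtt'
  | 'diarized_json'

// Mirrors the VAD config accepted by OpenAI's chunking_strategy parameter.
interface VadConfig {
  type: 'server_vad'
  prefix_padding_ms?: number
  silence_duration_ms?: number
  threshold?: number
}

type ChunkingStrategy = 'auto' | VadConfig | null

// Assumed request shape: core responseFormat plus provider modelOptions.
interface TranscriptionRequest {
  model: string
  responseFormat?: ResponseFormat
  modelOptions?: { chunking_strategy?: ChunkingStrategy }
}

interface OpenAITranscriptionParams {
  model: string
  response_format?: ResponseFormat
  chunking_strategy?: ChunkingStrategy
}

/** Assumed detection rule: any model id containing "diarize". */
function isDiarizeTranscriptionModel(model: string): boolean {
  return model.includes('diarize')
}

/**
 * Builds snake_case request params. For diarize models, an unset
 * responseFormat becomes 'diarized_json' and an unset chunking_strategy
 * becomes 'auto'. An explicit chunking_strategy (including null) is
 * forwarded unchanged.
 */
function buildTranscriptionParams(
  req: TranscriptionRequest,
): OpenAITranscriptionParams {
  const diarize = isDiarizeTranscriptionModel(req.model)
  const params: OpenAITranscriptionParams = { model: req.model }

  const format = req.responseFormat ?? (diarize ? 'diarized_json' : undefined)
  if (format !== undefined) params.response_format = format

  const chunking = req.modelOptions?.chunking_strategy
  if (chunking !== undefined) {
    params.chunking_strategy = chunking
  } else if (diarize) {
    params.chunking_strategy = 'auto'
  }
  return params
}
```

This sketch forwards null instead of dropping it, matching the "chunking_strategy: null passthrough" case listed for the tests. Whether the adapter itself sends null or omits the field is not shown on this page.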

Sequence Diagram

```mermaid
sequenceDiagram
  participant Adapter as OpenAI Adapter
  participant Validator as validateDiarizationOptions
  participant Mapper as mapResponseFormat
  participant OpenAI as OpenAI API
  participant Parser as Diarized Parser

  Adapter->>Adapter: Identify diarization-capable model
  Adapter->>Validator: Validate diarization options
  Validator-->>Adapter: Constraints enforced
  Adapter->>Mapper: Map responseFormat
  Mapper-->>Adapter: diarized_json selected or mapped format
  Adapter->>Adapter: Set chunking_strategy (auto/default/null)
  Adapter->>OpenAI: Create transcription request
  OpenAI-->>Adapter: Diarized or non-diarized response
  Adapter->>Parser: Map segments with speaker labels
  Parser-->>Adapter: TranscriptionSegment[]
  Adapter-->>Adapter: Return structured transcription result
```
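The final parsing step in the diagram (mapping segments with speaker labels into TranscriptionSegment[]) can be sketched as follows. The response shape is an assumption modeled on OpenAI's diarized_json output: a top-level text plus segments carrying speaker, start, end, and text. parseDiarizedResponse, formatBySpeaker, and the result types are hypothetical names, not the PR's code.

```typescript
// Assumed subset of a diarized_json response; only the fields read here.
interface DiarizedSegment {
  id: string
  start: number // seconds
  end: number // seconds
  speaker: string
  text: string
}

interface DiarizedResponse {
  text: string
  segments: DiarizedSegment[]
}

// Hypothetical segment shape with the speaker/start/end/text fields from the walkthrough.
interface TranscriptionSegment {
  speaker?: string
  start: number
  end: number
  text: string
}

interface TranscriptionResult {
  text: string
  segments: TranscriptionSegment[]
}

/** Maps diarized segments to speaker-labeled TranscriptionSegment objects. */
function parseDiarizedResponse(res: DiarizedResponse): TranscriptionResult {
  return {
    text: res.text,
    segments: res.segments.map((seg) => ({
      speaker: seg.speaker,
      start: seg.start,
      end: seg.end,
      text: seg.text,
    })),
  }
}

/** Renders one "speaker: text" line per segment, for example for display. */
function formatBySpeaker(result: TranscriptionResult): string {
  return result.segments
    .map((seg) => `${seg.speaker ?? 'unknown'}: ${seg.text}`)
    .join('\n')
}
```

The sketch copies the speaker labels as-is; their actual values come from the API response.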

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • crutchcorn
  • AlemTuzlak
  • tombeckenham

Poem

🐰
Whispers split into many small tunes,
Voices labeled beneath silver moons,
Chunks pick up cadence, speakers align,
JSON returns each clear line,
A rabbit cheers: "Transcribe, refine!"

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The pull request title accurately and concisely summarizes the main change: adding OpenAI diarization support to the transcription system. |
| Description check | ✅ Passed | The pull request description follows the template structure with all required sections completed, including changes, checklist items checked, and release impact clearly indicated. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint install failed. For unrecoverable errors, disable the tool in CodeRabbit configuration.



coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/ai-openai/src/adapters/transcription.ts`:
- Around lines 267-285: The diarization validation is missing a local guard for
responseFormat. Update validateDiarizationOptions (called by transcribe and
guarded by isDiarizeTranscriptionModel) to throw when
modelOptions.responseFormat (or the value returned by mapResponseFormat) is not
one of ["json","text","diarized_json"]. Check the resolved response format early
so that transcribe() cannot send srt, vtt, or verbose_json for diarize models,
and throw a clear error stating that diarization models support only json, text,
and diarized_json. Reference validateDiarizationOptions, transcribe,
mapResponseFormat, and isDiarizeTranscriptionModel when applying the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7c4b4b31-fb90-4e00-9d8f-1454f513e089

📥 Commits

Reviewing files that changed from the base of the PR and between 5634f18 and a59d368.

📒 Files selected for processing (13)
  • .changeset/openai-transcription-diarization.md
  • docs/adapters/openai.md
  • docs/comparison/vercel-ai-sdk.md
  • docs/media/generation-hooks.md
  • docs/media/transcription.md
  • docs/reference/interfaces/TranscriptionOptions.md
  • packages/ai-client/src/generation-types.ts
  • packages/ai-openai/src/adapters/transcription.ts
  • packages/ai-openai/src/audio/transcription-provider-options.ts
  • packages/ai-openai/tests/transcription-adapter.test.ts
  • packages/ai/skills/ai-core/media-generation/SKILL.md
  • packages/ai/src/activities/generateTranscription/index.ts
  • packages/ai/src/types.ts

Comment thread packages/ai-openai/src/adapters/transcription.ts

coderabbitai Bot commented May 28, 2026

Actionable comments posted: 0

