Summary
OpenAI's GPT-4o models support audio input/output in chat completions. When streaming with audio output (modalities: ["text", "audio"]), the API sends delta.audio chunks containing audio.id, audio.transcript, audio.data (base64), and audio.expires_at. The current aggregateChatCompletionChunks function only handles delta.role, delta.content, delta.tool_calls, and delta.finish_reason — it does not aggregate delta.audio at all.
This means the audio transcript (the most useful field for observability) is lost in streaming responses. Non-streaming responses capture the full response object, so audio data is preserved there. Audio token metrics (prompt_audio_tokens, completion_audio_tokens) are already correctly extracted from usage data.
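To make the gap concrete, here is a hand-written illustration of the shape of a streaming chunk carrying `delta.audio`, based on the fields listed in this issue (`audio.id`, `audio.transcript`, `audio.data`, `audio.expires_at`). The values are invented, not captured API output.

```typescript
// Illustrative streaming chunk with audio output; values are made up.
const chunk = {
  choices: [
    {
      delta: {
        audio: {
          id: "audio_abc123",     // stable id repeated across chunks
          transcript: "Hello",    // incremental transcript text
          data: "UklGRg==",       // base64-encoded audio bytes (truncated)
          expires_at: 1735689600, // unix timestamp
        },
      },
      finish_reason: null,
    },
  ],
};
```

It is the `transcript` field of each such chunk that the current aggregation drops.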
What is missing
- Streaming aggregation (js/src/instrumentation/plugins/openai-plugin.ts, aggregateChatCompletionChunks, ~lines 389-466): handles delta.role, delta.content, delta.tool_calls, and delta.finish_reason. There is no branch for delta.audio, so audio transcript chunks are silently dropped.
- Vendor SDK types (js/src/vendor-sdk-types/openai-common.ts): the OpenAIChatDelta interface defines role, content, tool_calls, finish_reason, and a catch-all [key: string]: unknown. There is no explicit audio field.
- E2E tests: no scenario tests the audio modality in chat completions.
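A minimal sketch of how the vendor type could gain an explicit audio field, assuming the delta shape described in this issue; the surrounding OpenAIChatDelta fields are reproduced from the description above, and the audio sub-interface name is hypothetical.

```typescript
// Hypothetical sub-interface for the streaming audio delta fields
// named in this issue (id, transcript, data, expires_at).
interface OpenAIChatAudioDelta {
  id?: string;         // stable audio response id
  transcript?: string; // incremental transcript text
  data?: string;       // base64-encoded audio chunk
  expires_at?: number; // unix timestamp
}

// Sketch of the existing delta interface with an explicit audio field added.
interface OpenAIChatDelta {
  role?: string;
  content?: string | null;
  tool_calls?: unknown[];
  finish_reason?: string | null;
  audio?: OpenAIChatAudioDelta; // proposed addition
  [key: string]: unknown;       // existing catch-all
}
```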
Expected behavior
The streaming aggregation should at minimum concatenate delta.audio.transcript chunks so the final span output includes the model's audio transcript. Storing the full base64 audio.data in spans would be impractical (very large), but the transcript text is compact and essential for observability.
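The transcript-concatenation step could look roughly like the following. This is a sketch, not the plugin's actual aggregation code: the function and type names are illustrative, and it deliberately ignores audio.data to keep span payloads small, as suggested above.

```typescript
// Illustrative per-chunk audio delta (audio.data intentionally omitted).
interface AudioDelta {
  id?: string;
  transcript?: string;
  expires_at?: number;
}

// Accumulated audio state across all streamed chunks.
interface AggregatedAudio {
  id?: string;
  transcript: string;
  expires_at?: number;
}

// Fold one delta.audio chunk into the running aggregate.
function aggregateAudioDelta(
  acc: AggregatedAudio | undefined,
  delta: AudioDelta | undefined,
): AggregatedAudio | undefined {
  if (!delta) return acc;
  const out: AggregatedAudio = acc ?? { transcript: "" };
  if (delta.id !== undefined) out.id = delta.id;
  // Concatenate transcript chunks; base64 audio.data is skipped on purpose.
  if (delta.transcript !== undefined) out.transcript += delta.transcript;
  if (delta.expires_at !== undefined) out.expires_at = delta.expires_at;
  return out;
}
```

A branch like this would slot in next to the existing delta.content handling, so the final span output carries the full transcript text.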
Upstream reference
- OpenAI audio output in chat completions: https://platform.openai.com/docs/guides/audio
- Streaming audio delta fields: audio.id, audio.data (base64 chunks), audio.transcript (text chunks), audio.expires_at
- Supported on gpt-4o-audio-preview and gpt-4o-mini-audio-preview models.
- Audio token metrics are documented in the usage object as prompt_tokens_details.audio_tokens and completion_tokens_details.audio_tokens.
Braintrust docs status
The Braintrust OpenAI integration page documents chat completions but does not mention the audio modality.
What already works
Audio token metrics are properly extracted. In js/src/openai-utils.ts, the extractOpenAIMetrics function maps input_tokens_details.audio_tokens → prompt_audio_tokens and output_tokens_details.audio_tokens → completion_audio_tokens. This is confirmed by unit tests in openai-plugin.test.ts.
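For reference, the working metric mapping amounts to something like the sketch below. The real extractOpenAIMetrics handles many more fields; the function name here is hypothetical and the usage field names follow the issue's description (input_tokens_details / output_tokens_details).

```typescript
// Token detail sub-object as described in this issue.
interface UsageTokenDetails {
  audio_tokens?: number;
}

// Minimal usage shape covering only the audio-relevant fields.
interface Usage {
  input_tokens_details?: UsageTokenDetails;
  output_tokens_details?: UsageTokenDetails;
}

// Sketch of the audio-token mapping the issue confirms already works:
// input_tokens_details.audio_tokens  -> prompt_audio_tokens
// output_tokens_details.audio_tokens -> completion_audio_tokens
function extractAudioTokenMetrics(usage: Usage): Record<string, number> {
  const metrics: Record<string, number> = {};
  const promptAudio = usage.input_tokens_details?.audio_tokens;
  const completionAudio = usage.output_tokens_details?.audio_tokens;
  if (promptAudio !== undefined) metrics.prompt_audio_tokens = promptAudio;
  if (completionAudio !== undefined) {
    metrics.completion_audio_tokens = completionAudio;
  }
  return metrics;
}
```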
Local files inspected
- js/src/instrumentation/plugins/openai-plugin.ts (aggregateChatCompletionChunks function)
- js/src/vendor-sdk-types/openai-common.ts (OpenAIChatDelta interface)
- js/src/openai-utils.ts (extractOpenAIMetrics, audio token metrics)
- js/src/wrappers/oai.ts (wrapper proxy)
- e2e/scenarios/openai-instrumentation/scenario.impl.mjs (e2e test scenarios)