
feat(audio): add tracking for audio transcriptions in OpenAI client #400

Open
JasonLovesDoggo wants to merge 4 commits into PostHog:master from JasonLovesDoggo:feat/support-transcribe

Conversation


@JasonLovesDoggo JasonLovesDoggo commented Jan 2, 2026

This adds support for tracking transcriptions from OpenAI. It does this via a new event, $ai_transcription, which follows the pattern of embeddings. I figure that audio -> text is different enough from text-to-text to deserve its own event.

Confirmed it worked in my own testing. Feel free to impersonate and view https://us.posthog.com/project/254263/events/2e8ded5c-acd2-45b4-b10f-7a85a438ffaa/2026-01-02T15%3A02%3A00.007000-05%3A00

Copilot AI review requested due to automatic review settings January 2, 2026 19:47

greptile-apps bot commented Jan 2, 2026

Greptile's behavior is changing!

From now on, if a review finishes with no comments, we will not post an additional "statistics" comment to confirm that our review found nothing to comment on. However, you can confirm that we reviewed your changes in the status check section.

This feature can be toggled off in your Code Review Settings by deselecting "Create a status check for each PR".


Copilot AI left a comment


Pull request overview

This PR adds tracking support for OpenAI audio transcriptions via a new $ai_transcription event. The implementation follows the existing pattern used for embeddings, treating audio-to-text as a distinct operation from text-to-text transformations.

  • Introduces WrappedAudio and WrappedTranscriptions classes for both sync and async OpenAI clients
  • Captures transcription metadata including model, input file name, output text, latency, and optional properties like language and audio duration
  • Supports privacy mode, groups, and custom properties consistent with other AI tracking features
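
For illustration, the delegation-plus-capture shape described above can be sketched roughly as follows. This is a hypothetical simplification, not the PR's actual implementation (the real WrappedAudio and WrappedTranscriptions classes live in posthog/ai/openai/openai.py and carry many more properties):

```python
import time


class WrappedTranscriptions:
    """Hypothetical sketch of a delegating wrapper around the OpenAI
    transcriptions resource: it forwards create() to the real client
    and captures a $ai_transcription event with basic metadata."""

    def __init__(self, transcriptions, capture_fn):
        self._transcriptions = transcriptions  # e.g. client.audio.transcriptions
        self._capture = capture_fn             # e.g. posthog.capture

    def create(self, **kwargs):
        start = time.time()
        result = self._transcriptions.create(**kwargs)
        # Capture metadata only; property names here are illustrative.
        self._capture(
            event="$ai_transcription",
            properties={
                "$ai_model": kwargs.get("model"),
                "$ai_latency": time.time() - start,
            },
        )
        return result
```

The wrapper keeps the upstream API surface unchanged, so callers still invoke `create()` exactly as they would on the plain OpenAI client.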

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| posthog/ai/openai/openai.py | Adds WrappedAudio and WrappedTranscriptions classes to track transcription usage in the sync OpenAI client |
| posthog/ai/openai/openai_async.py | Adds async versions of WrappedAudio and WrappedTranscriptions to track transcription usage in the async OpenAI client |
| posthog/test/ai/openai/test_openai.py | Adds comprehensive test coverage for transcription tracking, including basic usage, duration tracking, the language parameter, groups, privacy mode, and async support |


JasonLovesDoggo (Author) commented:

cc @andrewm4894

rafaeelaudibert (Member) commented:

We've updated our release process. We require sampo now. Please rebase on master and check the README to understand what needs to be done.

@rafaeelaudibert rafaeelaudibert requested a review from a team February 19, 2026 03:20
andrewm4894 (Member) commented:

@PostHog/team-llm-analytics this is an interesting one as it adds a new $ai_transcription event so maybe needs some thought/discussion

JasonLovesDoggo (Author) commented:

> We've updated our release process. We require sampo now. Please rebase on master and check README to understand what should be done.

Gotcha, happy to add this once I get the goahead that this can move forward

carlos-marchal-ph (Contributor) commented:

Hi Jason! Carlos from the PostHog LLM analytics team here. Appreciate the PR, that's certainly a sensible way to go about it. Unfortunately I'm leaning towards shelving this for the moment. There are some internal reasons for it:

  • We are currently implementing multimodal support. The current architecture leans towards treating all input/output to LLMs as potentially multimodal, which also supports mixed content by default.
  • We are still discussing how we'd like to handle potentially large payloads in AI events. Audio might not be the worst offender here, but in line with the above we want a solution that can ingest large blobs (hi-res images, potentially even videos at some point). This might impact how we ingest them from the SDK side, so we don't want to commit to anything right now.
  • In line with the above, we are trying to centralise as much of the event post-processing as possible to our backend. This makes it easier to control for us, and also avoids having to duplicate features across the different SDKs.

In the meantime, you should be able to get this working without forking by writing a small helper that calls client.audio.transcriptions.create() on the standard OpenAI client and then uses posthog.capture() to send a custom event with the properties you care about (model, latency, etc.). We'll be sure to credit you if we end up using part/all of your code moving forward. And again, thanks a lot for contributing!
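
The suggested workaround might look something like the sketch below. The helper names and the property keys are assumptions for illustration; only `client.audio.transcriptions.create()` and `posthog.capture()` come from the comment above:

```python
import time
import uuid


def build_transcription_properties(model, latency, input_file=None,
                                   output_text=None, language=None,
                                   privacy_mode=False):
    """Assemble event properties; the output text is dropped in privacy
    mode, and unset optional fields are omitted. Key names are illustrative,
    not an official PostHog schema."""
    props = {
        "$ai_model": model,
        "$ai_latency": latency,
        "$ai_input_file": input_file,
        "$ai_language": language,
    }
    if not privacy_mode:
        props["$ai_output"] = output_text
    return {k: v for k, v in props.items() if v is not None}


def transcribe_with_tracking(client, posthog, audio_file, model="whisper-1",
                             distinct_id=None, privacy_mode=False, **kwargs):
    """Call the standard OpenAI client, then send a custom event via
    posthog.capture(). `client` is an openai.OpenAI instance and `posthog`
    the posthog module (or a Posthog client)."""
    start = time.time()
    result = client.audio.transcriptions.create(model=model, file=audio_file,
                                                **kwargs)
    posthog.capture(
        distinct_id=distinct_id or str(uuid.uuid4()),
        event="$ai_transcription",
        properties=build_transcription_properties(
            model=model,
            latency=time.time() - start,
            input_file=getattr(audio_file, "name", None),
            output_text=getattr(result, "text", None),
            privacy_mode=privacy_mode,
        ),
    )
    return result
```

Keeping the property assembly in a separate pure function makes the tracking logic easy to unit-test without touching the OpenAI or PostHog APIs.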

JasonLovesDoggo (Author) commented:

> Hi Jason! Carlos from the PostHog LLM analytics team here. Appreciate the PR, that's certainly a sensible way to go about it. Unfortunately I'm leaning towards shelving this for the moment. There's some internal reasons for it:
>
> • We are currently implementing multimodal support. The current architecture leans towards treating all input/output to LLMs as potentially multimodal, which also supports mixed content by default.
> • We are still discussing how we'd like to handle potentially large payloads in AI events. Audio might not be the worst offender here, but in line with the above we want a solution that can ingest large blobs (hi-res images, potentially even videos at some point). This might impact how we ingest them from the SDK side, so we don't want to commit to anything right now.
> • In line with the above, we are trying to centralise as much of the event post-processing as possible to our backend. This makes it easier to control for us, and also avoids having to duplicate features across the different SDKs.
>
> In the meantime, you should be able to get this working without forking by writing a small helper that calls client.audio.transcriptions.create() on the standard OpenAI client and then uses posthog.capture() to send a custom event with the properties you care about (model, latency, etc.). We'll be sure to credit you if we end up using part/all of your code moving forward. And again, thanks a lot for contributing!

All good, looking forward to this!
