Summary
/v1/listen
Problem to solve
We operate a paid platform that re-sells Deepgram pre-recorded transcription to end users on a per-minute basis. Our app accepts a media URL from an end user, forwards it to POST /v1/listen (async with callback), and bills the user only after the webhook returns metadata.duration.
The /v1/listen endpoint has no parameter that lets the API caller bound the maximum billable duration of a single request. The full OpenAPI spec at https://developers.deepgram.com/reference/speech-to-text/listen-pre-recorded lists callback, callback_method, extra, tag, sentiment, summarize, topics, custom_topic, custom_topic_mode, intents, custom_intent, custom_intent_mode, detect_entities, detect_language, diarize, dictation, encoding, filler_words, keyterm, keywords, language, measurements, model, multichannel, numerals, paragraphs, profanity_filter, punctuate, redact, replace, search, smart_format, utterances, utt_split, version, mip_opt_out — none of which constrains duration or billable seconds.
Because the caller cannot tell Deepgram "stop and bill at most N seconds for this submission", any caller who fronts the API for untrusted end users is open to an unbounded cost-amplification attack:
- The end user submits a media URL pointing to an arbitrarily long, perfectly valid audio file (e.g. a 1000-hour public-domain recording — no metadata forgery required).
- The platform forwards the URL to /v1/listen. There is no client-side way to know the true playable duration short of fully downloading and re-decoding the file, which is operationally prohibitive on every submission.
- Deepgram decodes the entire file and bills the platform for the full real duration.
- The end user's account at the platform may carry only a few minutes' worth of credit; the platform absorbs the rest of the cost.
The caller can validate everything except what only Deepgram can know: the true number of billable seconds Deepgram will charge for this specific request. Deepgram learns the exact number during decoding; only Deepgram can fail the request before billable seconds accrue beyond a caller-specified cap.
This is the single largest unbounded-cost vector for any Deepgram customer who exposes the API indirectly to untrusted end users (re-sellers, white-label SaaS, marketplaces, customer-facing transcription UIs).
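To put a number on the exposure described above, here is a minimal sketch of the arithmetic. The function name `uncovered_cost` and the rate are illustrative placeholders and do not reflect actual Deepgram pricing.

```python
from decimal import Decimal


def uncovered_cost(file_seconds: int, user_credit_seconds: int,
                   rate_per_second: Decimal) -> Decimal:
    """Cost the platform absorbs when a file outlasts the user's prepaid credit."""
    excess_seconds = max(0, file_seconds - user_credit_seconds)
    return excess_seconds * rate_per_second


# Placeholder rate for illustration only, NOT a real Deepgram price.
HYPOTHETICAL_RATE = Decimal("0.0001")

# A 1000-hour file submitted by a user holding 5 minutes of credit.
exposure = uncovered_cost(1000 * 3600, 5 * 60, HYPOTHETICAL_RATE)
print(f"platform absorbs: {exposure}")
```

The excess grows linearly with file length, and nothing on the caller's side bounds it before the webhook reports `metadata.duration`.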
Proposed solution
Add an optional request-level cap on POST /v1/listen, e.g. `max_billable_seconds`:
- Type: integer (or float), in seconds.
- Default: absent (preserves current behavior — backward compatible).
- On excess: reject the request before any billable seconds accrue beyond the cap. Suggested response: 413 Payload Too Large, or a dedicated 400 with a documented error code such as "MAX_BILLABLE_SECONDS_EXCEEDED".
- Billing guarantee: the dollar charge for a single request must never exceed `max_billable_seconds × per-second rate`, regardless of the file's real duration.
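The semantics listed above could be sketched roughly as follows. This is only our reading of the proposal, not Deepgram's implementation: `apply_cap` and `Decision` are hypothetical names, and the sketch assumes a rejected request is billed nothing, which is one way (not the only way) to satisfy the billing guarantee.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decision:
    accepted: bool
    billable_seconds: float
    error_code: Optional[str] = None


def apply_cap(decoded_seconds: float,
              max_billable_seconds: Optional[float]) -> Decision:
    """Decide whether a request proceeds under the proposed cap."""
    # Parameter absent: current behavior, bill the full decoded duration.
    if max_billable_seconds is None:
        return Decision(True, decoded_seconds)
    # Over the cap: reject. Assumption: a rejected request is billed nothing.
    if decoded_seconds > max_billable_seconds:
        return Decision(False, 0.0, "MAX_BILLABLE_SECONDS_EXCEEDED")
    # Within the cap: identical to today's behavior.
    return Decision(True, decoded_seconds)
```

In every branch where a cap is supplied, `billable_seconds` never exceeds `max_billable_seconds`, which is the invariant the billing guarantee asks for.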
Example:
POST /v1/listen?callback=...&max_billable_seconds=14400
{ "url": "https://example.com/audio.mp3" }
-> if decoded media > 14400s, request fails with no charge above 14400s; otherwise behaves identically to today.
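On the caller side, wiring the cap in could look like the sketch below. Note that `max_billable_seconds` is the parameter proposed in this issue and does not exist in the API today; `build_listen_request` is a hypothetical helper, and the host name is assumed. The sketch only builds the request and sends nothing.

```python
import json
from urllib.parse import urlencode

API_BASE = "https://api.deepgram.com"  # assumed API host


def build_listen_request(media_url: str, callback_url: str,
                         max_billable_seconds: int) -> tuple[str, str]:
    """Return (url, json_body) for an async pre-recorded request with a cap."""
    query = urlencode({
        "callback": callback_url,
        # Proposed parameter from this issue; not part of the current API.
        "max_billable_seconds": max_billable_seconds,
    })
    url = f"{API_BASE}/v1/listen?{query}"
    body = json.dumps({"url": media_url})
    return url, body


url, body = build_listen_request(
    "https://example.com/audio.mp3",
    "https://platform.example/webhook",  # placeholder callback endpoint
    14400,
)
```

A re-seller would typically derive the cap from the end user's remaining credit, so the worst-case charge for a submission never exceeds what that user has prepaid.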
A complementary feature would be an account-wide per-request cap configured from the Deepgram console, so every submission from a given API key is automatically capped without per-call wiring. This is arguably the better default for re-seller and B2B2C use cases.
Precedent in comparable usage-billed APIs:
- OpenAI: `max_tokens` / `max_completion_tokens` per request.
- AWS Transcribe: per-job duration limits surfaced via job-config errors.
- Google Cloud Speech-to-Text: long-running operation duration cap.
Alternatives considered
No response
Scope
All SDKs (parity)
Priority
Blocker
Extra context / links
No response
Session ID (optional)
No response
Project ID (optional)
No response
Request ID (optional)
No response