From 72a5b5ed2ee12ea77a16c26578cdb7b9f60f9ff1 Mon Sep 17 00:00:00 2001
From: Pete Mo <175202887+cabbage-ice-cream@users.noreply.github.com>
Date: Fri, 12 Jun 2026 12:08:04 +0100
Subject: [PATCH 01/18] DEL-33165: Add Melia 1, split Models and Languages pages

- Add new Models page at /speech-to-text/models covering Enhanced, Standard, and Melia 1 (DEL-33165)
- Replace Languages and models page with Languages (slug unchanged)
- Add Melia 1 multilingual subsections to Batch Input and Output pages
- Add Models nav item above Languages (DEL-33183)
- Update inbound links to moved/renamed anchors (DEL-33182)
- Add melia and an Arabic example word to custom-words.txt

Co-Authored-By: Claude Fable 5
---
 custom-words.txt | 2 +
 .../container/cpu-speech-to-text.mdx | 4 +-
 .../container/gpu-speech-to-text.mdx | 2 +-
 docs/deployments/index.md | 2 +-
 docs/speech-to-text/batch/input.mdx | 18 ++
 docs/speech-to-text/batch/output.mdx | 50 ++++
 .../features/audio-filtering.mdx | 2 +-
 .../features/feature-discovery.mdx | 4 +-
 docs/speech-to-text/languages.mdx | 274 ++++++++----------
 docs/speech-to-text/models.mdx | 99 +++++++
 docs/speech-to-text/sidebar.ts | 4 +
 11 files changed, 304 insertions(+), 157 deletions(-)
 create mode 100644 docs/speech-to-text/models.mdx

diff --git a/custom-words.txt b/custom-words.txt
index 37d58984..c1dcbd8c 100644
--- a/custom-words.txt
+++ b/custom-words.txt
@@ -339,3 +339,5 @@ seqs
 vllm
 configmap
 sessiongroups
+melia
+مرحبا
diff --git a/docs/deployments/container/cpu-speech-to-text.mdx b/docs/deployments/container/cpu-speech-to-text.mdx
index 506c646e..7c9c79a0 100644
--- a/docs/deployments/container/cpu-speech-to-text.mdx
+++ b/docs/deployments/container/cpu-speech-to-text.mdx
@@ -262,11 +262,11 @@ In general, the format is: `{language}_{domain}_{processor}_{operating_point}:{p
 The parameters are:
 
 - `language` - One of the supported [language codes](/speech-to-text/languages)
-- `domain` - One of `general` or a domain used for some [multi-lingual transcription](/speech-to-text/languages#multilingual-speech-to-text) use cases. For example: `SM_PREWARM_ENGINE_MODES='es_bilingual-en_gpu_standard:1'`
+- `domain` - One of `general` or a domain used for some [multilingual transcription](/speech-to-text/languages#bilingual-and-multi-language-packs) use cases. For example: `SM_PREWARM_ENGINE_MODES='es_bilingual-en_gpu_standard:1'`
 - `processor` - One of `cpu` or `gpu`. Note that selecting `gpu` requires a [GPU Inference Container](/deployments/container/gpu-speech-to-text)
-- `operating_point` - One of `standard` or `enhanced`. The [operating point](/speech-to-text/languages#models) you want to prewarm
+- `operating_point` - One of `standard` or `enhanced`. The [operating point](/speech-to-text/models) you want to prewarm
 - `prewarm_connections` - Integer. The number of engine instances of the specific mode you want to pre-warm. The total number of `prewarm_connections` cannot be greater than `SM_MAX_CONCURRENT_CONNECTIONS`.
 
 After the pre-warming is complete, this parameter does not limit the types of connections the engine can start.
diff --git a/docs/deployments/container/gpu-speech-to-text.mdx b/docs/deployments/container/gpu-speech-to-text.mdx
index 43b6b795..46d278f6 100644
--- a/docs/deployments/container/gpu-speech-to-text.mdx
+++ b/docs/deployments/container/gpu-speech-to-text.mdx
@@ -107,7 +107,7 @@ Once the GPU Server is running, follow the [Instructions for Linking a CPU Conta
 
 ### Running only one operating point
 
-[Operating Points](/speech-to-text/languages#models) represent different levels of model complexity.
+[Operating Points](/speech-to-text/models) represent different levels of model complexity.
 To save GPU memory for throughput, you can run the server with only one Operating Point loaded.
 To do this, pass the `SM_OPERATING_POINT` environment variable to the container and set it to either `standard` or `enhanced`.
 
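The `SM_PREWARM_ENGINE_MODES` entry format documented in the CPU container hunk above, `{language}_{domain}_{processor}_{operating_point}:{prewarm_connections}`, can be built and checked in code. The TypeScript below is a minimal sketch, not a Speechmatics API: `PrewarmMode`, `buildPrewarmMode`, and `checkPrewarmTotal` are hypothetical names, rejecting a `connections` value below 1 is an assumption of the sketch, and it does not cover how several entries are combined into one environment variable value.

```typescript
// Hypothetical helpers for illustration; not part of any Speechmatics SDK.
type Processor = "cpu" | "gpu";
type OperatingPoint = "standard" | "enhanced";

interface PrewarmMode {
  language: string; // a supported language code, such as "es"
  domain: string; // "general", or a multilingual domain such as "bilingual-en"
  processor: Processor; // "gpu" requires the GPU Inference Container
  operatingPoint: OperatingPoint;
  connections: number; // prewarm_connections for this mode
}

// Formats one entry as {language}_{domain}_{processor}_{operating_point}:{prewarm_connections}.
function buildPrewarmMode(mode: PrewarmMode): string {
  // Assumption: this sketch rejects zero, negative, and fractional counts.
  if (!Number.isInteger(mode.connections) || mode.connections < 1) {
    throw new RangeError(`connections must be a positive integer, got ${mode.connections}`);
  }
  return `${mode.language}_${mode.domain}_${mode.processor}_${mode.operatingPoint}:${mode.connections}`;
}

// The documented rule: the total of prewarm_connections must not exceed SM_MAX_CONCURRENT_CONNECTIONS.
function checkPrewarmTotal(modes: PrewarmMode[], maxConcurrentConnections: number): void {
  const total = modes.reduce((sum, mode) => sum + mode.connections, 0);
  if (total > maxConcurrentConnections) {
    throw new RangeError(
      `total prewarm_connections (${total}) exceeds SM_MAX_CONCURRENT_CONNECTIONS (${maxConcurrentConnections})`,
    );
  }
}

// Reproduces the documented example value.
console.log(
  buildPrewarmMode({
    language: "es",
    domain: "bilingual-en",
    processor: "gpu",
    operatingPoint: "standard",
    connections: 1,
  }),
); // es_bilingual-en_gpu_standard:1
```

The template string mirrors the documented field order, so the only judgment calls are the validation rules, which can be relaxed if the container accepts other values.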
diff --git a/docs/deployments/index.md b/docs/deployments/index.md
index 3d8f0de3..3f427f86 100644
--- a/docs/deployments/index.md
+++ b/docs/deployments/index.md
@@ -31,7 +31,7 @@ Feature availability varies depending on the deployment method you choose. Below
 
 | Feature | Modes | Deployments |
 | ------------------------------------------------------------------------------------- | --------------- | ------------- |
-| [Multilingual speech to text](/speech-to-text/languages#multilingual-speech-to-text) | Batch, Realtime | SaaS, On-prem |
+| [Multilingual speech to text](/speech-to-text/languages#bilingual-and-multi-language-packs) | Batch, Realtime | SaaS, On-prem |
 | [Alignment](/speech-to-text/batch/alignment) | Batch | SaaS |
 | [Audio events](/speech-to-text/features/audio-events) | Batch, Realtime | SaaS, On-prem |
 | [Audio filtering](/speech-to-text/features/audio-filtering) | Batch, Realtime | SaaS, On-prem |
diff --git a/docs/speech-to-text/batch/input.mdx b/docs/speech-to-text/batch/input.mdx
index cea277b1..15d2c974 100644
--- a/docs/speech-to-text/batch/input.mdx
+++ b/docs/speech-to-text/batch/input.mdx
@@ -40,6 +40,24 @@ Below are the complete fields of the configuration object:
 
+### Multilingual transcription with Melia 1
+
+The Melia 1 model transcribes audio that contains more than one language. To use it, set `"model": "melia-1"` and `"language": "multi"`. For guidance on choosing a model, refer to [Models](/speech-to-text/models).
+
+If you know which languages appear in the audio, list them in `language_hints` to reduce the chance of unexpected languages or scripts in the output. This config hints that the audio contains English and Arabic:
+
+```json
+{
+  "type": "transcription",
+  "transcription_config": {
+    "model": "melia-1",
+    "language": "multi",
+    "language_hints": ["en", "ar"]
+  }
+}
+```
+
+You can specify any number of [supported language codes](/speech-to-text/languages#transcription-languages). For audio in a single language, hinting that language can improve accuracy.
 
 ## Fetch URL
diff --git a/docs/speech-to-text/batch/output.mdx b/docs/speech-to-text/batch/output.mdx
index ed02f6c5..5f0e8532 100644
--- a/docs/speech-to-text/batch/output.mdx
+++ b/docs/speech-to-text/batch/output.mdx
@@ -131,6 +131,56 @@ The following is an example of a transcript response, which you should see as an
 {JSON.stringify(transcriptResponseExample, null, 2)}
 
+### Multilingual transcript output
+
+For a Melia 1 job, the `language` property of each word is the language detected for that word, so it can change within a transcript. For Enhanced and Standard jobs, which transcribe a single selected language, every word reports the same language.
+
+The example below shows two words in different languages within one transcript:
+
+```json
+{
+  "results": [
+    {
+      "alternatives": [
+        { "content": "Hello", "confidence": 0.98, "language": "en" }
+      ],
+      "start_time": 0.20,
+      "end_time": 0.52,
+      "type": "word"
+    },
+    {
+      "alternatives": [
+        { "content": "مرحبا", "confidence": 0.95, "language": "ar" }
+      ],
+      "start_time": 0.60,
+      "end_time": 1.04,
+      "type": "word"
+    }
+  ]
+}
+```
+
+For multilingual transcripts, `language_pack_info` reports the word delimiter and writing direction per language rather than for a single language pack:
+
+```json
+{
+  "metadata": {
+    "language_pack_info": {
+      "per_language_word_delimiters": {
+        "en": " ",
+        "ar": " "
+      },
+      "per_language_writing_direction": {
+        "en": "left-to-right",
+        "ar": "right-to-left"
+      }
+    }
+  }
+}
+```
+
+`per_language_word_delimiters` gives the word delimiter for each language in the transcript, and `per_language_writing_direction` gives the writing direction for each language.
+
 ## Quicklinks
diff --git a/docs/speech-to-text/features/audio-filtering.mdx b/docs/speech-to-text/features/audio-filtering.mdx
index 87e2db4b..3374bd00 100644
--- a/docs/speech-to-text/features/audio-filtering.mdx
+++ b/docs/speech-to-text/features/audio-filtering.mdx
@@ -73,6 +73,6 @@ To obtain volume labelling without filtering any audio, supply an empty config o
 
 Once the audio is in a raw format (16kHz 16bit mono), it is split into 0.01s chunks. For each chunk, the root mean square amplitude of the signal is calculated, and scaled to the range `0 - 100`. If the volume is less than the supplied cut-off, the chunk will be replaced with silence.
 
-To work successfully without degrading accuracy, the background speech must be significantly quieter than the foreground speech, otherwise the filtering process may remove small sections of the audio which should be transcribed. For this reason, the feature works better with the [enhanced model](/speech-to-text/languages#operating-points), which is more robust against inadvertent damage to the audio.
+To work successfully without degrading accuracy, the background speech must be significantly quieter than the foreground speech, otherwise the filtering process may remove small sections of the audio which should be transcribed. For this reason, the feature works better with the [enhanced model](/speech-to-text/models), which is more robust against inadvertent damage to the audio.
 
 The word volume calculation takes the start and end times of words, and applies a weighted average of the volumes of each audio chunk which make up the word. The weighting attempts to ignore areas of silence within long words, and provide a better match with the volume classification a human listener would make.
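The per-word `language` property and the `per_language_word_delimiters` map added to `batch/output.mdx` earlier in this patch are enough to rebuild plain text from a Melia 1 transcript. The TypeScript below is a minimal sketch under stated assumptions: the interfaces model only the fields shown in the output examples, `joinMultilingualTranscript` is a hypothetical helper, non-`word` items such as punctuation are skipped, and placing each word's own-language delimiter before it is an assumption rather than documented behavior.

```typescript
// Minimal shapes covering only the fields shown in the multilingual output examples.
interface Alternative {
  content: string;
  confidence: number;
  language: string;
}

interface ResultItem {
  alternatives: Alternative[];
  start_time: number;
  end_time: number;
  type: string;
}

interface LanguagePackInfo {
  per_language_word_delimiters: Record<string, string>;
}

// Hypothetical helper: joins word items into plain text using per-language delimiters.
function joinMultilingualTranscript(results: ResultItem[], info: LanguagePackInfo): string {
  let text = "";
  for (const item of results) {
    // Simplification: only "word" items are joined; punctuation handling is out of scope.
    if (item.type !== "word" || item.alternatives.length === 0) continue;
    const word = item.alternatives[0];
    // Assumption: fall back to a space when a language has no listed delimiter.
    const delimiter = info.per_language_word_delimiters[word.language] ?? " ";
    // Assumption: the incoming word's own-language delimiter separates it from the previous word.
    text += (text.length > 0 ? delimiter : "") + word.content;
  }
  return text;
}

// Data taken from the two-word example in the output.mdx changes.
const exampleResults: ResultItem[] = [
  { alternatives: [{ content: "Hello", confidence: 0.98, language: "en" }], start_time: 0.2, end_time: 0.52, type: "word" },
  { alternatives: [{ content: "مرحبا", confidence: 0.95, language: "ar" }], start_time: 0.6, end_time: 1.04, type: "word" },
];
const exampleInfo: LanguagePackInfo = { per_language_word_delimiters: { en: " ", ar: " " } };

console.log(joinMultilingualTranscript(exampleResults, exampleInfo)); // Hello مرحبا
```

`per_language_writing_direction` is not used here; rendering mixed left-to-right and right-to-left text is left to whatever displays the string.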
diff --git a/docs/speech-to-text/features/feature-discovery.mdx b/docs/speech-to-text/features/feature-discovery.mdx
index f6fb3bfb..152d8159 100644
--- a/docs/speech-to-text/features/feature-discovery.mdx
+++ b/docs/speech-to-text/features/feature-discovery.mdx
@@ -18,11 +18,11 @@ curl "https://eu1.asr.api.speechmatics.com/v1/discovery/features"
 The feature discovery endpoint will include an object with the following properties:
 
 - `metadata`
-  - `language_pack_info` - For each of our [supported languages](/speech-to-text/languages), give the full name of the language, as well as any [Domain Language Optimizations](/speech-to-text/languages#multilingual-speech-to-text) or [Output Locales](/speech-to-text/formatting#output-locale)
+  - `language_pack_info` - For each of our [supported languages](/speech-to-text/languages), gives the full name of the language, as well as any [Domain Language Optimizations](/speech-to-text/languages#bilingual-and-multi-language-packs) or [Output Locales](/speech-to-text/formatting#output-locale)
 - `batch` - Capabilities relating to our Batch API
   - `transcription` - Capabilities relating to transcription
     - `languages` - Includes a list of supported ISO language codes
     - `locales` - Includes any languages with a supported [Output Locale](/speech-to-text/formatting#output-locale)
-    - `domains` - Includes any languages with a supported [Domain Language Optimizations](/speech-to-text/languages#multilingual-speech-to-text)
+    - `domains` - Includes any languages with a supported [Domain Language Optimization](/speech-to-text/languages#bilingual-and-multi-language-packs)
   - `translation` - Includes all [supported translation pairs](/speech-to-text/features/translation#languages)
   - `languageid` - List of languages supported by [Language Identification](/speech-to-text/batch/language-identification)
diff --git a/docs/speech-to-text/languages.mdx b/docs/speech-to-text/languages.mdx
index 4a5e1b94..e099cb43 100644
--- a/docs/speech-to-text/languages.mdx
+++ b/docs/speech-to-text/languages.mdx
@@ -1,5 +1,6 @@
 ---
-description: "Information about the wide array of languages Speechmatics supports transcription for"
+title: Languages
+description: See which languages Speechmatics supports for transcription and translation, including bilingual packs.
 keywords:
   [
     speechmatics,
@@ -27,201 +28,174 @@ keywords:
   ]
 ---
 
-# Languages and models
+# Languages
+See which languages Speechmatics supports for transcription and translation.
 
-### Models
-
-Choose between two accuracy models when configuring your transcription session:
-- **Standard** — optimized for faster turnaround with strong accuracy. Recommended when speed and efficiency are your priorities
-- **Enhanced** — our highest-accuracy model with strong turnaround times. Recommended when precision is critical, and especially for complex audio (e.g. noisy environments, varied accents)
-
-By default, the `standard` model is used. You can specify the `enhanced` model as a part of the transcription config. For example:
-```json
-{
-  "type": "transcription",
-  "transcription_config": {
-    "language": "en",
-    // highlight-start
-    "model": "enhanced"
-    // highlight-end
-  }
-}
-```
+To choose a transcription model, refer to [Models](/speech-to-text/models).
+The languages, packs, and options on this page apply to the Enhanced and Standard models. The Melia 1 model is multilingual: it transcribes the individual languages listed here and switches between them automatically, without requiring you to select a language. It does not support the `auto` option, the bilingual and multi-language packs, or translation. For details on Melia 1, refer to [Models](/speech-to-text/models).
 
 ## Transcription languages
 
-:::info
-To automatically identify the language in an audio file, use our [Language Identification](/speech-to-text/batch/language-identification) feature.
- -To dynamically update your system with the latest languages and features offered by Speechmatics, use our [Feature Discovery](/speech-to-text/features/feature-discovery) endpoint. -::: - -Speechmatics supports the following languages. Your ability to use any or all of the languages will depend on what languages you are contracted to use. - -Speechmatics takes a global-first approach to our languages. In a single language pack, we aim to support many different accents and dialects. This simplifies your workflow when selecting which language to use, not requiring you to know which accent is being spoken in your audio upfront. With this approach we still achieve very high accuracy compared to accent-specific language packs. - -| Language | Language Code | Description | -| ---------------------------- | ------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| Automatic | auto | Automatically detect the language using our [Language Identification](/speech-to-text/batch/language-identification) feature.
Please note, this is currently only supported with Batch Transcriptions. | -| Arabic | ar | Our global Arabic gives high-accuracy transcription across many different accents and dialects including (but not limited to) Modern Standard Arabic (MSA) and Arabic spoken in the Gulf, Egypt and the Levant. | -| Arabic & English bilingual | ar_en | Ideal when transcribing Arabic and English in the same media file or stream. Supports all accents and dialects listed under Arabic and English. | -| Bashkir | ba | | -| Basque | eu | | -| Belarusian | be | | -| Bengali | bn | | -| Bulgarian | bg | | -| Cantonese | yue | | -| Catalan | ca | | -| Croatian | hr | | -| Czech | cs | | -| Danish | da | | -| Dutch | nl | | -| English | en | Our global English gives high-accuracy transcription across many different accents including (but not limited to) English spoken in the United Kingdom, United States, Australia, New Zealand and non-native speakers. To standardise spelling, we recommend specifying the [Output Locale](/speech-to-text/formatting#output-locale). | -| Esperanto | eo | | -| Estonian | et | | -| Finnish | fi | | -| French | fr | Our global French gives high-accuracy transcription across many different accents including (but not limited to) French spoken in France, Canada and Belgium. | -| Galician | gl | | -| German | de | Our global German gives high-accuracy transcription across many different accents including (but not limited to) German spoken in Germany, Austria and Switzerland. | -| Greek | el | | -| Hebrew | he | | -| Hindi | hi | | -| Hungarian | hu | | -| Indonesian | id | | -| Interlingua | ia | | -| Irish | ga | | -| Italian | it | | -| Japanese | ja | | -| Korean | ko | | -| Latvian | lv | | -| Lithuanian | lt | | -| Malay | ms | | -| Malay & English bilingual | en_ms | Ideal when transcribing Malay and English in the same media file or stream. Supports all accents and dialects listed under Malay and English. 
| -| Maltese | mt | | -| Mandarin | cmn | Our global Mandarin can output [Traditional or Simplified characters](/speech-to-text/formatting#output-locale) and gives high accuracy transcription across many different accents including (but not limited to) China, Taiwan, Singapore, Malaysia. | -| Mandarin & English bilingual | cmn_en | Ideal when transcribing Mandarin and English in the same media file or stream. Supports all accents and dialects listed under Mandarin and English. | -| Mandarin Malay Tamil & English multilingual | cmn_en_ms_ta | Ideal when transcribing Mandarin, Malay, Tamil and English in the same media file or stream. Supports all accents and dialects listed under Mandarin, Malay, Tamil and English. | -| Marathi | mr | | -| Mongolian | mn | | -| Norwegian | no | | -| Persian | fa | | -| Polish | pl | | -| Portuguese | pt | Our global Portuguese gives high-accuracy transcription across many different accents including (but not limited to) Portuguese spoken in Portugal and Brazil. | -| Romanian | ro | | -| Russian | ru | | -| Slovakian | sk | | -| Slovenian | sl | | -| Spanish | es | Our global Spanish gives high-accuracy transcription across many different accents including (but not limited to) Spanish spoken in Spain, US, Mexico, Colombia, Argentina, Venezuela, Chile and Peru. | -| Spanish & English bilingual | es (with domain='bilingual-en') | Ideal when transcribing Spanish and English in the same media file or stream. Supports all accents and dialects listed under English and Spanish. [Requires the domain config to be set](/speech-to-text/languages#multilingual-speech-to-text). | -| Swahili | sw | | -| Swedish | sv | | -| Tagalog (Filipino) & English bilingual | tl | Ideal when transcribing Tagalog (Filipino) and English in the same media file or stream. Supports all accents and dialects listed under English. | -| Tamil | ta | | -| Tamil & English bilingual | en_ta | Ideal when transcribing Tamil and English in the same media file or stream. 
Supports all accents and dialects listed under Tamil and English. | -| Thai | th | | -| Turkish | tr | | -| Ukrainian | uk | | -| Urdu | ur | | -| Uyghur | ug | | -| Vietnamese | vi | | -| Welsh | cy | Welsh must be explicitly added to the [expected languages](/speech-to-text/batch/language-identification#expected-languages) list when using our Language Identification feature, otherwise a [language not supported for transcription error](/speech-to-text/batch/language-identification#language-not-supported-for-transcription) will be returned. | - -Each language above is uniquely identified by a two-letter code (ISO639-1) or three-letter code (ISO639-3) in API requests and responses. -## Translation languages +To automatically identify the language in an audio file, use the [Language Identification](/speech-to-text/batch/language-identification) feature. + +To dynamically update your system with the latest languages and features offered by Speechmatics, use the [Feature Discovery](/speech-to-text/features/feature-discovery) endpoint. + +Speechmatics supports the following languages. Your ability to use any or all of them depends on the languages you are contracted to use. + +Speechmatics takes a global-first approach to languages. A single language pack supports many accents and dialects, so you do not need to know which accent is in your audio before selecting a language. This approach achieves high accuracy compared to accent-specific language packs. + +| Language | Language code | Description | +|---|---|---| +| Automatic | auto | Automatically detect the language using the [Language Identification](/speech-to-text/batch/language-identification) feature. Currently supported with Batch transcription only. | +| Arabic | ar | Global Arabic gives high-accuracy transcription across many accents and dialects, including (but not limited to) Modern Standard Arabic (MSA) and Arabic spoken in the Gulf, Egypt, and the Levant. 
| +| Arabic & English bilingual | ar_en | Ideal when transcribing Arabic and English in the same media file or stream. Supports all accents and dialects listed under Arabic and English. | +| Bashkir | ba | | +| Basque | eu | | +| Belarusian | be | | +| Bengali | bn | | +| Bulgarian | bg | | +| Cantonese | yue | | +| Catalan | ca | | +| Croatian | hr | | +| Czech | cs | | +| Danish | da | | +| Dutch | nl | | +| English | en | Global English gives high-accuracy transcription across many accents, including (but not limited to) English spoken in the United Kingdom, United States, Australia, New Zealand, and by non-native speakers. To standardize spelling, specify the [output locale](/speech-to-text/formatting#output-locale). | +| Esperanto | eo | | +| Estonian | et | | +| Finnish | fi | | +| French | fr | Global French gives high-accuracy transcription across many accents, including (but not limited to) French spoken in France, Canada, and Belgium. | +| Galician | gl | | +| German | de | Global German gives high-accuracy transcription across many accents, including (but not limited to) German spoken in Germany, Austria, and Switzerland. | +| Greek | el | | +| Hebrew | he | | +| Hindi | hi | | +| Hungarian | hu | | +| Indonesian | id | | +| Interlingua | ia | | +| Irish | ga | | +| Italian | it | | +| Japanese | ja | | +| Korean | ko | | +| Latvian | lv | | +| Lithuanian | lt | | +| Malay | ms | | +| Malay & English bilingual | en_ms | Ideal when transcribing Malay and English in the same media file or stream. Supports all accents and dialects listed under Malay and English. | +| Maltese | mt | | +| Mandarin | cmn | Global Mandarin can output [Traditional or Simplified characters](/speech-to-text/formatting#output-locale) and gives high-accuracy transcription across many accents, including (but not limited to) China, Taiwan, Singapore, and Malaysia. 
| +| Mandarin & English bilingual | cmn_en | Ideal when transcribing Mandarin and English in the same media file or stream. Supports all accents and dialects listed under Mandarin and English. | +| Mandarin Malay Tamil & English | cmn_en_ms_ta | Ideal when transcribing Mandarin, Malay, Tamil, and English in the same media file or stream. Supports all accents and dialects listed under Mandarin, Malay, Tamil, and English. | +| Marathi | mr | | +| Mongolian | mn | | +| Norwegian | no | | +| Persian | fa | | +| Polish | pl | | +| Portuguese | pt | Global Portuguese gives high-accuracy transcription across many accents, including (but not limited to) Portuguese spoken in Portugal and Brazil. | +| Romanian | ro | | +| Russian | ru | | +| Slovakian | sk | | +| Slovenian | sl | | +| Spanish | es | Global Spanish gives high-accuracy transcription across many accents, including (but not limited to) Spanish spoken in Spain, the US, Mexico, Colombia, Argentina, Venezuela, Chile, and Peru. | +| Spanish & English bilingual | es (with domain='bilingual-en') | Ideal when transcribing Spanish and English in the same media file or stream. Supports all accents and dialects listed under English and Spanish. [Requires the domain config to be set](/speech-to-text/languages#bilingual-and-multi-language-packs). | +| Swahili | sw | | +| Swedish | sv | | +| Tagalog (Filipino) & English bilingual | tl | Ideal when transcribing Tagalog (Filipino) and English in the same media file or stream. Supports all accents and dialects listed under English. | +| Tamil | ta | | +| Tamil & English bilingual | en_ta | Ideal when transcribing Tamil and English in the same media file or stream. Supports all accents and dialects listed under Tamil and English. 
| +| Thai | th | | +| Turkish | tr | | +| Ukrainian | uk | | +| Urdu | ur | | +| Uyghur | ug | | +| Vietnamese | vi | | +| Welsh | cy | Welsh must be explicitly added to the [expected languages](/speech-to-text/batch/language-identification#expected-languages) list when using the Language Identification feature. Otherwise a [language not supported for transcription error](/speech-to-text/batch/language-identification#language-not-supported-for-transcription) is returned. | + +Each language is uniquely identified by a two-letter code (ISO 639-1) or three-letter code (ISO 639-3) in API requests and responses. -Translation is supported for the majority of Speechmatics' languages. The supported translation pairs are listed below. -For more details, see [Translation](/speech-to-text/features/translation). +## Translation languages -| Audio Language | Translation Target Language | -| ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| English (en) | Bulgarian (bg), Catalan (ca), Mandarin (cmn), Czech (cs), Danish (da), German (de), Greek (el), Spanish 
(es), Estonian (et), Finnish (fi), French (fr), Galician (gl), Hindi (hi), Croatian (hr), Hungarian (hu), Indonesian (id), Italian (it), Japanese (ja), Korean (ko), Lithuanian (lt), Latvian (lv), Malay (ms), Dutch (nl), Norwegian (no), Polish (pl), Portuguese (pt), Romanian (ro), Russian (ru), Slovakian (sk), Slovenian (sl), Swedish (sv), Turkish (tr), Ukrainian (uk), Vietnamese (vi) | -| Bulgarian (bg), Catalan (ca), Mandarin (cmn), Czech (cs), Danish (da), German (de), Greek (el), Spanish (es), Estonian (et), Finnish (fi), French (fr), Galician (gl), Hindi (hi), Croatian (hr), Hungarian (hu), Indonesian (id), Italian (it), Japanese (ja), Korean (ko), Lithuanian (lt), Latvian (lv), Malay (ms), Dutch (nl), Norwegian (no), Polish (pl), Portuguese (pt), Romanian (ro), Russian (ru), Slovakian (sk), Slovenian (sl), Swedish (sv), Turkish (tr), Ukrainian (uk), Vietnamese (vi) | English (en) | -| Norwegian Bokmål (no) | Norwegian Nynorsk (nn) | +Translation is available with the Enhanced and Standard models. It is supported for most Speechmatics languages, with the supported translation pairs listed below. For more details, see [Translation](/speech-to-text/features/translation). 
 
+| Audio language | Translation target language |
+|---|---|
+| English (en) | Bulgarian (bg), Catalan (ca), Mandarin (cmn), Czech (cs), Danish (da), German (de), Greek (el), Spanish (es), Estonian (et), Finnish (fi), French (fr), Galician (gl), Hindi (hi), Croatian (hr), Hungarian (hu), Indonesian (id), Italian (it), Japanese (ja), Korean (ko), Lithuanian (lt), Latvian (lv), Malay (ms), Dutch (nl), Norwegian (no), Polish (pl), Portuguese (pt), Romanian (ro), Russian (ru), Slovakian (sk), Slovenian (sl), Swedish (sv), Turkish (tr), Ukrainian (uk), Vietnamese (vi) |
+| Bulgarian (bg), Catalan (ca), Mandarin (cmn), Czech (cs), Danish (da), German (de), Greek (el), Spanish (es), Estonian (et), Finnish (fi), French (fr), Galician (gl), Hindi (hi), Croatian (hr), Hungarian (hu), Indonesian (id), Italian (it), Japanese (ja), Korean (ko), Lithuanian (lt), Latvian (lv), Malay (ms), Dutch (nl), Norwegian (no), Polish (pl), Portuguese (pt), Romanian (ro), Russian (ru), Slovakian (sk), Slovenian (sl), Swedish (sv), Turkish (tr), Ukrainian (uk), Vietnamese (vi) | English (en) |
+| Norwegian Bokmål (no) | Norwegian Nynorsk (nn) |
 
-## Multilingual speech-to-text
+## Bilingual and multi-language packs
 
-These packs are ideal when transcribing multiple languages in the same media file or stream with high accuracy. For more information on the supported languages, please refer to [Supported Language Packs](/speech-to-text/languages).
+The Enhanced and Standard models can transcribe a selected combination of languages in one media file or stream, including speakers who switch between the languages in that pack. Each pack covers a fixed set of languages that you select with the `language` property.
 
-Supported multilingual packs are:
+Supported packs are:
 
-| Language Pack | Transcription config |
-|----------------|----------------------|
-| Arabic and English | `{"language": "ar_en"}` |
-| Malay and English | `{"language": "en_ms"}` |
-| Mandarin and English | `{"language": "cmn_en"}` |
-| Mandarin Malay Tamil and English | `{"language": "cmn_en_ms_ta"}` |
-| Spanish and English | `{"language": "es", "domain": "bilingual-en"}` |
-| Tamil and English | `{"language": "en_ta"}` |
-| Tagalog (Filipino) and English | `{"language": "tl"}` |
+| Language pack | Transcription config |
+|---|---|
+| Arabic and English | `{"language": "ar_en"}` |
+| Malay and English | `{"language": "en_ms"}` |
+| Mandarin and English | `{"language": "cmn_en"}` |
+| Mandarin Malay Tamil and English | `{"language": "cmn_en_ms_ta"}` |
+| Spanish and English | `{"language": "es", "domain": "bilingual-en"}` |
+| Tamil and English | `{"language": "en_ta"}` |
+| Tagalog (Filipino) and English | `{"language": "tl"}` |
 
-Bilingual (excluding Spanish and English) example:
+This config selects the Mandarin and English pack:
 ```json
 {
   "type": "transcription",
   "transcription_config": {
-    // highlight-start
     "language": "cmn_en"
-    // highlight-end
   }
 }
 ```
 
-Bilingual Spanish and English example:
+This config selects the Spanish and English pack, which requires the `domain` property:
 ```json
 {
   "type": "transcription",
   "transcription_config": {
     "language": "es",
-    // highlight-start
     "domain": "bilingual-en"
-    // highlight-end
   }
 }
 ```
+
+These packs handle a fixed set of languages that you select in advance. To transcribe audio without selecting languages, including spontaneous switching across all supported languages, use the Melia 1 multilingual model. Refer to [Models](/speech-to-text/models).
+
+## Healthcare domain
+
+Speechmatics offers a medical domain that provides high accuracy for healthcare use cases such as ambient scribes and dictation tools. The medical domain is available with the Enhanced model only. It does not apply to the Standard or Melia 1 models.
 
-## Healthcare transcription
-Speechmatics offers domain-specific medical transcription models which provide unparallelled accuracy for medical use cases such as ambient scribes and dictation tools.
+The medical domain is kept up to date using officially maintained data sources. This improves recognition of medical terminology such as procedures, medications, conditions, and anatomy.
 
-These models are kept up to date using officially maintained data sources. This brings significant improvements in recognition of medical terminology such as names of procedures, medications, conditions, and anatomy.
+For languages without medical domain support, the Enhanced model still gives high accuracy in the healthcare domain.
 
-Note that for languages without a medical transcription model, Speechmatics still offers industry-leading accuracy in the healthcare domain when using the general purpose `enhanced` [model](#models).
+Set the `domain` property to `medical`:
 
-:::info
-The medical domain-specific model must be used with the `enhanced` model.
-::: - -Medical domain example: ```json { "type": "transcription", "transcription_config": { - "language": "en", - // highlight-start "model": "enhanced", + "language": "en", "domain": "medical" - // highlight-end } } ``` - - -|Language| Realtime | Batch| -|----------------|--|-| -|Arabic English| Available | Available | -|Danish| Available | Available | -|Dutch| Available | Available | -|English| Available | Available | -|Finnish| Available | Available | -|French| Available | Available | -|German| Available | Available | -|Norwegian| Available | Available | -|Spanish| Available | Available | -|Swedish| Available | Available | -|Additional languages| [Contact us for more information](https://www.speechmatics.com/speak-to-sales)| +| Language | Realtime | Batch | +|---|---|---| +| Arabic English | Available | Available | +| Danish | Available | Available | +| Dutch | Available | Available | +| English | Available | Available | +| Finnish | Available | Available | +| French | Available | Available | +| German | Available | Available | +| Norwegian | Available | Available | +| Spanish | Available | Available | +| Swedish | Available | Available | +| Additional languages | [Contact us](https://www.speechmatics.com/speak-to-sales) | | diff --git a/docs/speech-to-text/models.mdx b/docs/speech-to-text/models.mdx new file mode 100644 index 00000000..c808502b --- /dev/null +++ b/docs/speech-to-text/models.mdx @@ -0,0 +1,99 @@ +--- +title: Models +description: Compare the Enhanced, Standard, and Melia 1 models and choose the right one for your audio. +--- + +# Models + +Compare the Speech to Text models and choose the right one for your audio. + +Speechmatics offers three models for Speech to Text: Enhanced, Standard, and Melia 1. All three use the same API. The model you choose determines accuracy, how multilingual audio is handled, and which processing modes and regions are available. 
+ +## Compare the models + +| Capability | Enhanced | Standard | Melia 1 | +|---|---|---|---| +| `model` value | `enhanced` | `standard` | `melia-1` | +| Accuracy | Highest | High | High | +| Turnaround | Fast | Fastest | Fast | +| Processing modes | Batch and Realtime | Batch and Realtime | Batch | +| Regions | EU, US, AUS | EU, US, AUS | EU, US | +| Language handling | Selected language or pack | Selected language or pack | Automatic multilingual | +| Custom dictionary | ✅ | ✅ | Not yet | +| Confidence scores | ✅ | ✅ | Not yet | +| Speech intelligence | ✅ | ✅ | Not yet | + +Enhanced and Standard are feature-identical and differ only in accuracy and speed: Enhanced delivers the highest accuracy, and Standard prioritizes throughput. Melia 1 matches Standard for accuracy and adds automatic multilingual transcription, but it is available for Batch only and supports a reduced feature set. Speech intelligence covers translation, summarization, topic detection, chapters, sentiment, and audio events. + +## Choose a model + +Use Enhanced for the highest accuracy on single-language audio, such as medical, legal, or subtitling work. + +Use Standard when throughput, cost, or latency matter more than the last increment of accuracy, such as archival transcription, content indexing, or large-scale captioning. + +Use Melia 1 for audio that contains more than one language, including speakers who switch language mid-conversation. It offers fast turnaround and accuracy on par with Standard. + +## Specify a model + +Set the `model` property in your transcription config. If you do not set it, the `standard` model is used. + +This config selects the `enhanced` model: + +```json +{ + "type": "transcription", + "transcription_config": { + "model": "enhanced", + "language": "en" + } +} +``` + +Enhanced and Standard are available for Realtime and Batch transcription. Melia 1 is available for Batch transcription only. + +## Melia 1 + +Melia 1 is a multilingual model. 
It transcribes audio that contains more than one language, including speakers who switch language mid-conversation, and returns a single continuous transcript. It does not require you to select a language pack, and its accuracy is on par with the Standard model. + +Set `"model": "melia-1"` and `"language": "multi"`: + +```json +{ + "type": "transcription", + "transcription_config": { + "model": "melia-1", + "language": "multi" + } +} +``` + +:::warning +Melia 1 requires `language` to be set to `multi`. Any other value returns an error. +::: + +Melia 1 is available for Batch transcription in the EU and US regions only. It is not available in the Australia (AU1) region. + +| Region | Endpoint | +|---|---| +| EU1 (Europe) | `eu1.asr.api.speechmatics.com` | +| US1 (USA) | `us1.asr.api.speechmatics.com` | + +For the full list of Batch endpoints, refer to [Authentication](/get-started/authentication#supported-endpoints). + +Melia 1 matches the Enhanced and Standard models for core transcription features, including diarization, word timings, punctuation, notifications, and output locale. It does not yet support the following features, which are available with the Enhanced and Standard models: + +- Custom vocabulary and formatting: custom dictionary, find and replace, spoken form output, profanity tagging +- Output detail: confidence scores, entity detection, audio filtering +- Speech intelligence: audio events, translation, summarization, chapters, topics, sentiment + +Melia 1 is an early-access model and its feature support is expanding. Check the [release notes](https://speechmatics.featurebase.app/en/changelog) for the latest. + +To configure language hints and read the per-language output metadata, refer to [Input](/speech-to-text/batch/input) and [Output](/speech-to-text/batch/output). + +## Operating points + +The `model` property replaces the `operating_point` property. Existing configs that use `operating_point` continue to transcribe without changes. 
+ +:::note +In SaaS (cloud) deployments, `operating_point` is deprecated. It maps to `model` and accepts the same `enhanced` and `standard` values. Use [`model`](#specify-a-model) going forward. +::: diff --git a/docs/speech-to-text/sidebar.ts b/docs/speech-to-text/sidebar.ts index 8f553811..c5be0cb6 100644 --- a/docs/speech-to-text/sidebar.ts +++ b/docs/speech-to-text/sidebar.ts @@ -14,6 +14,10 @@ export default { }, realtimeSidebar, batchSidebar, + { + type: "doc", + id: "speech-to-text/models", + }, { type: "doc", id: "speech-to-text/languages", From 48183e2906c40f0d41684cce10dcb0f229268a4a Mon Sep 17 00:00:00 2001 From: Matt Nemitz Date: Fri, 12 Jun 2026 12:10:45 +0100 Subject: [PATCH 02/18] Split "Languages and models" into separate pages, update `operating_point` references --- .../features/audio-filtering.mdx | 2 +- docs/speech-to-text/languages.mdx | 299 +++++++----------- docs/speech-to-text/models.mdx | 103 ++++++ docs/speech-to-text/sidebar.ts | 4 + 4 files changed, 231 insertions(+), 177 deletions(-) create mode 100644 docs/speech-to-text/models.mdx diff --git a/docs/speech-to-text/features/audio-filtering.mdx b/docs/speech-to-text/features/audio-filtering.mdx index 87e2db4b..8cc184d1 100644 --- a/docs/speech-to-text/features/audio-filtering.mdx +++ b/docs/speech-to-text/features/audio-filtering.mdx @@ -73,6 +73,6 @@ To obtain volume labelling without filtering any audio, supply an empty config o Once the audio is in a raw format (16kHz 16bit mono), it is split into 0.01s chunks. For each chunk, the root mean square amplitude of the signal is calculated, and scaled to the range `0 - 100`. If the volume is less than the supplied cut-off, the chunk will be replaced with silence. -To work successfully without degrading accuracy, the background speech must be significantly quieter than the foreground speech, otherwise the filtering process may remove small sections of the audio which should be transcribed. 
For this reason, the feature works better with the [enhanced model](/speech-to-text/languages#operating-points), which is more robust against inadvertent damage to the audio. +To work successfully without degrading accuracy, the background speech must be significantly quieter than the foreground speech, otherwise the filtering process may remove small sections of the audio which should be transcribed. For this reason, the feature works better with the [enhanced model](/speech-to-text/models#specify-a-model), which is more robust against inadvertent damage to the audio. The word volume calculation takes the start and end times of words, and applies a weighted average of the volumes of each audio chunk which make up the word. The weighting attempts to ignore areas of silence within long words, and provide a better match with the volume classification a human listener would make. diff --git a/docs/speech-to-text/languages.mdx b/docs/speech-to-text/languages.mdx index 4a5e1b94..c879f4c5 100644 --- a/docs/speech-to-text/languages.mdx +++ b/docs/speech-to-text/languages.mdx @@ -1,227 +1,174 @@ --- -description: "Information about the wide array of languages Speechmatics supports transcription for" -keywords: - [ - speechmatics, - languages, - transcription, - speech recognition, - asr, - en-us, - en-gb, - en-nz, - en-au, - fr-ca, - fr-be, - de-at, - de-ch, - es-mx, - es-cl, - es-ve, - es-pr, - es-ar, - pt-br, - model, - standard, - enhanced, - ] +title: Languages +description: See which languages Speechmatics supports for transcription and translation, including bilingual packs. --- -# Languages and models +# Languages +To choose a transcription model, refer to [Models](https://docs.speechmatics.com/speech-to-text/models). -### Models - -Choose between two accuracy models when configuring your transcription session: -- **Standard** — optimized for faster turnaround with strong accuracy.
Recommended when speed and efficiency are your priorities -- **Enhanced** — our highest-accuracy model with strong turnaround times. Recommended when precision is critical, and especially for complex audio (e.g. noisy environments, varied accents) - -By default, the `standard` model is used. You can specify the `enhanced` model as a part of the transcription config. For example: -```json -{ - "type": "transcription", - "transcription_config": { - "language": "en", - // highlight-start - "model": "enhanced" - // highlight-end - } -} -``` - +The languages, packs, and options on this page apply to the Enhanced and Standard models. The Melia 1 model is multilingual: it transcribes the individual languages listed here and switches between them automatically, without language selection. It does not support the `auto` option, the bilingual and multi-language packs, or translation. For Melia 1, refer to [Models](https://docs.speechmatics.com/speech-to-text/models). ## Transcription languages -:::info -To automatically identify the language in an audio file, use our [Language Identification](/speech-to-text/batch/language-identification) feature. - -To dynamically update your system with the latest languages and features offered by Speechmatics, use our [Feature Discovery](/speech-to-text/features/feature-discovery) endpoint. -::: - -Speechmatics supports the following languages. Your ability to use any or all of the languages will depend on what languages you are contracted to use. - -Speechmatics takes a global-first approach to our languages. In a single language pack, we aim to support many different accents and dialects. This simplifies your workflow when selecting which language to use, not requiring you to know which accent is being spoken in your audio upfront. With this approach we still achieve very high accuracy compared to accent-specific language packs. 
- -| Language | Language Code | Description | -| ---------------------------- | ------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| Automatic | auto | Automatically detect the language using our [Language Identification](/speech-to-text/batch/language-identification) feature.
Please note, this is currently only supported with Batch Transcriptions. | -| Arabic | ar | Our global Arabic gives high-accuracy transcription across many different accents and dialects including (but not limited to) Modern Standard Arabic (MSA) and Arabic spoken in the Gulf, Egypt and the Levant. | -| Arabic & English bilingual | ar_en | Ideal when transcribing Arabic and English in the same media file or stream. Supports all accents and dialects listed under Arabic and English. | -| Bashkir | ba | | -| Basque | eu | | -| Belarusian | be | | -| Bengali | bn | | -| Bulgarian | bg | | -| Cantonese | yue | | -| Catalan | ca | | -| Croatian | hr | | -| Czech | cs | | -| Danish | da | | -| Dutch | nl | | -| English | en | Our global English gives high-accuracy transcription across many different accents including (but not limited to) English spoken in the United Kingdom, United States, Australia, New Zealand and non-native speakers. To standardise spelling, we recommend specifying the [Output Locale](/speech-to-text/formatting#output-locale). | -| Esperanto | eo | | -| Estonian | et | | -| Finnish | fi | | -| French | fr | Our global French gives high-accuracy transcription across many different accents including (but not limited to) French spoken in France, Canada and Belgium. | -| Galician | gl | | -| German | de | Our global German gives high-accuracy transcription across many different accents including (but not limited to) German spoken in Germany, Austria and Switzerland. | -| Greek | el | | -| Hebrew | he | | -| Hindi | hi | | -| Hungarian | hu | | -| Indonesian | id | | -| Interlingua | ia | | -| Irish | ga | | -| Italian | it | | -| Japanese | ja | | -| Korean | ko | | -| Latvian | lv | | -| Lithuanian | lt | | -| Malay | ms | | -| Malay & English bilingual | en_ms | Ideal when transcribing Malay and English in the same media file or stream. Supports all accents and dialects listed under Malay and English. 
| -| Maltese | mt | | -| Mandarin | cmn | Our global Mandarin can output [Traditional or Simplified characters](/speech-to-text/formatting#output-locale) and gives high accuracy transcription across many different accents including (but not limited to) China, Taiwan, Singapore, Malaysia. | -| Mandarin & English bilingual | cmn_en | Ideal when transcribing Mandarin and English in the same media file or stream. Supports all accents and dialects listed under Mandarin and English. | -| Mandarin Malay Tamil & English multilingual | cmn_en_ms_ta | Ideal when transcribing Mandarin, Malay, Tamil and English in the same media file or stream. Supports all accents and dialects listed under Mandarin, Malay, Tamil and English. | -| Marathi | mr | | -| Mongolian | mn | | -| Norwegian | no | | -| Persian | fa | | -| Polish | pl | | -| Portuguese | pt | Our global Portuguese gives high-accuracy transcription across many different accents including (but not limited to) Portuguese spoken in Portugal and Brazil. | -| Romanian | ro | | -| Russian | ru | | -| Slovakian | sk | | -| Slovenian | sl | | -| Spanish | es | Our global Spanish gives high-accuracy transcription across many different accents including (but not limited to) Spanish spoken in Spain, US, Mexico, Colombia, Argentina, Venezuela, Chile and Peru. | -| Spanish & English bilingual | es (with domain='bilingual-en') | Ideal when transcribing Spanish and English in the same media file or stream. Supports all accents and dialects listed under English and Spanish. [Requires the domain config to be set](/speech-to-text/languages#multilingual-speech-to-text). | -| Swahili | sw | | -| Swedish | sv | | -| Tagalog (Filipino) & English bilingual | tl | Ideal when transcribing Tagalog (Filipino) and English in the same media file or stream. Supports all accents and dialects listed under English. | -| Tamil | ta | | -| Tamil & English bilingual | en_ta | Ideal when transcribing Tamil and English in the same media file or stream. 
Supports all accents and dialects listed under Tamil and English. | -| Thai | th | | -| Turkish | tr | | -| Ukrainian | uk | | -| Urdu | ur | | -| Uyghur | ug | | -| Vietnamese | vi | | -| Welsh | cy | Welsh must be explicitly added to the [expected languages](/speech-to-text/batch/language-identification#expected-languages) list when using our Language Identification feature, otherwise a [language not supported for transcription error](/speech-to-text/batch/language-identification#language-not-supported-for-transcription) will be returned. | - -Each language above is uniquely identified by a two-letter code (ISO639-1) or three-letter code (ISO639-3) in API requests and responses. -## Translation languages +To automatically identify the language in an audio file, use the [Language Identification](https://docs.speechmatics.com/speech-to-text/batch/language-identification) feature. + +To dynamically update your system with the latest languages and features offered by Speechmatics, use the [Feature Discovery](https://docs.speechmatics.com/speech-to-text/features/feature-discovery) endpoint. + +Speechmatics supports the following languages. Your ability to use any or all of them depends on the languages you are contracted to use. + +Speechmatics takes a global-first approach to languages. A single language pack supports many accents and dialects, so you do not need to know which accent is in your audio before selecting a language. This approach achieves high accuracy compared to accent-specific language packs. + +| Language | Language code | Description | +|---|---|---| +| Automatic | `auto` | Automatically detect the language using the [Language Identification](https://docs.speechmatics.com/speech-to-text/batch/language-identification) feature. Currently supported with Batch transcription only. 
| +| Arabic | `ar` | Global Arabic gives high-accuracy transcription across many accents and dialects, including (but not limited to) Modern Standard Arabic (MSA) and Arabic spoken in the Gulf, Egypt, and the Levant. | +| Arabic & English bilingual | `ar_en` | Ideal when transcribing Arabic and English in the same media file or stream. Supports all accents and dialects listed under Arabic and English. | +| Bashkir | `ba` | | +| Basque | `eu` | | +| Belarusian | `be` | | +| Bengali | `bn` | | +| Bulgarian | `bg` | | +| Cantonese | `yue` | | +| Catalan | `ca` | | +| Croatian | `hr` | | +| Czech | `cs` | | +| Danish | `da` | | +| Dutch | `nl` | | +| English | `en` | Global English gives high-accuracy transcription across many accents, including (but not limited to) English spoken in the United Kingdom, United States, Australia, New Zealand, and by non-native speakers. To standardize spelling, specify the [output locale](https://docs.speechmatics.com/speech-to-text/formatting#output-locale). | +| Esperanto | `eo` | | +| Estonian | `et` | | +| Finnish | `fi` | | +| French | `fr` | Global French gives high-accuracy transcription across many accents, including (but not limited to) French spoken in France, Canada, and Belgium. | +| Galician | `gl` | | +| German | `de` | Global German gives high-accuracy transcription across many accents, including (but not limited to) German spoken in Germany, Austria, and Switzerland. | +| Greek | `el` | | +| Hebrew | `he` | | +| Hindi | `hi` | | +| Hungarian | `hu` | | +| Indonesian | `id` | | +| Interlingua | `ia` | | +| Irish | `ga` | | +| Italian | `it` | | +| Japanese | `ja` | | +| Korean | `ko` | | +| Latvian | `lv` | | +| Lithuanian | `lt` | | +| Malay | `ms` | | +| Malay & English bilingual | `en_ms` | Ideal when transcribing Malay and English in the same media file or stream. Supports all accents and dialects listed under Malay and English. 
| +| Maltese | `mt` | | +| Mandarin | `cmn` | Global Mandarin can output [Traditional or Simplified characters](https://docs.speechmatics.com/speech-to-text/formatting#output-locale) and gives high-accuracy transcription across many accents, including (but not limited to) China, Taiwan, Singapore, and Malaysia. | +| Mandarin & English bilingual | `cmn_en` | Ideal when transcribing Mandarin and English in the same media file or stream. Supports all accents and dialects listed under Mandarin and English. | +| Mandarin Malay Tamil & English | `cmn_en_ms_ta` | Ideal when transcribing Mandarin, Malay, Tamil, and English in the same media file or stream. Supports all accents and dialects listed under Mandarin, Malay, Tamil, and English. | +| Marathi | `mr` | | +| Mongolian | `mn` | | +| Norwegian | `no` | | +| Persian | `fa` | | +| Polish | `pl` | | +| Portuguese | `pt` | Global Portuguese gives high-accuracy transcription across many accents, including (but not limited to) Portuguese spoken in Portugal and Brazil. | +| Romanian | `ro` | | +| Russian | `ru` | | +| Slovakian | `sk` | | +| Slovenian | `sl` | | +| Spanish | `es` | Global Spanish gives high-accuracy transcription across many accents, including (but not limited to) Spanish spoken in Spain, the US, Mexico, Colombia, Argentina, Venezuela, Chile, and Peru. | +| Spanish & English bilingual | `es` (with domain=`bilingual-en`) | Ideal when transcribing Spanish and English in the same media file or stream. Supports all accents and dialects listed under English and Spanish. [Requires the domain config to be set](https://docs.speechmatics.com/speech-to-text/languages#bilingual-and-multi-language-packs). | +| Swahili | `sw` | | +| Swedish | `sv` | | +| Tagalog (Filipino) & English bilingual | `tl` | Ideal when transcribing Tagalog (Filipino) and English in the same media file or stream. Supports all accents and dialects listed under English. 
| +| Tamil | `ta` | | +| Tamil & English bilingual | `en_ta` | Ideal when transcribing Tamil and English in the same media file or stream. Supports all accents and dialects listed under Tamil and English. | +| Thai | `th` | | +| Turkish | `tr` | | +| Ukrainian | `uk` | | +| Urdu | `ur` | | +| Uyghur | `ug` | | +| Vietnamese | `vi` | | +| Welsh | `cy` | Welsh must be explicitly added to the [expected languages](https://docs.speechmatics.com/speech-to-text/batch/language-identification#expected-languages) list when using the Language Identification feature. Otherwise a [language not supported for transcription error](https://docs.speechmatics.com/speech-to-text/batch/language-identification#language-not-supported-for-transcription) is returned. | + +Each language is uniquely identified by a two-letter code (ISO 639-1) or three-letter code (ISO 639-3) in API requests and responses. -Translation is supported for the majority of Speechmatics' languages. The supported translation pairs are listed below. -For more details, see [Translation](/speech-to-text/features/translation). 
+## Translation languages -| Audio Language | Translation Target Language | -| ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| English (en) | Bulgarian (bg), Catalan (ca), Mandarin (cmn), Czech (cs), Danish (da), German (de), Greek (el), Spanish (es), Estonian (et), Finnish (fi), French (fr), Galician (gl), Hindi (hi), Croatian (hr), Hungarian (hu), Indonesian (id), Italian (it), Japanese (ja), Korean (ko), Lithuanian (lt), Latvian (lv), Malay (ms), Dutch (nl), Norwegian (no), Polish (pl), Portuguese (pt), Romanian (ro), Russian (ru), Slovakian (sk), Slovenian (sl), Swedish (sv), Turkish (tr), Ukrainian (uk), Vietnamese (vi) | -| Bulgarian (bg), Catalan (ca), Mandarin (cmn), Czech (cs), Danish (da), German (de), Greek (el), Spanish (es), Estonian (et), Finnish (fi), French (fr), Galician (gl), Hindi (hi), Croatian (hr), Hungarian (hu), Indonesian (id), Italian (it), Japanese (ja), Korean (ko), Lithuanian (lt), Latvian (lv), Malay (ms), Dutch (nl), Norwegian (no), Polish (pl), Portuguese (pt), Romanian (ro), Russian (ru), Slovakian (sk), 
Slovenian (sl), Swedish (sv), Turkish (tr), Ukrainian (uk), Vietnamese (vi) | English (en) | -| Norwegian Bokmål (no) | Norwegian Nynorsk (nn) | +Translation is available with the Enhanced and Standard models. It is supported for most Speechmatics languages, with the supported translation pairs listed below. For more details, see [Translation](https://docs.speechmatics.com/speech-to-text/features/translation). +| Audio language | Translation target language | +|---|---| +| English (en) | Bulgarian (bg), Catalan (ca), Mandarin (cmn), Czech (cs), Danish (da), German (de), Greek (el), Spanish (es), Estonian (et), Finnish (fi), French (fr), Galician (gl), Hindi (hi), Croatian (hr), Hungarian (hu), Indonesian (id), Italian (it), Japanese (ja), Korean (ko), Lithuanian (lt), Latvian (lv), Malay (ms), Dutch (nl), Norwegian (no), Polish (pl), Portuguese (pt), Romanian (ro), Russian (ru), Slovakian (sk), Slovenian (sl), Swedish (sv), Turkish (tr), Ukrainian (uk), Vietnamese (vi) | +| Bulgarian (bg), Catalan (ca), Mandarin (cmn), Czech (cs), Danish (da), German (de), Greek (el), Spanish (es), Estonian (et), Finnish (fi), French (fr), Galician (gl), Hindi (hi), Croatian (hr), Hungarian (hu), Indonesian (id), Italian (it), Japanese (ja), Korean (ko), Lithuanian (lt), Latvian (lv), Malay (ms), Dutch (nl), Norwegian (no), Polish (pl), Portuguese (pt), Romanian (ro), Russian (ru), Slovakian (sk), Slovenian (sl), Swedish (sv), Turkish (tr), Ukrainian (uk), Vietnamese (vi) | English (en) | +| Norwegian Bokmål (no) | Norwegian Nynorsk (nn) | -## Multilingual speech-to-text +## Bilingual and multi-language packs -These packs are ideal when transcribing multiple languages in the same media file or stream with high accuracy. For more information on the supported languages, please refer to [Supported Language Packs](/speech-to-text/languages). 
+The Enhanced and Standard models can transcribe a selected combination of languages in one media file or stream, including speakers who switch between the languages in that pack. Each pack covers a fixed set of languages that you select with the `language` property. -Supported multilingual packs are: +Supported packs are: -| Language Pack | Transcription config | -|----------------|----------------------| -| Arabic and English | `{"language": "ar_en"}` | -| Malay and English | `{"language": "en_ms"}` | -| Mandarin and English | `{"language": "cmn_en"}` | -| Mandarin Malay Tamil and English | `{"language": "cmn_en_ms_ta"}` | -| Spanish and English | `{"language": "es", "domain": "bilingual-en"}` | -| Tamil and English | `{"language": "en_ta"}` | -| Tagalog (Filipino) and English | `{"language": "tl"}` | +| Language pack | Transcription config | +|---|---| +| Arabic and English | `{"language": "ar_en"}` | +| Malay and English | `{"language": "en_ms"}` | +| Mandarin and English | `{"language": "cmn_en"}` | +| Mandarin Malay Tamil and English | `{"language": "cmn_en_ms_ta"}` | +| Spanish and English | `{"language": "es", "domain": "bilingual-en"}` | +| Tamil and English | `{"language": "en_ta"}` | +| Tagalog (Filipino) and English | `{"language": "tl"}` | -Bilingual (excluding Spanish and English) example: +This config selects the Mandarin and English pack: ```json { "type": "transcription", "transcription_config": { - // highlight-start "language": "cmn_en" - // highlight-end } } ``` -Bilingual Spanish and English example: +This config selects the Spanish and English pack, which requires the `domain` property: ```json { "type": "transcription", "transcription_config": { "language": "es", - // highlight-start "domain": "bilingual-en" - // highlight-end } } ``` +These packs handle a fixed set of languages that you select in advance. 
To transcribe audio without selecting languages, including spontaneous switching across all supported languages, use the Melia 1 multilingual model. Refer to [Models](https://docs.speechmatics.com/speech-to-text/models). + +## Healthcare domain + +Speechmatics offers a medical domain that provides high accuracy for healthcare use cases such as ambient scribes and dictation tools. The medical domain is available with the Enhanced model only. It does not apply to the Standard or Melia 1 models. -## Healthcare transcription -Speechmatics offers domain-specific medical transcription models which provide unparallelled accuracy for medical use cases such as ambient scribes and dictation tools. +The medical domain is kept up to date using officially maintained data sources. This improves recognition of medical terminology such as procedures, medications, conditions, and anatomy. -These models are kept up to date using officially maintained data sources. This brings significant improvements in recognition of medical terminology such as names of procedures, medications, conditions, and anatomy. +For languages without medical domain support, the Enhanced model still gives high accuracy in the healthcare domain. -Note that for languages without a medical transcription model, Speechmatics still offers industry-leading accuracy in the healthcare domain when using the general purpose `enhanced` [model](#models). +Set the `domain` property to `medical`: -:::info -The medical domain-specific model must be used with the `enhanced` model. 
-::: - -Medical domain example: ```json { "type": "transcription", "transcription_config": { - "language": "en", - // highlight-start "model": "enhanced", + "language": "en", "domain": "medical" - // highlight-end } } ``` - - -|Language| Realtime | Batch| -|----------------|--|-| -|Arabic English| Available | Available | -|Danish| Available | Available | -|Dutch| Available | Available | -|English| Available | Available | -|Finnish| Available | Available | -|French| Available | Available | -|German| Available | Available | -|Norwegian| Available | Available | -|Spanish| Available | Available | -|Swedish| Available | Available | -|Additional languages| [Contact us for more information](https://www.speechmatics.com/speak-to-sales)| +| Language | Realtime | Batch | +|---|---|---| +| Arabic English | Available | Available | +| Danish | Available | Available | +| Dutch | Available | Available | +| English | Available | Available | +| Finnish | Available | Available | +| French | Available | Available | +| German | Available | Available | +| Norwegian | Available | Available | +| Spanish | Available | Available | +| Swedish | Available | Available | +| Additional languages | [Contact us](https://www.speechmatics.com/speak-to-sales) | | diff --git a/docs/speech-to-text/models.mdx b/docs/speech-to-text/models.mdx new file mode 100644 index 00000000..946c5cce --- /dev/null +++ b/docs/speech-to-text/models.mdx @@ -0,0 +1,103 @@ +--- +title: Models +description: Compare the Enhanced, Standard, and Melia 1 models and choose the right one for your audio. +--- + +# Models + +Compare the Speech to Text models and choose the right one for your audio. + +Speechmatics offers three models for Speech to Text: Enhanced, Standard, and Melia 1. All three use the same API. The model you choose determines accuracy, how multilingual audio is handled, and which processing modes and regions are available. 
+ +## Compare the models + +| Capability | Enhanced | Standard | Melia 1 | +|---|---|---|---| +| `model` value | `enhanced` | `standard` | `melia-1` | +| Accuracy | Highest | High | High | +| Turnaround | Fast | Fastest | Fast | +| Processing modes | Batch and Realtime | Batch and Realtime | Batch | +| Regions | EU, US, AUS | EU, US, AUS | EU, US | +| Language handling | Selected language or pack | Selected language or pack | Automatic multilingual | +| Custom dictionary | ✅ | ✅ | Not yet | +| Confidence scores | ✅ | ✅ | Not yet | +| Speech intelligence | ✅ | ✅ | Not yet | + +Enhanced and Standard are feature-identical and differ only in accuracy and speed: Enhanced delivers the highest accuracy, and Standard prioritises throughput. Melia 1 matches Standard for accuracy and adds automatic multilingual transcription, but it is available for Batch only and supports a reduced feature set. Speech intelligence covers translation, summarization, topic detection, chapters, sentiment, and audio events. + +## Choose a model + +Use Enhanced for the highest accuracy on single-language audio, such as medical, legal, or subtitling work. + +Use Standard when throughput, cost, or latency matter more than the last increment of accuracy, such as archival transcription, content indexing, or large-scale captioning. + +Use Melia 1 for audio that contains more than one language, including speakers who switch language mid-conversation. It offers fast turnaround and accuracy on par with Standard. + +## Specify a model + +Set the `model` property in your transcription config. If you do not set it, the `standard` model is used. + +This config selects the `enhanced` model: + +```json +{ + "type": "transcription", + "transcription_config": { + // highlight-start + "model": "enhanced", + // highlight-end + "language": "en" + } +} +``` + +Enhanced and Standard are available for Realtime and Batch transcription. Melia 1 is available for Batch transcription only. 
+ +## Melia 1 + +Melia 1 is a multilingual model. It transcribes audio that contains more than one language, including speakers who switch language mid-conversation, and returns a single continuous transcript. It does not require you to select a language pack, and its accuracy is on par with the Standard model. + +Set `"model": "melia-1"` and `"language": "multi"`: + +```json +{ + "type": "transcription", + "transcription_config": { + // highlight-start + "model": "melia-1", + "language": "multi" + // highlight-end + } +} +``` + +:::warning +Melia 1 requires `language` to be set to `multi`. Any other value returns an error. +::: + +Melia 1 is available for Batch transcription in the EU and US regions only. It is not available in the Australia (AU1) region. + +| Region | Endpoint | +|---|---| +| EU1 (Europe) | `eu1.asr.api.speechmatics.com` | +| US1 (USA) | `us1.asr.api.speechmatics.com` | + +For the full list of Batch endpoints, refer to [Authentication](https://docs.speechmatics.com/get-started/authentication#supported-endpoints). + +Melia 1 matches the Enhanced and Standard models for core transcription features, including diarization, word timings, punctuation, notifications, and output locale. It does not yet support the following features, which are available with the Enhanced and Standard models: + +- Custom vocabulary and formatting: custom dictionary, find and replace, spoken form output, profanity tagging +- Output detail: confidence scores, entity detection, audio filtering +- Speech intelligence: audio events, translation, summarization, chapters, topics, sentiment + +Melia 1 is an early-access model and its feature support is expanding. Check the [release notes](https://speechmatics.featurebase.app/en/changelog) for the latest. + +To configure language hints and read the per-language output metadata, refer to [Input](https://docs.speechmatics.com/speech-to-text/batch/input) and [Output](https://docs.speechmatics.com/speech-to-text/batch/output). 
+ +## Operating points + +The `model` property replaces the `operating_point` property. Existing configs that use `operating_point` continue to transcribe without changes. + +:::note +In SaaS (cloud) deployments, `operating_point` is deprecated. It maps to `model` and accepts the same `enhanced` and `standard` values. Use [`model`](#specify-a-model) going forward. +::: diff --git a/docs/speech-to-text/sidebar.ts b/docs/speech-to-text/sidebar.ts index 8f553811..c5be0cb6 100644 --- a/docs/speech-to-text/sidebar.ts +++ b/docs/speech-to-text/sidebar.ts @@ -14,6 +14,10 @@ export default { }, realtimeSidebar, batchSidebar, + { + type: "doc", + id: "speech-to-text/models", + }, { type: "doc", id: "speech-to-text/languages", From 123d3fc1dd63151fbc798387b5b01927e99dfce9 Mon Sep 17 00:00:00 2001 From: Matt Nemitz Date: Fri, 12 Jun 2026 12:10:58 +0100 Subject: [PATCH 03/18] Update spec --- spec/realtime.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/spec/realtime.yaml b/spec/realtime.yaml index 7e49bd33..e889628a 100644 --- a/spec/realtime.yaml +++ b/spec/realtime.yaml @@ -821,7 +821,7 @@ components: Model: type: string description: | - Which model you wish to use. See [Operating points](http://docs.speechmatics.com/speech-to-text/#models) for more details. + Which model you wish to use. See [Models](http://docs.speechmatics.com/speech-to-text/models) for more details. 
enum: - standard - enhanced From e23b5a62be5e8c5ab24f5ba3bcd5019ce6f80ef5 Mon Sep 17 00:00:00 2001 From: Pete Mo <175202887+cabbage-ice-cream@users.noreply.github.com> Date: Fri, 12 Jun 2026 14:19:13 +0100 Subject: [PATCH 04/18] Address review feedback: discoverability of Melia 1 sections - Show H3 sections in the Input page TOC (toc_max_heading_level 2 -> 3) - Link comparison table cells to Languages, Language Identification, and the new Input multilingual section - Deep-link the Input/Output references on the Models page to the new section anchors - Note Melia 1 regional availability under Supported endpoints on the authentication page, linking to Models Co-Authored-By: Claude Fable 5 --- docs/get-started/authentication.mdx | 2 ++ docs/speech-to-text/batch/input.mdx | 2 +- docs/speech-to-text/models.mdx | 4 ++-- 3 files changed, 5 insertions(+), 3 deletions(-) diff --git a/docs/get-started/authentication.mdx b/docs/get-started/authentication.mdx index a5dce2e2..7a0bf6ec 100644 --- a/docs/get-started/authentication.mdx +++ b/docs/get-started/authentication.mdx @@ -76,6 +76,8 @@ Speechmatics Batch SaaS supports the following endpoints for production use: Jobs are created in the region corresponding to the endpoint used. You must use the same endpoint for all requests relating to a specific job. +The Melia 1 model is available in the EU1 and US1 regions only. For details, refer to [Models](/speech-to-text/models#melia-1). + :::warning The EU2 and US2 Batch SaaS endpoints are provided for enterprise customer high availability and failover purposes only. Jobs created in these environments will not be visible in the Portal. 
::: diff --git a/docs/speech-to-text/batch/input.mdx b/docs/speech-to-text/batch/input.mdx index 15d2c974..ddabced1 100644 --- a/docs/speech-to-text/batch/input.mdx +++ b/docs/speech-to-text/batch/input.mdx @@ -1,6 +1,6 @@ --- keywords: [speechmatics, transcription, speech recognition, asr, api, limits] -toc_max_heading_level: 2 +toc_max_heading_level: 3 title: 'Input – Batch' sidebar_label: 'Input' description: 'Learn about configuration and supported input audio formats for the Speechmatics Batch API' diff --git a/docs/speech-to-text/models.mdx b/docs/speech-to-text/models.mdx index c808502b..001f5de1 100644 --- a/docs/speech-to-text/models.mdx +++ b/docs/speech-to-text/models.mdx @@ -18,7 +18,7 @@ Speechmatics offers three models for Speech to Text: Enhanced, Standard, and Mel | Turnaround | Fast | Fastest | Fast | | Processing modes | Batch and Realtime | Batch and Realtime | Batch | | Regions | EU, US, AUS | EU, US, AUS | EU, US | -| Language handling | Selected language or pack | Selected language or pack | Automatic multilingual | +| Language handling | [Selected language or pack](/speech-to-text/languages) ([auto-detect](/speech-to-text/batch/language-identification) available) | [Selected language or pack](/speech-to-text/languages) ([auto-detect](/speech-to-text/batch/language-identification) available) | [Automatic multilingual](/speech-to-text/batch/input#multilingual-transcription-with-melia-1) | | Custom dictionary | ✅ | ✅ | Not yet | | Confidence scores | ✅ | ✅ | Not yet | | Speech intelligence | ✅ | ✅ | Not yet | @@ -88,7 +88,7 @@ Melia 1 matches the Enhanced and Standard models for core transcription features Melia 1 is an early-access model and its feature support is expanding. Check the [release notes](https://speechmatics.featurebase.app/en/changelog) for the latest. -To configure language hints and read the per-language output metadata, refer to [Input](/speech-to-text/batch/input) and [Output](/speech-to-text/batch/output). 
+To configure language hints and read the per-language output metadata, refer to [Input](/speech-to-text/batch/input#multilingual-transcription-with-melia-1) and [Output](/speech-to-text/batch/output#multilingual-transcript-output). ## Operating points From 12523fe025603aa457e40ffd905fd22921d96ef5 Mon Sep 17 00:00:00 2001 From: Pete Mo <175202887+cabbage-ice-cream@users.noreply.github.com> Date: Fri, 12 Jun 2026 14:20:43 +0100 Subject: [PATCH 05/18] Use root-relative links in languages table per repo convention Co-Authored-By: Claude Fable 5 --- docs/speech-to-text/languages.mdx | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/docs/speech-to-text/languages.mdx b/docs/speech-to-text/languages.mdx index 446976d8..c3a30b2f 100644 --- a/docs/speech-to-text/languages.mdx +++ b/docs/speech-to-text/languages.mdx @@ -48,7 +48,7 @@ Speechmatics takes a global-first approach to languages. A single language pack | Language | Language code | Description | |---|---|---| -| Automatic | `auto` | Automatically detect the language using the [Language Identification](https://docs.speechmatics.com/speech-to-text/batch/language-identification) feature. Currently supported with Batch transcription only. | +| Automatic | `auto` | Automatically detect the language using the [Language Identification](/speech-to-text/batch/language-identification) feature. Currently supported with Batch transcription only. | | Arabic | `ar` | Global Arabic gives high-accuracy transcription across many accents and dialects, including (but not limited to) Modern Standard Arabic (MSA) and Arabic spoken in the Gulf, Egypt, and the Levant. | | Arabic & English bilingual | `ar_en` | Ideal when transcribing Arabic and English in the same media file or stream. Supports all accents and dialects listed under Arabic and English. | | Bashkir | `ba` | | @@ -62,7 +62,7 @@ Speechmatics takes a global-first approach to languages. 
A single language pack | Czech | `cs` | | | Danish | `da` | | | Dutch | `nl` | | -| English | `en` | Global English gives high-accuracy transcription across many accents, including (but not limited to) English spoken in the United Kingdom, United States, Australia, New Zealand, and by non-native speakers. To standardize spelling, specify the [output locale](https://docs.speechmatics.com/speech-to-text/formatting#output-locale). | +| English | `en` | Global English gives high-accuracy transcription across many accents, including (but not limited to) English spoken in the United Kingdom, United States, Australia, New Zealand, and by non-native speakers. To standardize spelling, specify the [output locale](/speech-to-text/formatting#output-locale). | | Esperanto | `eo` | | | Estonian | `et` | | | Finnish | `fi` | | @@ -84,7 +84,7 @@ Speechmatics takes a global-first approach to languages. A single language pack | Malay | `ms` | | | Malay & English bilingual | `en_ms` | Ideal when transcribing Malay and English in the same media file or stream. Supports all accents and dialects listed under Malay and English. | | Maltese | `mt` | | -| Mandarin | `cmn` | Global Mandarin can output [Traditional or Simplified characters](https://docs.speechmatics.com/speech-to-text/formatting#output-locale) and gives high-accuracy transcription across many accents, including (but not limited to) China, Taiwan, Singapore, and Malaysia. | +| Mandarin | `cmn` | Global Mandarin can output [Traditional or Simplified characters](/speech-to-text/formatting#output-locale) and gives high-accuracy transcription across many accents, including (but not limited to) China, Taiwan, Singapore, and Malaysia. | | Mandarin & English bilingual | `cmn_en` | Ideal when transcribing Mandarin and English in the same media file or stream. Supports all accents and dialects listed under Mandarin and English. 
| | Mandarin Malay Tamil & English | `cmn_en_ms_ta` | Ideal when transcribing Mandarin, Malay, Tamil, and English in the same media file or stream. Supports all accents and dialects listed under Mandarin, Malay, Tamil, and English. | | Marathi | `mr` | | @@ -98,7 +98,7 @@ Speechmatics takes a global-first approach to languages. A single language pack | Slovakian | `sk` | | | Slovenian | `sl` | | | Spanish | `es` | Global Spanish gives high-accuracy transcription across many accents, including (but not limited to) Spanish spoken in Spain, the US, Mexico, Colombia, Argentina, Venezuela, Chile, and Peru. | -| Spanish & English bilingual | `es` (with domain=`bilingual-en`) | Ideal when transcribing Spanish and English in the same media file or stream. Supports all accents and dialects listed under English and Spanish. [Requires the domain config to be set](https://docs.speechmatics.com/speech-to-text/languages#bilingual-and-multi-language-packs). | +| Spanish & English bilingual | `es` (with domain=`bilingual-en`) | Ideal when transcribing Spanish and English in the same media file or stream. Supports all accents and dialects listed under English and Spanish. [Requires the domain config to be set](/speech-to-text/languages#bilingual-and-multi-language-packs). | | Swahili | `sw` | | | Swedish | `sv` | | | Tagalog (Filipino) & English bilingual | `tl` | Ideal when transcribing Tagalog (Filipino) and English in the same media file or stream. Supports all accents and dialects listed under English. | @@ -110,7 +110,7 @@ Speechmatics takes a global-first approach to languages. A single language pack | Urdu | `ur` | | | Uyghur | `ug` | | | Vietnamese | `vi` | | -| Welsh | `cy` | Welsh must be explicitly added to the [expected languages](https://docs.speechmatics.com/speech-to-text/batch/language-identification#expected-languages) list when using the Language Identification feature. 
Otherwise a [language not supported for transcription error](https://docs.speechmatics.com/speech-to-text/batch/language-identification#language-not-supported-for-transcription) is returned. | +| Welsh | `cy` | Welsh must be explicitly added to the [expected languages](/speech-to-text/batch/language-identification#expected-languages) list when using the Language Identification feature. Otherwise a [language not supported for transcription error](/speech-to-text/batch/language-identification#language-not-supported-for-transcription) is returned. | Each language is uniquely identified by a two-letter code (ISO 639-1) or three-letter code (ISO 639-3) in API requests and responses. From a2ba0ca50f374076ae0c45d2a08332ff06d13397 Mon Sep 17 00:00:00 2001 From: Pete Mo <175202887+cabbage-ice-cream@users.noreply.github.com> Date: Fri, 12 Jun 2026 14:31:58 +0100 Subject: [PATCH 06/18] Address review feedback: consolidate model comparison, surface Melia 1 on Languages page - Move the Melia 1 unsupported-features list into Compare the models so feature differences live in one place - Note that language codes double as Melia 1 language hints in the Languages page intro - Promote the Melia 1 pointer in the packs section to a note at the top of the section Co-Authored-By: Claude Fable 5 --- docs/speech-to-text/languages.mdx | 8 +++++--- docs/speech-to-text/models.mdx | 16 +++++++++------- 2 files changed, 14 insertions(+), 10 deletions(-) diff --git a/docs/speech-to-text/languages.mdx b/docs/speech-to-text/languages.mdx index c3a30b2f..192cdf52 100644 --- a/docs/speech-to-text/languages.mdx +++ b/docs/speech-to-text/languages.mdx @@ -34,7 +34,7 @@ See which languages Speechmatics supports for transcription and translation. To choose a transcription model, refer to [Models](/speech-to-text/models). -The languages, packs, and options on this page apply to the Enhanced and Standard models. 
The Melia 1 model is multilingual: it transcribes the individual languages listed here and switches between them automatically, without language selection. It does not support the `auto` option, the bilingual and multi-language packs, or translation. For Melia 1, refer to [Models](/speech-to-text/models). +The languages, packs, and options on this page apply to the Enhanced and Standard models. The Melia 1 model is multilingual: it transcribes the individual languages listed here and switches between them automatically, without language selection. You can use their language codes as [language hints](/speech-to-text/batch/input#multilingual-transcription-with-melia-1). Melia 1 does not support the `auto` option, the bilingual and multi-language pack codes, or translation. For Melia 1, refer to [Models](/speech-to-text/models). ## Transcription languages @@ -126,6 +126,10 @@ Translation is available with the Enhanced and Standard models. It is supported ## Bilingual and multi-language packs +:::note +These packs handle a fixed set of languages that you select in advance. To transcribe audio without selecting languages, including spontaneous switching across all supported languages, use the Melia 1 multilingual model. Refer to [Models](/speech-to-text/models). +::: + The Enhanced and Standard models can transcribe a selected combination of languages in one media file or stream, including speakers who switch between the languages in that pack. Each pack covers a fixed set of languages that you select with the `language` property. Supported packs are: @@ -163,8 +167,6 @@ This config selects the Spanish and English pack, which requires the `domain` pr } ``` -These packs handle a fixed set of languages that you select in advance. To transcribe audio without selecting languages, including spontaneous switching across all supported languages, use the Melia 1 multilingual model. Refer to [Models](/speech-to-text/models). 
- ## Healthcare domain Speechmatics offers a medical domain that provides high accuracy for healthcare use cases such as ambient scribes and dictation tools. The medical domain is available with the Enhanced model only. It does not apply to the Standard or Melia 1 models. diff --git a/docs/speech-to-text/models.mdx b/docs/speech-to-text/models.mdx index 475908d4..72fc5695 100644 --- a/docs/speech-to-text/models.mdx +++ b/docs/speech-to-text/models.mdx @@ -25,6 +25,14 @@ Speechmatics offers three models for Speech to Text: Enhanced, Standard, and Mel Enhanced and Standard are feature-identical and differ only in accuracy and speed: Enhanced delivers the highest accuracy, and Standard prioritizes throughput. Melia 1 matches Standard for accuracy and adds automatic multilingual transcription, but it is available for Batch only and supports a reduced feature set. Speech intelligence covers translation, summarization, topic detection, chapters, sentiment, and audio events. +Melia 1 matches the Enhanced and Standard models for core transcription features, including diarization, word timings, punctuation, notifications, and output locale. It does not yet support the following features, which are available with the Enhanced and Standard models: + +- Custom vocabulary and formatting: custom dictionary, find and replace, spoken form output, profanity tagging +- Output detail: confidence scores, entity detection, audio filtering +- Speech intelligence: audio events, translation, summarization, chapters, topics, sentiment + +Melia 1 is an early-access model and its feature support is expanding. Check the [release notes](https://speechmatics.featurebase.app/en/changelog) for the latest. + ## Choose a model Use Enhanced for the highest accuracy on single-language audio, such as medical, legal, or subtitling work. @@ -84,13 +92,7 @@ Melia 1 is available for Batch transcription in the EU and US regions only. 
It i For the full list of Batch endpoints, refer to [Authentication](/get-started/authentication#supported-endpoints). -Melia 1 matches the Enhanced and Standard models for core transcription features, including diarization, word timings, punctuation, notifications, and output locale. It does not yet support the following features, which are available with the Enhanced and Standard models: - -- Custom vocabulary and formatting: custom dictionary, find and replace, spoken form output, profanity tagging -- Output detail: confidence scores, entity detection, audio filtering -- Speech intelligence: audio events, translation, summarization, chapters, topics, sentiment - -Melia 1 is an early-access model and its feature support is expanding. Check the [release notes](https://speechmatics.featurebase.app/en/changelog) for the latest. +For the features Melia 1 does not yet support, refer to [Compare the models](#compare-the-models). To configure language hints and read the per-language output metadata, refer to [Input](/speech-to-text/batch/input#multilingual-transcription-with-melia-1) and [Output](/speech-to-text/batch/output#multilingual-transcript-output). From e8fffc50f28bb4fa8528bd2a55be669559cdf410 Mon Sep 17 00:00:00 2001 From: Pete Mo <175202887+cabbage-ice-cream@users.noreply.github.com> Date: Fri, 12 Jun 2026 15:43:37 +0100 Subject: [PATCH 07/18] Remove intro lines that restate page titles; note speaker identification not yet on Melia 1 Co-Authored-By: Claude Fable 5 --- docs/speech-to-text/languages.mdx | 2 -- docs/speech-to-text/models.mdx | 4 ++-- 2 files changed, 2 insertions(+), 4 deletions(-) diff --git a/docs/speech-to-text/languages.mdx b/docs/speech-to-text/languages.mdx index 192cdf52..ccfab33b 100644 --- a/docs/speech-to-text/languages.mdx +++ b/docs/speech-to-text/languages.mdx @@ -30,8 +30,6 @@ keywords: # Languages -See which languages Speechmatics supports for transcription and translation. 
- To choose a transcription model, refer to [Models](/speech-to-text/models). The languages, packs, and options on this page apply to the Enhanced and Standard models. The Melia 1 model is multilingual: it transcribes the individual languages listed here and switches between them automatically, without language selection. You can use their language codes as [language hints](/speech-to-text/batch/input#multilingual-transcription-with-melia-1). Melia 1 does not support the `auto` option, the bilingual and multi-language pack codes, or translation. For Melia 1, refer to [Models](/speech-to-text/models). diff --git a/docs/speech-to-text/models.mdx b/docs/speech-to-text/models.mdx index 72fc5695..b0958a3d 100644 --- a/docs/speech-to-text/models.mdx +++ b/docs/speech-to-text/models.mdx @@ -5,8 +5,6 @@ description: Compare the Enhanced, Standard, and Melia 1 models and choose the r # Models -Compare the Speech to Text models and choose the right one for your audio. - Speechmatics offers three models for Speech to Text: Enhanced, Standard, and Melia 1. All three use the same API. The model you choose determines accuracy, how multilingual audio is handled, and which processing modes and regions are available. 
## Compare the models @@ -21,6 +19,7 @@ Speechmatics offers three models for Speech to Text: Enhanced, Standard, and Mel | Language handling | [Selected language or pack](/speech-to-text/languages) ([auto-detect](/speech-to-text/batch/language-identification) available) | [Selected language or pack](/speech-to-text/languages) ([auto-detect](/speech-to-text/batch/language-identification) available) | [Automatic multilingual](/speech-to-text/batch/input#multilingual-transcription-with-melia-1) | | Custom dictionary | ✅ | ✅ | Not yet | | Confidence scores | ✅ | ✅ | Not yet | +| Speaker identification | ✅ | ✅ | Not yet | | Speech intelligence | ✅ | ✅ | Not yet | Enhanced and Standard are feature-identical and differ only in accuracy and speed: Enhanced delivers the highest accuracy, and Standard prioritizes throughput. Melia 1 matches Standard for accuracy and adds automatic multilingual transcription, but it is available for Batch only and supports a reduced feature set. Speech intelligence covers translation, summarization, topic detection, chapters, sentiment, and audio events. @@ -29,6 +28,7 @@ Melia 1 matches the Enhanced and Standard models for core transcription features - Custom vocabulary and formatting: custom dictionary, find and replace, spoken form output, profanity tagging - Output detail: confidence scores, entity detection, audio filtering +- Speaker identification - Speech intelligence: audio events, translation, summarization, chapters, topics, sentiment Melia 1 is an early-access model and its feature support is expanding. Check the [release notes](https://speechmatics.featurebase.app/en/changelog) for the latest. 
From c67256e81c61935e3c18a0c83fd355529c79dfb3 Mon Sep 17 00:00:00 2001 From: Pete Mo <175202887+cabbage-ice-cream@users.noreply.github.com> Date: Mon, 15 Jun 2026 13:26:13 +0100 Subject: [PATCH 08/18] Rename comparison table row label to Model Co-Authored-By: Claude Fable 5 --- docs/speech-to-text/models.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/speech-to-text/models.mdx b/docs/speech-to-text/models.mdx index b0958a3d..2ffd0138 100644 --- a/docs/speech-to-text/models.mdx +++ b/docs/speech-to-text/models.mdx @@ -11,7 +11,7 @@ Speechmatics offers three models for Speech to Text: Enhanced, Standard, and Mel | Capability | Enhanced | Standard | Melia 1 | |---|---|---|---| -| `model` value | `enhanced` | `standard` | `melia-1` | +| Model | `enhanced` | `standard` | `melia-1` | | Accuracy | Highest | High | High | | Turnaround | Fast | Fastest | Fast | | Processing modes | Batch and Realtime | Batch and Realtime | Batch | From 28cd9cc43fbb9ff3a431c0cff90f9455b8ef5b04 Mon Sep 17 00:00:00 2001 From: Pete Mo <175202887+cabbage-ice-cream@users.noreply.github.com> Date: Mon, 15 Jun 2026 13:33:38 +0100 Subject: [PATCH 09/18] Correct Melia 1 language admonition: only auto returns an error Co-Authored-By: Claude Fable 5 --- docs/speech-to-text/models.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/speech-to-text/models.mdx b/docs/speech-to-text/models.mdx index 2ffd0138..77a4c521 100644 --- a/docs/speech-to-text/models.mdx +++ b/docs/speech-to-text/models.mdx @@ -80,7 +80,7 @@ Set `"model": "melia-1"` and `"language": "multi"`: ``` :::warning -Melia 1 requires `language` to be set to `multi`. Any other value returns an error. +Melia 1 does not support the `auto` language value, which returns an error. Set `language` to `multi`. ::: Melia 1 is available for Batch transcription in the EU and US regions only. It is not available in the Australia (AU1) region. 
From 746c6ced656234e58d2bb8c4d30c63e101dc1e87 Mon Sep 17 00:00:00 2001 From: Pete Mo <175202887+cabbage-ice-cream@users.noreply.github.com> Date: Mon, 15 Jun 2026 14:10:52 +0100 Subject: [PATCH 10/18] Expand language hints guidance; flag auto unsupported on Melia 1 - Rename Input page section to "Language hints" with fuller guidance: hints are optional, bias detection without restricting it, and the output only labels languages actually heard - Repoint inbound links to the new #language-hints anchor; point the comparison table's "Automatic multilingual" cell at the Melia 1 section - Add a note on the Language Identification page that Melia 1 does not support language: auto Co-Authored-By: Claude Fable 5 --- docs/speech-to-text/batch/input.mdx | 10 ++++++---- docs/speech-to-text/batch/language-identification.mdx | 4 ++++ docs/speech-to-text/languages.mdx | 2 +- docs/speech-to-text/models.mdx | 4 ++-- 4 files changed, 13 insertions(+), 7 deletions(-) diff --git a/docs/speech-to-text/batch/input.mdx b/docs/speech-to-text/batch/input.mdx index ddabced1..3f546940 100644 --- a/docs/speech-to-text/batch/input.mdx +++ b/docs/speech-to-text/batch/input.mdx @@ -40,11 +40,13 @@ Below are the complete fields of the configuration object: -### Multilingual transcription with Melia 1 +### Language hints -The Melia 1 model transcribes audio containing more than one language. Select it with `"model": "melia-1"` and `"language": "multi"`; for basic model selection, refer to [Models](/speech-to-text/models). +The Melia 1 model detects every language it hears automatically, so language hints are optional. To select Melia 1, refer to [Models](/speech-to-text/models). -If you know which languages appear in the audio, provide them as language hints to reduce the chance of unexpected languages or scripts in the output. This config hints that the audio contains English and Arabic: +Hints tell the model which languages to expect in the audio, biasing detection toward them. 
They make language labeling more reliable, and are most useful for short clips, audio with heavy accents, or recordings where two languages sound similar. + +Provide hints as a list of [supported languages](/speech-to-text/languages#transcription-languages) to guide detection without restricting it. This config hints that the audio contains English and Arabic: ```json { @@ -57,7 +59,7 @@ If you know which languages appear in the audio, provide them as language hints } ``` -Specify any number of [supported language codes](/speech-to-text/languages#transcription-languages). For a monolingual file, hinting the single language present can improve accuracy. +The model can still detect and label a language you did not hint, and it labels only the languages it actually hears. If you hint three languages but only two are spoken, the output contains two languages. This is expected behavior, not a dropped hint. ## Fetch URL diff --git a/docs/speech-to-text/batch/language-identification.mdx b/docs/speech-to-text/batch/language-identification.mdx index 86071e36..c7ff1e3e 100644 --- a/docs/speech-to-text/batch/language-identification.mdx +++ b/docs/speech-to-text/batch/language-identification.mdx @@ -35,6 +35,10 @@ Once you are set up, just set `language` to `auto` to use Automatic Language Ide } ``` +:::note +The Melia 1 model does not support `language: auto` and returns an error. Melia 1 is multilingual and detects languages automatically without it. Refer to [Models](/speech-to-text/models). +::: + :::info To reliably identify the predominant language, the file should contain at least 60 seconds of speech in that language. ::: diff --git a/docs/speech-to-text/languages.mdx b/docs/speech-to-text/languages.mdx index ccfab33b..7c4ba01d 100644 --- a/docs/speech-to-text/languages.mdx +++ b/docs/speech-to-text/languages.mdx @@ -32,7 +32,7 @@ keywords: To choose a transcription model, refer to [Models](/speech-to-text/models).
-The languages, packs, and options on this page apply to the Enhanced and Standard models. The Melia 1 model is multilingual: it transcribes the individual languages listed here and switches between them automatically, without language selection. You can use their language codes as [language hints](/speech-to-text/batch/input#multilingual-transcription-with-melia-1). Melia 1 does not support the `auto` option, the bilingual and multi-language pack codes, or translation. For Melia 1, refer to [Models](/speech-to-text/models). +The languages, packs, and options on this page apply to the Enhanced and Standard models. The Melia 1 model is multilingual: it transcribes the individual languages listed here and switches between them automatically, without language selection. You can use their language codes as [language hints](/speech-to-text/batch/input#language-hints). Melia 1 does not support the `auto` option, the bilingual and multi-language pack codes, or translation. For Melia 1, refer to [Models](/speech-to-text/models). 
## Transcription languages diff --git a/docs/speech-to-text/models.mdx b/docs/speech-to-text/models.mdx index 77a4c521..e81951c3 100644 --- a/docs/speech-to-text/models.mdx +++ b/docs/speech-to-text/models.mdx @@ -16,7 +16,7 @@ Speechmatics offers three models for Speech to Text: Enhanced, Standard, and Mel | Turnaround | Fast | Fastest | Fast | | Processing modes | Batch and Realtime | Batch and Realtime | Batch | | Regions | EU, US, AUS | EU, US, AUS | EU, US | -| Language handling | [Selected language or pack](/speech-to-text/languages) ([auto-detect](/speech-to-text/batch/language-identification) available) | [Selected language or pack](/speech-to-text/languages) ([auto-detect](/speech-to-text/batch/language-identification) available) | [Automatic multilingual](/speech-to-text/batch/input#multilingual-transcription-with-melia-1) | +| Language handling | [Selected language or pack](/speech-to-text/languages) ([auto-detect](/speech-to-text/batch/language-identification) available) | [Selected language or pack](/speech-to-text/languages) ([auto-detect](/speech-to-text/batch/language-identification) available) | [Automatic multilingual](#melia-1) | | Custom dictionary | ✅ | ✅ | Not yet | | Confidence scores | ✅ | ✅ | Not yet | | Speaker identification | ✅ | ✅ | Not yet | @@ -94,7 +94,7 @@ For the full list of Batch endpoints, refer to [Authentication](/get-started/aut For the features Melia 1 does not yet support, refer to [Compare the models](#compare-the-models). -To configure language hints and read the per-language output metadata, refer to [Input](/speech-to-text/batch/input#multilingual-transcription-with-melia-1) and [Output](/speech-to-text/batch/output#multilingual-transcript-output). +To configure language hints and read the per-language output metadata, refer to [Input](/speech-to-text/batch/input#language-hints) and [Output](/speech-to-text/batch/output#multilingual-transcript-output). 
## Operating points From 7eb862c4c27ca8f30ea1cf836481b25eb9da4c0e Mon Sep 17 00:00:00 2001 From: Pete Mo <175202887+cabbage-ice-cream@users.noreply.github.com> Date: Mon, 15 Jun 2026 14:24:50 +0100 Subject: [PATCH 11/18] Language ID note: state the multi value for Melia 1 Co-Authored-By: Claude Fable 5 --- docs/speech-to-text/batch/language-identification.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/speech-to-text/batch/language-identification.mdx b/docs/speech-to-text/batch/language-identification.mdx index c7ff1e3e..9c7ab35d 100644 --- a/docs/speech-to-text/batch/language-identification.mdx +++ b/docs/speech-to-text/batch/language-identification.mdx @@ -36,7 +36,7 @@ Once you are set up, just set `language` to `auto` to use Automatic Language Ide ``` :::note -The Melia 1 model does not support `language: auto` and returns an error. Melia 1 is multilingual and detects languages automatically without it. Refer to [Models](/speech-to-text/models). +The Melia 1 model does not support `language: auto` and returns an error. Set `language` to `multi` instead; Melia 1 is multilingual and detects languages automatically. Refer to [Models](/speech-to-text/models). 
::: :::info From c3b7a42740e7b8d87ba7a6cd9be7a98e46eaadbe Mon Sep 17 00:00:00 2001 From: Pete Mo <175202887+cabbage-ice-cream@users.noreply.github.com> Date: Mon, 15 Jun 2026 15:10:28 +0100 Subject: [PATCH 12/18] Add diarization and word labeling rows to model comparison table Co-Authored-By: Claude Fable 5 --- docs/speech-to-text/models.mdx | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docs/speech-to-text/models.mdx b/docs/speech-to-text/models.mdx index e81951c3..bd7d4b44 100644 --- a/docs/speech-to-text/models.mdx +++ b/docs/speech-to-text/models.mdx @@ -17,6 +17,8 @@ Speechmatics offers three models for Speech to Text: Enhanced, Standard, and Mel | Processing modes | Batch and Realtime | Batch and Realtime | Batch | | Regions | EU, US, AUS | EU, US, AUS | EU, US | | Language handling | [Selected language or pack](/speech-to-text/languages) ([auto-detect](/speech-to-text/batch/language-identification) available) | [Selected language or pack](/speech-to-text/languages) ([auto-detect](/speech-to-text/batch/language-identification) available) | [Automatic multilingual](#melia-1) | +| Diarization | Speaker and channel | Speaker and channel | Speaker and channel | +| Word labeling | Per file | Per file | Per word | | Custom dictionary | ✅ | ✅ | Not yet | | Confidence scores | ✅ | ✅ | Not yet | | Speaker identification | ✅ | ✅ | Not yet | From 63ed16330ecdebb704a84486bd1c304c2ae20a16 Mon Sep 17 00:00:00 2001 From: Pete Mo Date: Mon, 15 Jun 2026 15:19:38 +0100 Subject: [PATCH 13/18] Update docs/speech-to-text/batch/input.mdx Co-authored-by: Yahia <42359972+yaiir-a@users.noreply.github.com> --- docs/speech-to-text/batch/input.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/speech-to-text/batch/input.mdx b/docs/speech-to-text/batch/input.mdx index 3f546940..e7ec63a7 100644 --- a/docs/speech-to-text/batch/input.mdx +++ b/docs/speech-to-text/batch/input.mdx @@ -59,7 +59,7 @@ Provide hints as a list of [supported 
languages](/speech-to-text/languages#trans } ``` -The model can still detect and label a language you did not hint, and it labels only the languages it actually hears. If you hint three languages but only two are spoken, the output contains two languages. That is expected, not a dropped hint. +The model can still detect and label a language you did not hint, and it labels only the languages it actually hears. ## Fetch URL From f40034f75177f9c21a0924708774695c033494c5 Mon Sep 17 00:00:00 2001 From: Pete Mo <175202887+cabbage-ice-cream@users.noreply.github.com> Date: Mon, 15 Jun 2026 15:25:06 +0100 Subject: [PATCH 14/18] Trim language hints note Co-Authored-By: Claude Fable 5 --- docs/speech-to-text/batch/input.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/speech-to-text/batch/input.mdx b/docs/speech-to-text/batch/input.mdx index 3f546940..e7ec63a7 100644 --- a/docs/speech-to-text/batch/input.mdx +++ b/docs/speech-to-text/batch/input.mdx @@ -59,7 +59,7 @@ Provide hints as a list of [supported languages](/speech-to-text/languages#trans } ``` -The model can still detect and label a language you did not hint, and it labels only the languages it actually hears. If you hint three languages but only two are spoken, the output contains two languages. That is expected, not a dropped hint. +The model can still detect and label a language you did not hint, and it labels only the languages it actually hears. 
## Fetch URL From 4f4c9acdf86cfa4904aaf5135cd0f63540c441ec Mon Sep 17 00:00:00 2001 From: Pete Mo <175202887+cabbage-ice-cream@users.noreply.github.com> Date: Mon, 15 Jun 2026 15:41:37 +0100 Subject: [PATCH 15/18] Set Melia 1 turnaround to Fastest per review Co-Authored-By: Claude Fable 5 --- docs/speech-to-text/models.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/speech-to-text/models.mdx b/docs/speech-to-text/models.mdx index bd7d4b44..78e4a7c4 100644 --- a/docs/speech-to-text/models.mdx +++ b/docs/speech-to-text/models.mdx @@ -13,7 +13,7 @@ Speechmatics offers three models for Speech to Text: Enhanced, Standard, and Mel |---|---|---|---| | Model | `enhanced` | `standard` | `melia-1` | | Accuracy | Highest | High | High | -| Turnaround | Fast | Fastest | Fast | +| Turnaround | Fast | Fastest | Fastest | | Processing modes | Batch and Realtime | Batch and Realtime | Batch | | Regions | EU, US, AUS | EU, US, AUS | EU, US | | Language handling | [Selected language or pack](/speech-to-text/languages) ([auto-detect](/speech-to-text/batch/language-identification) available) | [Selected language or pack](/speech-to-text/languages) ([auto-detect](/speech-to-text/batch/language-identification) available) | [Automatic multilingual](#melia-1) | From 2fb469e91a0c3feba67e58984a9d94e8b0a4e7ca Mon Sep 17 00:00:00 2001 From: Pete Mo <175202887+cabbage-ice-cream@users.noreply.github.com> Date: Mon, 15 Jun 2026 16:31:19 +0100 Subject: [PATCH 16/18] Tighten Choose a model section: drop table-restating and subjective phrasing Co-Authored-By: Claude Fable 5 --- docs/speech-to-text/models.mdx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/speech-to-text/models.mdx b/docs/speech-to-text/models.mdx index 78e4a7c4..d90a7b3f 100644 --- a/docs/speech-to-text/models.mdx +++ b/docs/speech-to-text/models.mdx @@ -39,9 +39,9 @@ Melia 1 is an early-access model and its feature support is expanding. 
Check the Use Enhanced for the highest accuracy on single-language audio, such as medical, legal, or subtitling work. -Use Standard when throughput, cost, or latency matter more than the last increment of accuracy, such as archival transcription, content indexing, or large-scale captioning. +Use Standard when throughput, cost, or latency matter more than maximum accuracy, such as archival transcription, content indexing, or large-scale captioning. -Use Melia 1 for audio that contains more than one language, including speakers who switch language mid-conversation. It offers fast turnaround and accuracy on par with Standard. +Use Melia 1 for audio that contains more than one language, including speakers who switch language mid-conversation. ## Specify a model From fb10eafab61cb52b7c2753a6d1d1102e5f9a6607 Mon Sep 17 00:00:00 2001 From: Pete Mo <175202887+cabbage-ice-cream@users.noreply.github.com> Date: Mon, 15 Jun 2026 16:40:58 +0100 Subject: [PATCH 17/18] Remove cost framing from Choose a model section per writing principles Co-Authored-By: Claude Fable 5 --- docs/speech-to-text/models.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/speech-to-text/models.mdx b/docs/speech-to-text/models.mdx index d90a7b3f..433db7c1 100644 --- a/docs/speech-to-text/models.mdx +++ b/docs/speech-to-text/models.mdx @@ -39,7 +39,7 @@ Melia 1 is an early-access model and its feature support is expanding. Check the Use Enhanced for the highest accuracy on single-language audio, such as medical, legal, or subtitling work. -Use Standard when throughput, cost, or latency matter more than maximum accuracy, such as archival transcription, content indexing, or large-scale captioning. +Use Standard when throughput or latency matter more than maximum accuracy, such as archival transcription, content indexing, or large-scale captioning. Use Melia 1 for audio that contains more than one language, including speakers who switch language mid-conversation. 
From b1e90a3160c0a1f2de9ca54c6b284e2cfe510ac0 Mon Sep 17 00:00:00 2001 From: Pete Mo <175202887+cabbage-ice-cream@users.noreply.github.com> Date: Mon, 15 Jun 2026 16:56:04 +0100 Subject: [PATCH 18/18] Trigger Vercel redeploy (no content change) Co-Authored-By: Claude Fable 5