Open
21 commits
72a5b5e
DEL-33165: Add Melia 1, split Models and Languages pages
cabbage-ice-cream Jun 12, 2026
48183e2
Split "Languages and models" into separate pages, update `operating_p…
mnemitz Jun 12, 2026
123d3fc
Update spec
mnemitz Jun 12, 2026
603d0f6
Merge branch 'melia-1-docs' into del-33165-melia-models-languages
mnemitz Jun 12, 2026
e23b5a6
Address review feedback: discoverability of Melia 1 sections
cabbage-ice-cream Jun 12, 2026
08ffc11
Merge branch 'del-33165-melia-models-languages' of https://github.com…
cabbage-ice-cream Jun 12, 2026
12523fe
Use root-relative links in languages table per repo convention
cabbage-ice-cream Jun 12, 2026
a2ba0ca
Address review feedback: consolidate model comparison, surface Melia …
cabbage-ice-cream Jun 12, 2026
e8fffc5
Remove intro lines that restate page titles; note speaker identificat…
cabbage-ice-cream Jun 12, 2026
c67256e
Rename comparison table row label to Model
cabbage-ice-cream Jun 15, 2026
28cd9cc
Correct Melia 1 language admonition: only auto returns an error
cabbage-ice-cream Jun 15, 2026
746c6ce
Expand language hints guidance; flag auto unsupported on Melia 1
cabbage-ice-cream Jun 15, 2026
7eb862c
Language ID note: state the multi value for Melia 1
cabbage-ice-cream Jun 15, 2026
c3b7a42
Add diarization and word labeling rows to model comparison table
cabbage-ice-cream Jun 15, 2026
63ed163
Update docs/speech-to-text/batch/input.mdx
cabbage-ice-cream Jun 15, 2026
f40034f
Trim language hints note
cabbage-ice-cream Jun 15, 2026
16229b9
Merge branch 'del-33165-melia-models-languages' of https://github.com…
cabbage-ice-cream Jun 15, 2026
4f4c9ac
Set Melia 1 turnaround to Fastest per review
cabbage-ice-cream Jun 15, 2026
2fb469e
Tighten Choose a model section: drop table-restating and subjective p…
cabbage-ice-cream Jun 15, 2026
fb10eaf
Remove cost framing from Choose a model section per writing principles
cabbage-ice-cream Jun 15, 2026
b1e90a3
Trigger Vercel redeploy (no content change)
cabbage-ice-cream Jun 15, 2026
2 changes: 2 additions & 0 deletions custom-words.txt
@@ -339,3 +339,5 @@ seqs
vllm
configmap
sessiongroups
melia
مرحبا
4 changes: 2 additions & 2 deletions docs/deployments/container/cpu-speech-to-text.mdx
@@ -262,11 +262,11 @@ In general, the format is: `{language}_{domain}_{processor}_{operating_point}:{p
The parameters are:
- `language` - One of the supported [language codes](/speech-to-text/languages)

- `domain` - One of `general` or a domain used for some [multi-lingual transcription](/speech-to-text/languages#multilingual-speech-to-text) use cases. For example: `SM_PREWARM_ENGINE_MODES='es_bilingual-en_gpu_standard:1'`
- `domain` - One of `general` or a domain used for some [multi-lingual transcription](/speech-to-text/languages#bilingual-and-multi-language-packs) use cases. For example: `SM_PREWARM_ENGINE_MODES='es_bilingual-en_gpu_standard:1'`

- `processor` - One of `cpu` or `gpu`. Note that selecting `gpu` requires a [GPU Inference Container](/deployments/container/gpu-speech-to-text)

- `operating_point` - One of `standard` or `enhanced`. The [operating point](/speech-to-text/languages#models) you want to prewarm
- `operating_point` - One of `standard` or `enhanced`. The [operating point](/speech-to-text/models) you want to prewarm

- `prewarm_connections` - Integer. The number of engine instances of the specific mode you want to pre-warm. The total number of `prewarm_connections` cannot be greater than `SM_MAX_CONCURRENT_CONNECTIONS`. After the pre-warming is complete, this parameter does not limit the types of connections the engine can start.
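
As an illustration of the parameter list above, the following Python sketch assembles one prewarm mode string and checks the total against a connection limit. The `PrewarmMode` class and `check_total` helper are hypothetical names, not part of any Speechmatics tooling, and placing the connection count after the colon is inferred from the `es_bilingual-en_gpu_standard:1` example rather than from the full format string.

```python
from dataclasses import dataclass

# Allowed values taken from the parameter descriptions above.
ALLOWED_PROCESSORS = {"cpu", "gpu"}
ALLOWED_OPERATING_POINTS = {"standard", "enhanced"}


@dataclass(frozen=True)
class PrewarmMode:
    """Hypothetical helper describing one engine mode to pre-warm."""

    language: str
    domain: str
    processor: str
    operating_point: str
    prewarm_connections: int

    def render(self) -> str:
        """Return the mode string, e.g. 'es_bilingual-en_gpu_standard:1'."""
        if self.processor not in ALLOWED_PROCESSORS:
            raise ValueError(f"processor must be one of {sorted(ALLOWED_PROCESSORS)}")
        if self.operating_point not in ALLOWED_OPERATING_POINTS:
            raise ValueError(
                f"operating_point must be one of {sorted(ALLOWED_OPERATING_POINTS)}"
            )
        if self.prewarm_connections < 0:
            raise ValueError("prewarm_connections must not be negative")
        return (
            f"{self.language}_{self.domain}_{self.processor}_"
            f"{self.operating_point}:{self.prewarm_connections}"
        )


def check_total(modes, max_concurrent_connections: int) -> int:
    """Verify the summed prewarm_connections fits within the connection limit."""
    total = sum(mode.prewarm_connections for mode in modes)
    if total > max_concurrent_connections:
        raise ValueError(
            "total prewarm_connections exceeds SM_MAX_CONCURRENT_CONNECTIONS"
        )
    return total


mode = PrewarmMode("es", "bilingual-en", "gpu", "standard", 1)
print(mode.render())
```

The rendered value is what would be assigned to `SM_PREWARM_ENGINE_MODES` for a single mode. How several modes are combined in one variable is not covered by this sketch.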

2 changes: 1 addition & 1 deletion docs/deployments/container/gpu-speech-to-text.mdx
@@ -107,7 +107,7 @@ Once the GPU Server is running, follow the [Instructions for Linking a CPU Conta

### Running only one operating point

[Operating Points](/speech-to-text/languages#models) represent different levels of model complexity.
[Operating Points](/speech-to-text/models) represent different levels of model complexity.
To save GPU memory for throughput, you can run the server with only one Operating Point loaded. To do this, pass the
`SM_OPERATING_POINT` environment variable to the container and set it to either `standard` or `enhanced`.

2 changes: 1 addition & 1 deletion docs/deployments/index.md
@@ -31,7 +31,7 @@ Feature availability varies depending on the deployment method you choose. Below

| Feature | Modes | Deployments |
| ------------------------------------------------------------------------------------- | --------------- | ------------- |
| [Multilingual speech to text](/speech-to-text/languages#multilingual-speech-to-text) | Batch, Realtime | SaaS, On-prem |
| [Multilingual speech to text](/speech-to-text/languages#bilingual-and-multi-language-packs) | Batch, Realtime | SaaS, On-prem |
| [Alignment](/speech-to-text/batch/alignment) | Batch | SaaS |
| [Audio events](/speech-to-text/features/audio-events) | Batch, Realtime | SaaS, On-prem |
| [Audio filtering](/speech-to-text/features/audio-filtering) | Batch, Realtime | SaaS, On-prem |
2 changes: 2 additions & 0 deletions docs/get-started/authentication.mdx
@@ -76,6 +76,8 @@ Speechmatics Batch SaaS supports the following endpoints for production use:

Jobs are created in the region corresponding to the endpoint used. You must use the same endpoint for all requests relating to a specific job.

The Melia 1 model is available in the EU1 and US1 regions only. For details, refer to [Models](/speech-to-text/models#melia-1).

:::warning
The EU2 and US2 Batch SaaS endpoints are provided for enterprise customer high availability and failover purposes only. Jobs created in these environments will not be visible in the Portal.
:::
22 changes: 21 additions & 1 deletion docs/speech-to-text/batch/input.mdx
@@ -1,6 +1,6 @@
---
keywords: [speechmatics, transcription, speech recognition, asr, api, limits]
toc_max_heading_level: 2
toc_max_heading_level: 3
title: 'Input – Batch'
sidebar_label: 'Input'
description: 'Learn about configuration and supported input audio formats for the Speechmatics Batch API'
@@ -40,6 +40,26 @@ Below are the complete fields of the configuration object:

<SchemaNode schema={batchSchema.definitions.JobConfig} />

### Language hints

Contributor

Should we add a short section above about general 'language selection', linking here: https://docs-git-del-33165-melia-models-languages-speechmatics.vercel.app/speech-to-text/languages#transcription-languages

Maybe a similar idea for the RT quickstart too

Collaborator Author

Good idea. This is a bit broader than this PR though — a general 'language selection' section touches the Input page structure (and the RT quickstart you mentioned) beyond the Melia 1 scope here. I'd suggest a separate ticket so this PR stays focused on the Melia 1 changes. Happy to raise one if you agree.


The Melia 1 model detects every language it hears automatically, so language hints are optional. To select Melia 1, refer to [Models](/speech-to-text/models).

Hints tell the model which languages to expect in the audio, biasing detection toward them. They are most useful for short clips, audio with heavy accents, or recordings where two languages sound similar, where they make language labeling more reliable.

Provide hints as a list of [supported languages](/speech-to-text/languages#transcription-languages) to guide detection without restricting it. This config hints that the audio contains English and Arabic:

```json
{
"type": "transcription",
"transcription_config": {
"model": "melia-1",
"language": "multi",
"language_hints": ["en", "ar"]
}
}
```

The model can still detect and label a language you did not hint, and it labels only the languages it actually hears.
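
A minimal Python sketch of building the same job config programmatically, so that hints are included only when they are supplied. The function name `build_melia_config` is hypothetical; the field names and values are those shown in the JSON example above.

```python
import json


def build_melia_config(language_hints=None):
    """Build a Melia 1 batch job config; language hints are optional."""
    transcription_config = {
        "model": "melia-1",
        "language": "multi",
    }
    # Hints bias detection toward these languages without restricting it,
    # so the key is simply left out when no hints are given.
    if language_hints:
        transcription_config["language_hints"] = list(language_hints)
    return {
        "type": "transcription",
        "transcription_config": transcription_config,
    }


# Hint that the audio contains English and Arabic.
print(json.dumps(build_melia_config(["en", "ar"]), indent=2))
```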

## Fetch URL

4 changes: 4 additions & 0 deletions docs/speech-to-text/batch/language-identification.mdx
@@ -35,6 +35,10 @@ Once you are set up, just set `language` to `auto` to use Automatic Language Ide
}
```

:::note
The Melia 1 model does not support `language: auto` and returns an error. Set `language` to `multi` instead; Melia 1 is multilingual and detects languages automatically. Refer to [Models](/speech-to-text/models).
:::
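
A client could catch this combination before submitting a job. The check below is a sketch only: `check_language_for_model` is a hypothetical helper, and the `melia-1` identifier is taken from the config example in the Input page changes in this PR.

```python
def check_language_for_model(model: str, language: str) -> None:
    """Reject the one combination the note above says returns an error."""
    if model == "melia-1" and language == "auto":
        raise ValueError(
            "Melia 1 does not support language 'auto'; set language to 'multi'"
        )


# Passes silently: 'multi' is the value the note recommends for Melia 1.
check_language_for_model("melia-1", "multi")
```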

:::info
To reliably identify the predominant language, the file should contain at least 60 seconds of speech in that language.
:::
50 changes: 50 additions & 0 deletions docs/speech-to-text/batch/output.mdx
@@ -131,6 +131,56 @@ The following is an example of a transcript response, which you should see as an
{JSON.stringify(transcriptResponseExample, null, 2)}
</CodeBlock>

### Multilingual transcript output
J-Jaywalker marked this conversation as resolved.

For a Melia 1 job, the `language` property on each word reflects the language detected for that word, so it can change across the transcript. For Enhanced and Standard jobs, which transcribe one selected language, the same language is reported for every word.

The example below shows two words in different languages within one transcript:

```json
{
"results": [
{
"alternatives": [
{ "content": "Hello", "confidence": 0.98, "language": "en" }
],
"start_time": 0.20,
"end_time": 0.52,
"type": "word"
},
{
"alternatives": [
{ "content": "مرحبا", "confidence": 0.95, "language": "ar" }
],
"start_time": 0.60,
"end_time": 1.04,
"type": "word"
}
]
}
```
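
To illustrate reading the per-word `language` property, this Python sketch counts words per detected language. It assumes the transcript has already been parsed into a dictionary shaped like the example above, and reads the first alternative of each word; `words_by_language` is a hypothetical name.

```python
from collections import Counter


def words_by_language(transcript: dict) -> dict:
    """Count word results per detected language, using the first alternative."""
    counts = Counter()
    for item in transcript.get("results", []):
        if item.get("type") != "word":
            continue  # skip punctuation and other non-word results
        alternatives = item.get("alternatives") or []
        if not alternatives:
            continue
        language = alternatives[0].get("language")
        if language:
            counts[language] += 1
    return dict(counts)


sample = {
    "results": [
        {
            "alternatives": [{"content": "Hello", "confidence": 0.98, "language": "en"}],
            "start_time": 0.20,
            "end_time": 0.52,
            "type": "word",
        },
        {
            "alternatives": [{"content": "مرحبا", "confidence": 0.95, "language": "ar"}],
            "start_time": 0.60,
            "end_time": 1.04,
            "type": "word",
        },
    ]
}
print(words_by_language(sample))
```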

For multilingual transcripts, `language_pack_info` reports the word delimiter and writing direction per language rather than for a single language pack:

```json
{
"metadata": {
"language_pack_info": {
"per_language_word_delimiters": {
"en": " ",
"ar": " "
},
"per_language_writing_direction": {
"en": "left-to-right",
"ar": "right-to-left"
}
}
}
}
```

`per_language_word_delimiters` gives the word delimiter for each language in the transcript, and `per_language_writing_direction` gives its writing direction.
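
A short Python sketch of looking up both values for one language, assuming the transcript metadata is parsed into a dictionary shaped like the example above. The `layout_for` helper is hypothetical, and returning `None` for a language that is absent from the metadata is a choice made for this sketch.

```python
def layout_for(transcript: dict, language: str):
    """Return (word_delimiter, writing_direction) for a language, or None."""
    info = transcript.get("metadata", {}).get("language_pack_info", {})
    delimiters = info.get("per_language_word_delimiters", {})
    directions = info.get("per_language_writing_direction", {})
    if language not in delimiters or language not in directions:
        return None
    return delimiters[language], directions[language]


sample_metadata = {
    "metadata": {
        "language_pack_info": {
            "per_language_word_delimiters": {"en": " ", "ar": " "},
            "per_language_writing_direction": {
                "en": "left-to-right",
                "ar": "right-to-left",
            },
        }
    }
}
print(layout_for(sample_metadata, "ar"))
```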

## Quicklinks

<Grid columns={{initial: "1", md: "2"}} gap="3">
2 changes: 1 addition & 1 deletion docs/speech-to-text/features/audio-filtering.mdx
@@ -73,6 +73,6 @@ To obtain volume labelling without filtering any audio, supply an empty config o

Once the audio is in a raw format (16kHz 16bit mono), it is split into 0.01s chunks. For each chunk, the root mean square amplitude of the signal is calculated, and scaled to the range `0 - 100`. If the volume is less than the supplied cut-off, the chunk will be replaced with silence.
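
The chunking step described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the product's implementation: the text does not say how the RMS value is scaled to `0 - 100`, so the sketch assumes linear scaling against the 16-bit full-scale value, and the names `chunk_volume` and `filter_audio` are hypothetical.

```python
import math
from array import array

SAMPLE_RATE = 16000                 # 16kHz mono, as stated above
CHUNK_SAMPLES = SAMPLE_RATE // 100  # 0.01s chunks -> 160 samples
FULL_SCALE = 32768.0                # assumption: scale RMS linearly against 16-bit full scale


def chunk_volume(chunk) -> float:
    """Root mean square amplitude of a chunk, scaled to the range 0-100."""
    if not chunk:
        return 0.0
    rms = math.sqrt(sum(sample * sample for sample in chunk) / len(chunk))
    return min(100.0, 100.0 * rms / FULL_SCALE)


def filter_audio(samples: array, cutoff: float) -> array:
    """Replace every chunk quieter than the cut-off with silence."""
    filtered = array("h")
    for start in range(0, len(samples), CHUNK_SAMPLES):
        chunk = samples[start:start + CHUNK_SAMPLES]
        if chunk_volume(chunk) < cutoff:
            filtered.extend([0] * len(chunk))  # silence
        else:
            filtered.extend(chunk)
    return filtered


loud = array("h", [16384] * CHUNK_SAMPLES)
quiet = array("h", [100] * CHUNK_SAMPLES)
result = filter_audio(loud + quiet, cutoff=10)
```

With these inputs the loud chunk is kept and the quiet chunk is zeroed, which mirrors the case where background speech is much quieter than foreground speech.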

To work successfully without degrading accuracy, the background speech must be significantly quieter than the foreground speech, otherwise the filtering process may remove small sections of the audio which should be transcribed. For this reason, the feature works better with the [enhanced model](/speech-to-text/languages#operating-points), which is more robust against inadvertent damage to the audio.
To work successfully without degrading accuracy, the background speech must be significantly quieter than the foreground speech, otherwise the filtering process may remove small sections of the audio which should be transcribed. For this reason, the feature works better with the [enhanced model](/speech-to-text/models), which is more robust against inadvertent damage to the audio.

The word volume calculation takes the start and end times of words, and applies a weighted average of the volumes of each audio chunk which make up the word. The weighting attempts to ignore areas of silence within long words, and provide a better match with the volume classification a human listener would make.
4 changes: 2 additions & 2 deletions docs/speech-to-text/features/feature-discovery.mdx
@@ -18,11 +18,11 @@ curl "https://eu1.asr.api.speechmatics.com/v1/discovery/features"

The feature discovery endpoint will include an object with the following properties:
- `metadata`
- `language_pack_info` - For each of our [supported languages](/speech-to-text/languages), give the full name of the language, as well as any [Domain Language Optimizations](/speech-to-text/languages#multilingual-speech-to-text) or [Output Locales](/speech-to-text/formatting#output-locale)
- `language_pack_info` - For each of our [supported languages](/speech-to-text/languages), give the full name of the language, as well as any [Domain Language Optimizations](/speech-to-text/languages#bilingual-and-multi-language-packs) or [Output Locales](/speech-to-text/formatting#output-locale)
- `batch` - Capabilities relating to our Batch API
- `transcription` - Capabilities relating to transcription
- `languages` - Includes a list of supported ISO language codes
- `locales` - Includes any languages with a supported [Output Locale](/speech-to-text/formatting#output-locale)
- `domains` - Includes any languages with a supported [Domain Language Optimizations](/speech-to-text/languages#multilingual-speech-to-text)
- `domains` - Includes any languages with a supported [Domain Language Optimizations](/speech-to-text/languages#bilingual-and-multi-language-packs)
- `translation` - Includes all [supported translation pairs](/speech-to-text/features/translation#languages)
- `languageid` - List of languages supported by [Language Identification](/speech-to-text/batch/language-identification)
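
As a sketch of calling the same endpoint from Python rather than `curl`, the block below fetches the response with the standard library and lists its top-level sections. The function names are hypothetical, the nesting of the properties listed above is not assumed beyond the top level, and the sketch assumes the request needs no extra headers, as in the `curl` call.

```python
import json
from urllib.request import urlopen

# Same URL as the curl example above.
DISCOVERY_URL = "https://eu1.asr.api.speechmatics.com/v1/discovery/features"


def fetch_features(url: str = DISCOVERY_URL, timeout: float = 10.0) -> dict:
    """Download and parse the feature discovery response."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def top_level_sections(payload: dict) -> list:
    """List the top-level properties of a discovery response, sorted by name."""
    return sorted(payload.keys())


# Usage (performs a network request):
#   features = fetch_features()
#   print(top_level_sections(features))
```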