Merged
Changes from 1 commit
92 changes: 63 additions & 29 deletions explainers/on-device-speech-recognition.md
### 3. Educational website (e.g. khanacademy.org)
Applications that need to function in unreliable or offline network conditions—such as voice-based productivity tools, educational software, or accessibility features—benefit from on-device speech recognition. This enables uninterrupted functionality during flights, remote travel, or in areas with limited connectivity. When on-device recognition is unavailable, a website can choose to hide the UI or gracefully degrade functionality to maintain a coherent user experience.

## New API Components

This enhancement introduces one new attribute to the `SpeechRecognition` interface and two new static methods for managing on-device capabilities.

### 1. `processLocally` Attribute
The `processLocally` boolean attribute on a `SpeechRecognition` instance allows developers to require that speech recognition be performed locally on the user's device.

- When set to `true`, the recognition session **must** be processed locally. If on-device recognition is not available for the specified language, the session will fail with a `service-not-allowed` error.
- When `false` (the default), the user agent is free to use either local or cloud-based recognition.

#### Example Usage
```javascript
const recognition = new SpeechRecognition();
recognition.lang = 'en-US';
recognition.processLocally = true; // Require on-device speech recognition.

recognition.onerror = (event) => {
  if (event.error === 'service-not-allowed') {
    console.error('On-device recognition is not available for the selected language, or the request was denied.');
  }
};

recognition.start();
```
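
Setting `processLocally` only guarantees local processing in browsers that actually implement the attribute. In any other browser, assigning it just creates an ordinary property, which is silently ignored. Chrome has also historically exposed the constructor as `webkitSpeechRecognition`.

The sketch below is a minimal approach under those assumptions. It:

- picks whichever constructor exists;
- refuses to proceed when `processLocally` is not a supported attribute;
- routes the `service-not-allowed` error to a fallback callback, so the page can hide or degrade its voice UI.

`createLocalRecognizer` and `onUnavailable` are hypothetical names. The global object is passed in so the helper can be exercised outside a browser.

```javascript
// Hypothetical helper: build a recognizer that is required to run on-device.
// `globalObj` is normally `window`; it is a parameter so the helper can be tested with stubs.
function createLocalRecognizer(globalObj, lang, onUnavailable) {
  // Prefer the standard constructor, fall back to the prefixed one.
  const Ctor = globalObj.SpeechRecognition || globalObj.webkitSpeechRecognition;

  // Without the constructor, or without a `processLocally` attribute on its prototype,
  // local processing cannot be required, so report "no recognizer" instead.
  if (typeof Ctor !== 'function' || !('processLocally' in Ctor.prototype)) {
    return null;
  }

  const recognition = new Ctor();
  recognition.lang = lang;
  recognition.processLocally = true; // Fail rather than fall back to cloud recognition.

  recognition.onerror = (event) => {
    if (event.error === 'service-not-allowed') {
      // On-device recognition is unavailable for `lang`, or the request was denied.
      onUnavailable(lang);
    }
  };
  return recognition;
}
```

In a page, this might be called as `createLocalRecognizer(window, 'en-US', hideVoiceButton)`, where `hideVoiceButton` stands in for the page's own (hypothetical) degradation logic.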

### 2. `Promise<AvailabilityStatus> available(SpeechRecognitionOptions options)`
The static `SpeechRecognition.available(options)` method allows developers to check the availability of speech recognition for a given set of languages and processing preferences. It returns a `Promise` that resolves with an `AvailabilityStatus` string.

#### Example Usage
```javascript
const options = {
  langs: ['en-US'],
  processLocally: true // Check for on-device availability
};

SpeechRecognition.available(options).then((status) => {
  console.log(`On-device availability for ${options.langs.join(', ')}: ${status}`);
  if (status === 'available') {
    console.log('Ready to use on-device recognition.');
  } else if (status === 'downloadable') {
    console.log('On-device recognition can be installed.');
  }
});
```
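
The resolved status can drive the page's UI directly. The sketch below maps it to one of four actions.

Only `'available'` and `'downloadable'` appear in the example above. The `'downloading'` and `'unavailable'` values are assumptions based on the `AvailabilityStatus` enum pattern and should be checked against the specification. Any unrecognized value is treated as unavailable. `chooseOnDeviceAction` is a hypothetical helper.

```javascript
// Hypothetical helper: turn an AvailabilityStatus string into a UI decision.
// 'available' and 'downloadable' come from the explainer; 'downloading' and
// 'unavailable' are assumed values.
function chooseOnDeviceAction(status) {
  switch (status) {
    case 'available':
      return 'start';         // Start recognition with processLocally = true.
    case 'downloadable':
      return 'offer-install'; // Ask the user before calling install().
    case 'downloading':
      return 'wait';          // Resources are on their way; show progress instead.
    default:
      return 'hide';          // 'unavailable' or unknown: hide or degrade the voice UI.
  }
}
```

It could be chained onto the call above as `SpeechRecognition.available(options).then(chooseOnDeviceAction)`.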

### 3. `Promise<boolean> install(SpeechRecognitionOptions options)`
This method installs the resources required for on-device speech recognition for the BCP-47 language tags listed in `options.langs`. The installation process may download and configure the necessary language models.

#### Example Usage
```javascript
const options = {
  langs: ['en-US'],
  processLocally: true
};
SpeechRecognition.install(options).then((success) => {
  if (success) {
    console.log('On-device speech recognition resources installed successfully.');
  } else {
    console.error('Unable to install on-device speech recognition resources.');
  }
});
```
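
The two static methods are typically combined in three steps:

1. Check availability.
2. Install only when the resources are downloadable.
3. Start a session that requires local processing.

The sketch below assumes the constructor exposes `available()` and `install()` as shown above. It takes the constructor as a parameter (`Ctor`) so it can be exercised with a stub. `startOnDevice` is a hypothetical name. Any user-gesture or prompt requirements a user agent may attach to `install()` are not modeled.

```javascript
// Hypothetical flow: make sure on-device resources exist, then start a local-only session.
// `Ctor` is SpeechRecognition (or a stub exposing the same static methods).
async function startOnDevice(Ctor, lang) {
  const options = { langs: [lang], processLocally: true };

  const status = await Ctor.available(options);
  if (status === 'downloadable') {
    // Only attempt installation when the resources can be downloaded.
    const installed = await Ctor.install(options);
    if (!installed) {
      return null; // Installation failed; the caller should degrade the UI.
    }
  } else if (status !== 'available') {
    return null; // Not usable on this device right now.
  }

  const recognition = new Ctor();
  recognition.lang = lang;
  recognition.processLocally = true; // Never silently fall back to cloud recognition.
  recognition.start();
  return recognition;
}
```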

## Supported languages
The availability of on-device speech recognition languages is user-agent dependent. As an example, Google Chrome supports the following languages for on-device recognition:
* de-DE (German, Germany)
* en-US (English, United States)
* es-ES (Spanish, Spain)
* fr-FR (French, France)
* hi-IN (Hindi, India)
* id-ID (Indonesian, Indonesia)
* it-IT (Italian, Italy)
* ja-JP (Japanese, Japan)
* ko-KR (Korean, South Korea)
* pl-PL (Polish, Poland)
* pt-BR (Portuguese, Brazil)
* ru-RU (Russian, Russia)
* th-TH (Thai, Thailand)
* tr-TR (Turkish, Turkey)
* vi-VN (Vietnamese, Vietnam)
* zh-CN (Chinese, Mandarin, Simplified)
* zh-TW (Chinese, Mandarin, Traditional)
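
Because this list differs between user agents, a page can query support at run time instead of hard-coding it. The sketch below checks each candidate language with `available()` and keeps those reported as `'available'`. `listOnDeviceLanguages` is a hypothetical helper. Querying one language per call is a design choice here, since the explainer does not say how a multi-entry `langs` array is evaluated.

```javascript
// Hypothetical helper: return the candidate languages that are ready for on-device use.
// `Ctor` is SpeechRecognition (or any object exposing the same static `available()`).
async function listOnDeviceLanguages(Ctor, candidates) {
  // Query all languages in parallel, one language per call.
  const statuses = await Promise.all(
    candidates.map((lang) => Ctor.available({ langs: [lang], processLocally: true }))
  );
  // Keep only languages whose status is 'available', preserving input order.
  return candidates.filter((_, i) => statuses[i] === 'available');
}
```

For example, `listOnDeviceLanguages(SpeechRecognition, ['de-DE', 'en-US', 'fr-FR'])` would resolve to whichever of those languages the current user agent can already recognize locally.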

## Privacy considerations
To reduce the risk of fingerprinting, user agents must implement privacy-preserving countermeasures. The Web Speech API will employ the same masking techniques used by the [Web Translation API](https://github.com/webmachinelearning/writing-assistance-apis/pull/47).