API Reference
MicVAD
The MicVAD
API is for recording user audio in the browser and running callbacks on speech segments and related events.
Support
Package |
Type |
Supported |
Description |
@realtimex/vad-web |
package |
Yes |
|
@realtimex/vad-react |
package |
No, use the useMicVAD hook |
|
Example
| import { MicVAD } from "@realtimex/vad-web"
const myvad = await MicVAD.new({
onSpeechEnd: (audio) => {
// do something with `audio` (Float32Array of audio samples at sample rate 16000)...
},
})
myvad.start()
|
Options
New instances of MicVAD
are created by calling the async static method MicVAD.new(options)
. The options object can contain the following fields (all are optional).
Option |
Type |
Default |
Description |
getStream |
() => Promise<MediaStream> |
Default getUserMedia with standard audio constraints |
Function that returns a Promise resolving to a MediaStream. By default, creates a stream with standard audio constraints (channelCount: 1, echoCancellation: true, autoGainControl: true, noiseSuppression: true). Override this to use custom audio constraints or provide your own stream. |
pauseStream |
(stream: MediaStream) => Promise<void> |
Stops all tracks in the stream |
Function called when the VAD is paused. By default, stops all tracks in the provided stream. Override this to implement custom pause behavior. |
resumeStream |
(stream: MediaStream) => Promise<MediaStream> |
Creates new stream with standard audio constraints |
Function called when the VAD is resumed. By default, creates a new stream with standard audio constraints. Override this to implement custom resume behavior. |
onFrameProcessed |
(probabilities: {isSpeech: float; notSpeech: float}, frame: Float32Array) => any |
() => {} |
Callback to run after each frame. The frame parameter contains the raw audio data for that frame. |
onVADMisfire |
() => any |
() => {} |
Callback to run if speech start was detected but onSpeechEnd will not be run because the audio segment is smaller than minSpeechMs |
onSpeechStart |
() => any |
() => {} |
Callback to run when speech start is detected |
onSpeechRealStart |
() => any |
() => {} |
Callback to run when actual speech positive frames exceeds min speech frames threshold is detected |
onSpeechEnd |
(audio: Float32Array) => any |
() => {} |
Callback to run when speech end is detected. Takes as arg a Float32Array of audio samples between -1 and 1, sample rate 16000. This will not run if the audio segment is smaller than minSpeechMs |
positiveSpeechThreshold |
number |
0.5 |
see algorithm configuration |
negativeSpeechThreshold |
number |
0.35 |
see algorithm configuration |
redemptionMs |
number |
768 |
see algorithm configuration |
preSpeechPadMs |
number |
96 |
see algorithm configuration |
minSpeechMs |
number |
288 |
see algorithm configuration |
submitUserSpeechOnPause |
boolean |
false |
If true, pausing the VAD triggers onSpeechEnd (if speaking with sufficient frames) or onVADMisfire |
model |
"v6" |
"v6" |
Silero model variant. Only v6 is supported. |
baseAssetPath |
string |
/ |
URL or path relative to webroot where vad.worklet.bundle.min.js and silero_vad_v6.onnx will be loaded from |
onnxWASMBasePath |
string |
/ |
URL or path relative to webroot where wasm files for onnxruntime-web will be loaded from |
workletOptions |
AudioWorkletNodeOptions |
{} |
Options to pass to the AudioWorkletNode constructor. |
Attributes
Attributes |
Type |
Default |
Description |
listening |
boolean |
false |
Is the VAD listening to mic input or is it paused? |
pause |
() => void |
|
Stop listening to mic input |
start |
() => void |
|
Start listening to mic input |
NonRealTimeVAD
The NonRealTimeVAD
API is for identifying segments of user speech if you already have a Float32Array of audio samples.
Support
Package |
Type |
Supported |
Description |
@ricky0123/vad-web |
package |
Yes |
|
@ricky0123/vad-react |
package |
No |
|
Example
| const vad = require("@realtimex/vad-web")
const options: Partial<vad.NonRealTimeVADOptions> = { /* ... */ }
const myvad = await vad.NonRealTimeVAD.new(options)
const audioFileData, nativeSampleRate = ... // get audio and sample rate from file or something
for await (const {audio, start, end} of myvad.run(audioFileData, nativeSampleRate)) {
// do stuff with
// audio (float32array of audio)
// start (milliseconds into audio where speech starts)
// end (milliseconds into audio where speech ends)
}
|
Options
New instances of MicVAD
are created by calling the async static method MicVAD.new(options)
. The options object can contain the following fields (all are optional).
Attributes
Attributes |
Type |
Default |
Description |
run |
async function* (inputAudio: Float32Array, sampleRate: number): AsyncGenerator |
|
Run the VAD model on your audio |
useMicVAD
A React hook wrapper for MicVAD. Use this if you want to run the VAD model on mic input in a React application.
Support
Package |
Type |
Supported |
Description |
@realtimex/vad-web |
package |
No, use MicVAD |
|
@realtimex/vad-react |
package |
Yes |
|
Example
| import { useMicVAD } from "@ricky0123/vad-react"
const MyComponent = () => {
const vad = useMicVAD({
startOnLoad: true,
onSpeechEnd: (audio) => {
console.log("User stopped speaking")
},
})
return <div>User speaking: {vad.userSpeaking}</div>
}
|
Options
The useMicVAD
hook takes an options object with the following fields (all are optional).
Option |
Type |
Default |
Description |
startOnLoad |
boolean |
true |
Whether to start the VAD automatically when the component loads. |
getStream |
() => Promise<MediaStream> |
Default getUserMedia with standard audio constraints |
Function that returns a Promise resolving to a MediaStream. By default, creates a stream with standard audio constraints (channelCount: 1, echoCancellation: true, autoGainControl: true, noiseSuppression: true). Override this to use custom audio constraints or provide your own stream. |
pauseStream |
(stream: MediaStream) => Promise<void> |
Stops all tracks in the stream |
Function called when the VAD is paused. By default, stops all tracks in the provided stream. Override this to implement custom pause behavior. |
resumeStream |
(stream: MediaStream) => Promise<MediaStream> |
Creates new stream with standard audio constraints |
Function called when the VAD is resumed. By default, creates a new stream with standard audio constraints. Override this to implement custom resume behavior. |
onFrameProcessed |
(probabilities: {isSpeech: float; notSpeech: float}, frame: Float32Array) => any |
() => {} |
Callback to run after each frame. The frame parameter contains the raw audio data for that frame. |
onVADMisfire |
() => any |
() => {} |
Callback to run if speech start was detected but onSpeechEnd will not be run because the audio segment is smaller than minSpeechMs |
onSpeechStart |
() => any |
() => {} |
Callback to run when speech start is detected |
onSpeechEnd |
(audio: Float32Array) => any |
() => {} |
Callback to run when speech end is detected. Takes as arg a Float32Array of audio samples between -1 and 1, sample rate 16000. This will not run if the audio segment is smaller than minSpeechMs |
positiveSpeechThreshold |
number |
0.5 |
see algorithm configuration |
negativeSpeechThreshold |
number |
0.35 |
see algorithm configuration |
redemptionMs |
number |
768 |
see algorithm configuration |
preSpeechPadMs |
number |
96 |
see algorithm configuration |
minSpeechMs |
number |
288 |
see algorithm configuration |
Returns
Attributes |
Type |
Default |
Description |
listening |
boolean |
false |
Is the VAD currently listening to mic input? |
errored |
false or { message: string} |
|
Did the VAD fail to load? |
loading |
boolean |
true |
Did the VAD finish loading? |
userSpeaking |
boolean |
false |
Is the user speaking? |
pause |
() => void |
|
Stop the VAD from running on mic input |
start |
() => void |
|
Start the VAD running on mic input |