Real-time Latency
When transcribing in real time, you can control the maximum time to wait for the final transcript using the max_delay and max_delay_mode transcription config options.
{
  "type": "transcription",
  "transcription_config": {
    "language": "en",
    "max_delay": 3.5,
    "max_delay_mode": "fixed"
  }
}
The max_delay parameter controls the maximum latency of finals in the real-time transcription engine. This is the delay in seconds between receiving input audio and returning final transcription results. The default is 10. The minimum and maximum values are 2 and 20. Note that max_delay has no impact on how partials are returned.
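The limits above can be checked on the client before a config is sent. The following is a minimal sketch, not part of any official SDK: the helper name build_config and the constant names are hypothetical, and the range and default are taken from the text above. It only builds a dictionary shaped like the earlier example and makes no claim about how the service itself handles out-of-range values.

```python
import json

# Limits and default as stated in the text above; the names are illustrative.
MIN_MAX_DELAY = 2.0
MAX_MAX_DELAY = 20.0
DEFAULT_MAX_DELAY = 10.0


def build_config(language: str, max_delay: float = DEFAULT_MAX_DELAY) -> dict:
    """Return a config dict shaped like the example above (hypothetical helper)."""
    # Reject values outside the documented range before anything is sent.
    if not MIN_MAX_DELAY <= max_delay <= MAX_MAX_DELAY:
        raise ValueError(
            f"max_delay must be between {MIN_MAX_DELAY} and "
            f"{MAX_MAX_DELAY} seconds, got {max_delay}"
        )
    return {
        "type": "transcription",
        "transcription_config": {
            "language": language,
            "max_delay": max_delay,
        },
    }


if __name__ == "__main__":
    # Serialize the config as JSON, ready to be sent to the transcription service.
    print(json.dumps(build_config("en", max_delay=3.5), indent=2))
```

Validating locally gives an immediate, readable error instead of relying on whatever response the service returns for an invalid value.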
Entities and flexible max_delay_mode
Using a fixed value of max_delay can increase the potential for inaccuracies in the transcript, especially around entities such as numerals, currencies, and dates.
Flexible max_delay_mode lets the latency exceed max_delay only when a potential entity has been detected. Entities are common concepts such as numbers, currencies, and dates, and are discussed in more detail in the documentation on entities.
There are two options for max_delay_mode: fixed and flexible. The default is flexible.
flexible: improves accuracy in entity recognition by allowing the latency to exceed the max_delay threshold when a potential entity is detected.
fixed: ensures that final transcripts never take longer than the max_delay threshold, even if this results in less accurate transcription of entities.
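The trade-off between the two modes can be expressed as a small client-side choice. The sketch below is illustrative only: the function with_delay_mode and its strict_latency flag are hypothetical names, not part of any API. It picks fixed when the application needs a hard upper bound on final latency, and flexible otherwise.

```python
# The two modes named in the text above.
VALID_MODES = ("fixed", "flexible")


def with_delay_mode(transcription_config: dict, strict_latency: bool) -> dict:
    """Return a copy of the config with max_delay_mode set (hypothetical helper).

    strict_latency=True  -> "fixed": finals never exceed max_delay.
    strict_latency=False -> "flexible": finals may exceed max_delay when a
                            potential entity is detected.
    """
    mode = "fixed" if strict_latency else "flexible"
    assert mode in VALID_MODES
    updated = dict(transcription_config)  # shallow copy; the input is left unchanged
    updated["max_delay_mode"] = mode
    return updated


if __name__ == "__main__":
    base = {"language": "en", "max_delay": 3.5}
    # Live captions with a hard latency budget: prefer the fixed mode.
    print(with_delay_mode(base, strict_latency=True))
    # Dictation where numbers and dates matter more: prefer the flexible mode.
    print(with_delay_mode(base, strict_latency=False))
```

Returning a copy rather than mutating the input keeps a shared base config reusable across sessions with different latency requirements.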