Real-time Latency

Transcription: Real-Time; Deployments: All

When transcribing in real time, you can control the maximum time to wait for a final transcript using the max_delay and max_delay_mode options in the transcription config.

{
  "type": "transcription",
  "transcription_config": {
    "language": "en",
    "max_delay": 3.5,
    "max_delay_mode": "fixed"
  }
}

The max_delay parameter controls the maximum latency of finals in the real-time transcription engine: the delay, in seconds, between receiving input audio and returning final transcription results. The default is 10 seconds, the minimum is 2, and the maximum is 20. Note that max_delay has no impact on how partials are returned.
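As a rough illustration of the limits described above, the following Python sketch builds a config shaped like the earlier JSON example and rejects max_delay values outside the documented range. The helper name build_config and the constant names are hypothetical, chosen for this example only, and the check is done client-side as an assumption about how an integration might guard its input; it says nothing about how the service itself reports an out-of-range value.

```python
import json

# Bounds and default for max_delay, in seconds, as stated in the text above.
MAX_DELAY_MIN = 2
MAX_DELAY_MAX = 20
MAX_DELAY_DEFAULT = 10


def build_config(language="en", max_delay=MAX_DELAY_DEFAULT):
    """Hypothetical helper: return a config dict shaped like the JSON example."""
    if not MAX_DELAY_MIN <= max_delay <= MAX_DELAY_MAX:
        raise ValueError(
            f"max_delay must be between {MAX_DELAY_MIN} and "
            f"{MAX_DELAY_MAX} seconds, got {max_delay}"
        )
    return {
        "type": "transcription",
        "transcription_config": {
            "language": language,
            "max_delay": max_delay,
        },
    }


if __name__ == "__main__":
    # Serialise a config with a 3.5 second ceiling on final latency.
    print(json.dumps(build_config(max_delay=3.5), indent=2))
```

Validating before sending keeps an out-of-range value from reaching the service at all, which may be easier to debug than a rejected session.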

Entities and flexible `max_delay_mode`

Using a fixed value of max_delay can increase the potential for inaccuracies in the transcript, especially around entities such as numerals, currencies, and dates.

Flexible max_delay_mode relaxes the maximum latency only when a potential entity has been detected. Entities are common concepts such as numbers, currencies, and dates, and are covered in more detail in the documentation on entities.

There are two options for max_delay_mode: fixed and flexible. The default is flexible.

flexible improves accuracy in entity recognition by allowing the latency to exceed the max_delay threshold when a potential entity is detected.
fixed ensures that final transcripts never take longer than the max_delay threshold, even if this results in less accurate transcription of entities.
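The trade-off between the two modes can be sketched as a small selection helper. This is a minimal sketch under stated assumptions: the function names pick_max_delay_mode and transcription_config and the strict_latency flag are hypothetical, invented for this example; only the option names and the values fixed and flexible come from the text above.

```python
# The two documented values for max_delay_mode; flexible is the default.
VALID_MODES = ("fixed", "flexible")
DEFAULT_MODE = "flexible"


def pick_max_delay_mode(strict_latency):
    """Hypothetical helper: choose a mode from an application-level requirement.

    strict_latency=True  -> "fixed": finals never exceed max_delay,
                            possibly at the cost of entity accuracy.
    strict_latency=False -> "flexible": latency may exceed max_delay
                            when a potential entity is detected.
    """
    return "fixed" if strict_latency else DEFAULT_MODE


def transcription_config(language, max_delay, strict_latency=False):
    """Hypothetical helper: build the transcription_config fragment."""
    mode = pick_max_delay_mode(strict_latency)
    assert mode in VALID_MODES
    return {
        "language": language,
        "max_delay": max_delay,
        "max_delay_mode": mode,
    }


if __name__ == "__main__":
    # Live captioning with a hard latency budget: prefer fixed.
    print(transcription_config("en", 3.5, strict_latency=True))
    # Dictation where numbers and dates matter more: keep the default.
    print(transcription_config("en", 3.5))
```

An application with a hard latency budget, such as live captioning, would likely pass strict_latency=True, while one that cares more about correctly formatted numbers and dates could keep the default.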
