Channel diarization

The V2 API also supports channel diarization, which transcribes each channel of a multi-channel audio file separately and collates the results into a single transcript. This gives perfect diarization at the channel level, as well as better handling of cross-talk between channels. Files with up to 6 separate input channels are supported.

This is particularly useful for the Contact Centre use case, where audio is often recorded in stereo, with a separate channel for the agent and the caller.

To use this feature, set the diarization property to channel. You can optionally name the channels with the channel_diarization_labels property in the configuration:

{
  "type": "transcription",
  "transcription_config": {
    "language": "en",
    "diarization": "channel",
    "channel_diarization_labels": ["Agent", "Caller"]
  }
}
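As a minimal sketch, a configuration like the one above could be built programmatically. The helper name build_channel_config and the num_channels argument are hypothetical and not part of the API; only the JSON field names come from this page. The helper is deliberately stricter than the service and rejects a label list whose length differs from the channel count.

```python
import json

MAX_CHANNELS = 6  # upper limit stated in this section


def build_channel_config(language, num_channels, labels=None):
    """Return a job config dict that requests channel diarization.

    Hypothetical helper: only the JSON field names come from the docs.
    """
    if not 1 <= num_channels <= MAX_CHANNELS:
        raise ValueError(
            f"expected 1 to {MAX_CHANNELS} channels, got {num_channels}"
        )

    transcription_config = {"language": language, "diarization": "channel"}

    if labels is not None:
        # Stricter than the service: a mismatched label count is an error here.
        if len(labels) != num_channels:
            raise ValueError(
                f"got {len(labels)} labels for {num_channels} channels"
            )
        transcription_config["channel_diarization_labels"] = list(labels)

    return {"type": "transcription", "transcription_config": transcription_config}


config = build_channel_config("en", 2, ["Agent", "Caller"])
print(json.dumps(config, indent=2))
```

Omitting labels leaves channel_diarization_labels out of the configuration entirely, so the default channel names apply.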

If you do not specify any labels, defaults are used (e.g. Channel 1). The number of labels should match the number of channels in your audio; any additional labels are ignored. In the returned transcript, the channel property on each word identifies the channel the word came from, and therefore the speaker, for example:

"results": [
  {
    "type": "word",
    "end_time": 1.8,
    "start_time": 1.45,
    "channel": "Presenter",
    "alternatives": [
      {
        "display": {
          "direction": "ltr"
        },
        "language": "en",
        "content": "world",
        "confidence": 0.76
      }
    ]
  }
]
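Because every word carries a channel property, a transcript can be regrouped into speaker turns on the client side. The following is a sketch only: the function collate_turns and the sample words are made up for illustration, and it assumes each word has the start_time, end_time, channel and alternatives fields shown above.

```python
def collate_turns(results):
    """Merge consecutive words from the same channel into turns.

    Hypothetical helper: assumes the word structure shown in this section.
    """
    words = [r for r in results if r.get("type") == "word"]
    words.sort(key=lambda r: r["start_time"])

    turns = []
    for word in words:
        text = word["alternatives"][0]["content"]
        if turns and turns[-1]["channel"] == word["channel"]:
            # Same channel as the previous word: extend the current turn.
            turns[-1]["text"] += " " + text
            turns[-1]["end_time"] = word["end_time"]
        else:
            # Channel changed (or first word): start a new turn.
            turns.append({
                "channel": word["channel"],
                "start_time": word["start_time"],
                "end_time": word["end_time"],
                "text": text,
            })
    return turns


def make_word(content, start, end, channel):
    """Build an illustrative word entry (sample data, not real output)."""
    return {
        "type": "word",
        "start_time": start,
        "end_time": end,
        "channel": channel,
        "alternatives": [{"language": "en", "content": content, "confidence": 1.0}],
    }


sample_results = [
    make_word("hello", 0.0, 0.4, "Agent"),
    make_word("there", 0.45, 0.8, "Agent"),
    make_word("hi", 1.0, 1.2, "Caller"),
    make_word("world", 1.45, 1.8, "Agent"),
]

for turn in collate_turns(sample_results):
    print(f'{turn["channel"]}: {turn["text"]}')
```

Sorting by start_time before merging keeps the turns in spoken order even if the words from different channels arrive interleaved differently.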