Channel diarization
Transcription: Batch, Deployments: All
The V2 API also supports Channel diarization, which transcribes each channel in multi-channel audio separately and collates the results into a single transcript. This provides perfect diarization at the channel level, as well as better handling of cross-talk between channels. With Channel diarization, files with up to 6 separate input channels are supported.
This is particularly useful for the Contact Centre use case, where audio is often recorded in stereo, with a separate channel for the agent and the caller.
To use this feature, set the diarization property to channel. You can optionally name the channels by using the channel_diarization_labels property in the configuration:
{
  "type": "transcription",
  "transcription_config": {
    "language": "en",
    "diarization": "channel",
    "channel_diarization_labels": ["Agent", "Caller"]
  }
}
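As a rough illustration, a configuration like the one above could be assembled programmatically. This is a minimal sketch in Python: the helper name build_channel_config is hypothetical and is not part of any official SDK, and it only builds the JSON document; submitting the job is out of scope here.

```python
import json


def build_channel_config(language, labels=None):
    """Build a job config dict that requests channel diarization.

    `labels` is an optional list of channel names. When it is omitted,
    the channel_diarization_labels property is left out entirely so the
    service can fall back to its default labels.
    """
    transcription_config = {
        "language": language,
        "diarization": "channel",
    }
    if labels:
        # Copy the list so later changes by the caller do not leak in.
        transcription_config["channel_diarization_labels"] = list(labels)
    return {
        "type": "transcription",
        "transcription_config": transcription_config,
    }


config = build_channel_config("en", ["Agent", "Caller"])
print(json.dumps(config, indent=2))
```

Leaving the labels property out when no names are given, rather than sending an empty list, is a design choice made for this sketch.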
If you do not specify any labels, defaults will be used (e.g. Channel 1). The number of labels should match the number of channels in your audio. Additional labels are ignored. When the transcript is returned, a channel property on each word indicates the speaker, for example:
"results": [
  {
    "type": "word",
    "end_time": 1.8,
    "start_time": 1.45,
    "channel": "Presenter",
    "alternatives": [
      {
        "display": {
          "direction": "ltr"
        },
        "language": "en",
        "content": "world",
        "confidence": 0.76
      }
    ]
  }
]
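Once a transcript has been returned, the channel property can be used to separate the words spoken on each channel. The following Python sketch assumes the results list has the shape shown above and that the first entry in alternatives is the one to use. The helper words_by_channel and the "unknown" fallback label are hypothetical and not part of the API.

```python
from collections import defaultdict


def words_by_channel(results):
    """Group word content by the channel label attached to each result."""
    grouped = defaultdict(list)
    for item in results:
        # Skip anything that is not a word or has no alternatives to read.
        if item.get("type") != "word" or not item.get("alternatives"):
            continue
        # Assumption: the first alternative is the preferred one.
        content = item["alternatives"][0]["content"]
        grouped[item.get("channel", "unknown")].append(content)
    return dict(grouped)


# Sample input copied from the transcript excerpt above.
sample_results = [
    {
        "type": "word",
        "end_time": 1.8,
        "start_time": 1.45,
        "channel": "Presenter",
        "alternatives": [
            {
                "display": {"direction": "ltr"},
                "language": "en",
                "content": "world",
                "confidence": 0.76,
            }
        ],
    }
]

per_channel = words_by_channel(sample_results)
print(per_channel)
```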