Speaker change detection
Transcription: Batch, Real-Time
Deployments: All
Note: We recommend using speaker diarization instead of speaker change detection, because of improvements in its speaker detection accuracy.
This feature inserts markers, in the JSON transcript only, that indicate when a speaker change has been detected in the audio. For example, if the audio contains two people speaking to each other and you want the transcript to show each change of speaker, specify speaker_change as the diarization setting:
{
  "type": "transcription",
  "transcription_config": {
    "language": "en",
    "diarization": "speaker_change"
  }
}
The transcript will contain special JSON elements in the results array, placed between two words where a different person started talking. For example, if one person says "Hello James" and the other responds with "Hi", there will be a speaker_change JSON element between "James" and "Hi":
"results": [
  {
    "start_time": 0.1,
    "end_time": 0.22,
    "type": "word",
    "alternatives": [
      {
        "confidence": 0.71,
        "content": "Hello",
        "language": "en",
        "speaker": "UU"
      }
    ]
  },
  {
    "start_time": 0.22,
    "end_time": 0.55,
    "type": "word",
    "alternatives": [
      {
        "confidence": 0.71,
        "content": "James",
        "language": "en",
        "speaker": "UU"
      }
    ]
  },
  {
    "start_time": 0.55,
    "end_time": 0.55,
    "type": "speaker_change",
    "alternatives": []
  },
  {
    "start_time": 0.56,
    "end_time": 0.61,
    "type": "word",
    "alternatives": [
      {
        "confidence": 0.71,
        "content": "Hi",
        "language": "en",
        "speaker": "UU"
      }
    ]
  }
]
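To make the role of the speaker_change elements concrete, here is a minimal Python sketch that splits a results array into one text segment per speaker turn. It assumes the transcript has already been parsed into a Python dictionary shaped like the example above; the function name split_on_speaker_change and the trimmed sample data are illustrative assumptions, not part of any SDK.

```python
def split_on_speaker_change(results):
    """Join word contents into segments, starting a new segment
    at every speaker_change marker (hypothetical helper)."""
    segments = [[]]
    for item in results:
        item_type = item.get("type")
        if item_type == "speaker_change":
            # Only open a new segment if the current one has words in it.
            if segments[-1]:
                segments.append([])
        elif item_type == "word" and item.get("alternatives"):
            # Use the first alternative as the recognised word.
            segments[-1].append(item["alternatives"][0]["content"])
    return [" ".join(words) for words in segments if words]


# Trimmed version of the example transcript (timings and confidence omitted).
transcript = {
    "results": [
        {"type": "word", "alternatives": [{"content": "Hello"}]},
        {"type": "word", "alternatives": [{"content": "James"}]},
        {"type": "speaker_change", "alternatives": []},
        {"type": "word", "alternatives": [{"content": "Hi"}]},
    ]
}

print(split_on_speaker_change(transcript["results"]))
# prints ['Hello James', 'Hi']
```

Note that the markers only say where a change happened. They do not identify which speaker is talking, so the segments are unlabelled.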
The sensitivity of speaker change detection is set to a sensible default that performs well under most circumstances. You can, however, change it if you wish by using the speaker_change_sensitivity setting, which takes a value between 0 and 1 (the default is 0.4). The higher the sensitivity, the more likely a speaker change is to be indicated. In our own experiments, values below 0.3 produced too few speaker change events, and values above 0.6 produced too many false positives. Here's an example of how to set the value:
{
  "type": "transcription",
  "transcription_config": {
    "language": "en",
    "diarization": "speaker_change",
    "speaker_change_sensitivity": 0.55
  }
}
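If you generate this configuration programmatically, it can help to validate the sensitivity before submitting the job. The following is a minimal Python sketch under that assumption; build_config is a hypothetical helper, and the 0.3 to 0.6 warning band simply mirrors the range described above rather than any limit enforced by the service.

```python
import json
import warnings


def build_config(language="en", sensitivity=None):
    """Build a transcription config with speaker change detection
    enabled (hypothetical helper, not part of any SDK)."""
    transcription_config = {
        "language": language,
        "diarization": "speaker_change",
    }
    if sensitivity is not None:
        # The setting accepts values between 0 and 1.
        if not 0 <= sensitivity <= 1:
            raise ValueError("speaker_change_sensitivity must be between 0 and 1")
        # Values outside 0.3-0.6 are allowed but tend to give poor results.
        if not 0.3 <= sensitivity <= 0.6:
            warnings.warn("sensitivity is outside the 0.3-0.6 range")
        transcription_config["speaker_change_sensitivity"] = sensitivity
    return {
        "type": "transcription",
        "transcription_config": transcription_config,
    }


print(json.dumps(build_config(sensitivity=0.55), indent=2))
```

Leaving sensitivity unset omits the key entirely, so the default of 0.4 applies.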