Word Tagging
Transcription: Batch, Real-Time
Deployments: All
Speechmatics adds a metadata tag to words in the transcript to indicate whether a word is a profanity or a disfluency. You do not have to take any action to enable this; the tags are included in the transcription output as standard.
Profanity Tagging
You can use this tag to identify, redact, or obfuscate profanities, and to integrate this data into your own workflows.
Profanity tagging is available for the following languages:
- English (EN)
- Italian (IT)
- Spanish (ES)
Note that the list of profanities in each language is fixed and cannot be changed.
An example of a word tagged as a profanity is shown below.
"results": [
  {
    "alternatives": [
      {
        "confidence": 1.0,
        "content": "$PROFANITY",
        "language": "en",
        "speaker": "UU",
        "tags": [
          "profanity"
        ]
      }
    ],
    "end_time": 18.03,
    "start_time": 17.61,
    "type": "word"
  }
]
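As an illustration of the redaction use case, here is a minimal Python sketch that masks any word carrying the "profanity" tag. It assumes the transcript JSON has already been parsed into a dictionary and that its "results" list has the shape shown above. The function name `redact_profanities`, the `mask` parameter, and the choice to use the first alternative of each result are assumptions made for this example and are not part of the Speechmatics API.

```python
def redact_profanities(results, mask="*"):
    """Return the transcript text with profanity-tagged words masked.

    `results` is the parsed "results" list from the transcript.
    Only items of type "word" are included (punctuation items are skipped);
    the first alternative of each item is assumed to be the one to display.
    """
    words = []
    for item in results:
        if item.get("type") != "word":
            continue
        best = item["alternatives"][0]
        content = best["content"]
        # Tags may be absent on untagged words, so default to an empty list.
        if "profanity" in best.get("tags", []):
            content = mask * len(content)
        words.append(content)
    return " ".join(words)
```

A call such as `redact_profanities(transcript["results"])` would then give a display string in which each tagged word is replaced by asterisks of the same length.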
Disfluency Tagging
A disfluency here refers to a fixed list of English words that imply hesitation or indecision. Although disfluency can cover a range of phenomena such as stuttering and interjections, here the tag is applied only to words such as 'hmm' or 'umm'. Disfluency tagging is available for the English language only. You can use this tag in your own post-processing workflows, for example to hide disfluencies when displaying the transcript. An example of how this looks is below:
"results": [
  {
    "alternatives": [
      {
        "confidence": 1.0,
        "content": "hmm",
        "language": "en",
        "speaker": "UU",
        "tags": [
          "disfluency"
        ]
      }
    ],
    "end_time": 18.03,
    "start_time": 17.61,
    "type": "word"
  }
]
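For the post-processing use case of not displaying disfluencies, the following Python sketch filters tagged items out of a parsed "results" list. The function name `drop_disfluencies` is hypothetical, and the sketch assumes each result has the structure shown above, with the tag checked on the first alternative only.

```python
def drop_disfluencies(results):
    """Return a new list without items whose first alternative is tagged "disfluency".

    `results` is the parsed "results" list from the transcript.
    Items with no alternatives or no "tags" field are kept unchanged.
    """
    kept = []
    for item in results:
        alternatives = item.get("alternatives") or []
        tags = alternatives[0].get("tags", []) if alternatives else []
        if "disfluency" not in tags:
            kept.append(item)
    return kept
```

Because the remaining items keep their original "start_time" and "end_time" values, the filtered list can still be aligned with the audio.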