
Word Tagging

Transcription: Batch, Real-Time. Deployments: All.

Speechmatics adds a metadata tag to words in the transcript to indicate whether a word is a profanity or a disfluency. You do not have to take any action to enable this: the tags are provided in the transcription output as standard.

Profanity Tagging

You can use this tag to identify, redact, or obfuscate profanities, and to integrate this data into your own workflows.

Profanity tagging is available for the following languages:

  • English (EN)
  • Italian (IT)
  • Spanish (ES)

Note that the list of profanities in each language cannot be altered.

An example of a profanity-tagged word in the output is shown below:

"results": [
  {
    "alternatives": [
      {
        "confidence": 1.0,
        "content": "$PROFANITY",
        "language": "en",
        "speaker": "UU",
        "tags": [
          "profanity"
        ]
      }
    ],
    "end_time": 18.03,
    "start_time": 17.61,
    "type": "word"
  }
]
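
As a rough illustration of the redaction use case, the sketch below walks a "results" list shaped like the example above and replaces any word whose first alternative carries the "profanity" tag with a mask. The function name redact_profanities, the mask string, and the sample data are hypothetical and not part of the Speechmatics API. The sketch also assumes that untagged words either omit "tags" or leave it empty, and it only handles items of type "word".

```python
def redact_profanities(results, mask="****"):
    """Join the words in `results`, masking any word tagged as a profanity.

    Hypothetical helper: only the first alternative of each item is read,
    and items whose type is not "word" are skipped.
    """
    words = []
    for item in results:
        if item.get("type") != "word":
            continue
        alternatives = item.get("alternatives") or []
        if not alternatives:
            continue
        best = alternatives[0]
        # Assumption: untagged words have no "tags" key or an empty list.
        tags = best.get("tags", [])
        words.append(mask if "profanity" in tags else best["content"])
    return " ".join(words)


# Made-up sample in the same shape as the example above.
sample_results = [
    {"alternatives": [{"content": "well", "confidence": 1.0}],
     "start_time": 17.20, "end_time": 17.55, "type": "word"},
    {"alternatives": [{"content": "$PROFANITY", "confidence": 1.0,
                       "tags": ["profanity"]}],
     "start_time": 17.61, "end_time": 18.03, "type": "word"},
    {"alternatives": [{"content": "then", "confidence": 1.0}],
     "start_time": 18.10, "end_time": 18.40, "type": "word"},
]

redacted = redact_profanities(sample_results)
print(redacted)
```

Replacing the word with a mask rather than dropping it keeps the word count aligned with the timing data, which may matter if the text is later synchronised with audio.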

Disfluency Tagging

A disfluency here refers to a fixed list of English words that imply hesitation or indecision. Although the term can cover a range of phenomena such as stuttering and interjections, here it is only used to tag words such as 'hmm' or 'umm'. You can use this tag in your own post-processing workflows, for example to avoid displaying disfluencies.

Disfluency tagging is available for English only. An example of how this looks is below:

"results": [
  {
    "alternatives": [
      {
        "confidence": 1.0,
        "content": "hmm",
        "language": "en",
        "speaker": "UU",
        "tags": [
          "disfluency"
        ]
      }
    ],
    "end_time": 18.03,
    "start_time": 17.61,
    "type": "word"
  }
]
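
One possible way to hide disfluencies in post-processing is to filter them out of the "results" list before rendering, as sketched below. The helper name drop_disfluencies and the sample data are hypothetical. As an assumption, an item is treated as a disfluency only when its first alternative carries the "disfluency" tag, and all other items are passed through unchanged so their timing fields stay intact.

```python
def drop_disfluencies(results):
    """Return the items of `results` that are not tagged as disfluencies.

    Hypothetical helper: checks the first alternative only and leaves the
    kept items untouched, including start_time and end_time.
    """
    kept = []
    for item in results:
        alternatives = item.get("alternatives") or []
        # Assumption: untagged words have no "tags" key or an empty list.
        tags = alternatives[0].get("tags", []) if alternatives else []
        if "disfluency" not in tags:
            kept.append(item)
    return kept


# Made-up sample in the same shape as the example above.
sample_results = [
    {"alternatives": [{"content": "hmm", "confidence": 1.0,
                       "tags": ["disfluency"]}],
     "start_time": 17.61, "end_time": 18.03, "type": "word"},
    {"alternatives": [{"content": "yes", "confidence": 1.0}],
     "start_time": 18.10, "end_time": 18.40, "type": "word"},
]

cleaned = drop_disfluencies(sample_results)
contents = [item["alternatives"][0]["content"] for item in cleaned]
print(contents)
```

Filtering at the item level, rather than on the joined text, avoids accidentally removing an untagged word that happens to be spelled like a disfluency.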