Translation

Mode: Batch | Deployments: SaaS

Info: Translation will be offered at no additional cost until 30th April 2023.

Translating a file

Speechmatics enables you to translate your audio into multiple languages through a single API call to quickly add translation to your application.

Translation can be selected when calling the Speechmatics Transcription API. You can also try Translation for free in the Speechmatics portal with no code.

If you're new to Speechmatics, please see our guide on transcribing a file through our API. Once you are set up, try using the following config options:

{
  "type": "transcription",
  "transcription_config": {
    "operating_point": "enhanced",
    "language": "en"
  },
  "translation_config": {
    "target_languages": ["es", "de"]
  }
}

Maximum number of translations: Each transcription can have up to five translations.
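
The config above can also be assembled in code. The following is a minimal sketch rather than an official client: the helper name build_translation_config and the constant MAX_TARGET_LANGUAGES are hypothetical, and the check only mirrors the five-translation limit stated above. It builds the config locally and does not submit a job.

```python
import json

# Limit stated in this guide: up to five translations per transcription.
MAX_TARGET_LANGUAGES = 5


def build_translation_config(language, target_languages, operating_point="enhanced"):
    """Return the job config as a JSON string (hypothetical helper)."""
    targets = list(target_languages)
    if len(targets) > MAX_TARGET_LANGUAGES:
        raise ValueError(
            f"maximum number of target languages is {MAX_TARGET_LANGUAGES}, "
            f"got {len(targets)}"
        )
    config = {
        "type": "transcription",
        "transcription_config": {
            "operating_point": operating_point,
            "language": language,
        },
        "translation_config": {"target_languages": targets},
    }
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    # Same values as the example config: English audio, Spanish and German targets.
    print(build_translation_config("en", ["es", "de"]))
```

Checking the limit before submission avoids a round trip that would otherwise end in a rejected job.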

Translation result

The returned transcript will include a new property called translations, which contains a list of translated sentences for each target language requested. These languages are indicated using the same ISO language codes as for transcription.

Each translated sentence corresponds to a sentence in the transcription. Each one has the following properties:

  • content: The translated content
  • start_time: The start time of the translated content, which matches the start time of the first word in the transcript
  • end_time: The end time of the translated content, which matches the end time of the last word in the transcript
  • speaker: The speaker label when diarization:speaker is set. The default value is UU (Unknown speaker) - see here for more details on Speaker Diarization
  • channel: The channel label when diarization:channel is set - see here for more details on Channel Diarization
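
Because start_time and end_time of a translated sentence match the first and last transcript words, the source words behind a sentence can be recovered by comparing timings. This is a sketch under assumptions: the helper name source_words_for is hypothetical, it assumes sentences do not overlap in time (for example, a single channel), and the sample word timings other than "Welcome" are made up for illustration.

```python
def source_words_for(sentence, results, tolerance=1e-6):
    """Return transcript words whose timings fall inside a translated sentence."""
    words = []
    for item in results:
        if item.get("type") != "word":
            continue  # skip punctuation and other non-word results
        starts_inside = item["start_time"] >= sentence["start_time"] - tolerance
        ends_inside = item["end_time"] <= sentence["end_time"] + tolerance
        if starts_inside and ends_inside:
            # Take the first alternative as the word's text.
            words.append(item["alternatives"][0]["content"])
    return words


if __name__ == "__main__":
    def word(start, end, content):
        # Illustrative result item; timings after "Welcome" are invented.
        return {"start_time": start, "end_time": end, "type": "word",
                "alternatives": [{"content": content}]}

    results = [word(0.78, 1.32, "Welcome"), word(1.40, 1.70, "to"),
               word(1.80, 2.58, "Speechmatics"), word(3.00, 3.20, "We")]
    sentence = {"start_time": 0.78, "end_time": 2.58,
                "content": "Bienvenidos a Speechmatics."}
    print(source_words_for(sentence, results))
```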

Here's an example of the transcription returned with translations included:

{
    "format": "2.9",
    "job": {
        "created_at": "2023-01-23T19:31:19.354Z",
        "data_name": "example.wav",
        "duration": 15,
        "id": "ggqjaazkqf"
    },
    "metadata": {
        "created_at": "2023-01-23T19:31:44.766Z",
        "type": "transcription",
        "transcription_config": {
            "language": "en",
            "diarization": "speaker"
        },
        "translation_config": {
            "target_languages": [
                "es"
            ]
        }
    },
    "results": [
        {
            "start_time": 0.78,
            "end_time": 1.32,
            "type": "word",
            "alternatives": [
                {
                    "content": "Welcome",
                    "confidence": 1.0,
                    "language": "en",
                    "speaker": "S1"
                }
            ]
        },
        ...
    ],
    "translations": {
        "es": [
            {
                "start_time": 0.78,
                "end_time": 2.58,
                "content": "Bienvenidos a Speechmatics.",
                "speaker": "S1"
            },
            {
                "start_time": 3.0,
                "end_time": 7.94,
                "content": "Esperamos que tengas un gran día.",
                "speaker": "S1"
            },
            ...
        ]
    }
}
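
A transcript of this shape can be walked with a few lines of Python. The sketch below is illustrative only: the function name format_translations is hypothetical, and it relies solely on the translations property and the sentence fields described above, treating speaker and channel as optional.

```python
def format_translations(transcript):
    """Return one readable line per translated sentence, grouped by language."""
    lines = []
    for language, sentences in transcript.get("translations", {}).items():
        for sentence in sentences:
            # speaker/channel are only present when diarization is configured.
            label = sentence.get("speaker") or sentence.get("channel") or "-"
            lines.append(
                f"[{language}] {sentence['start_time']:.2f}-{sentence['end_time']:.2f} "
                f"{label}: {sentence['content']}"
            )
    return lines


if __name__ == "__main__":
    transcript = {
        "translations": {
            "es": [
                {"start_time": 0.78, "end_time": 2.58,
                 "content": "Bienvenidos a Speechmatics.", "speaker": "S1"},
                {"start_time": 3.0, "end_time": 7.94,
                 "content": "Esperamos que tengas un gran día.", "speaker": "S1"},
            ]
        }
    }
    for line in format_translations(transcript):
        print(line)
```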

Language pairs supported

Translation is supported for the majority of Speechmatics' languages. The supported translation pairs are listed below.

Audio Language | Translation Target Language
English (en) | Bulgarian (bg), Catalan (ca), Mandarin (cmn), Czech (cs), Danish (da), German (de), Greek (el), Spanish (es), Estonian (et), Finnish (fi), French (fr), Galician (gl), Hindi (hi), Croatian (hr), Hungarian (hu), Indonesian (id), Italian (it), Japanese (ja), Korean (ko), Lithuanian (lt), Latvian (lv), Malay (ms), Dutch (nl), Norwegian (no), Polish (pl), Portuguese (pt), Romanian (ro), Russian (ru), Slovakian (sk), Slovenian (sl), Swedish (sv), Turkish (tr), Ukrainian (uk), Vietnamese (vi)
Bulgarian (bg), Catalan (ca), Mandarin (cmn), Czech (cs), Danish (da), German (de), Greek (el), Spanish (es), Estonian (et), Finnish (fi), French (fr), Galician (gl), Hindi (hi), Croatian (hr), Hungarian (hu), Indonesian (id), Italian (it), Japanese (ja), Korean (ko), Lithuanian (lt), Latvian (lv), Malay (ms), Dutch (nl), Norwegian (no), Polish (pl), Portuguese (pt), Romanian (ro), Russian (ru), Slovakian (sk), Slovenian (sl), Swedish (sv), Turkish (tr), Ukrainian (uk), Vietnamese (vi) | English (en)
Norwegian Bokmål (no) | Norwegian Nynorsk (nn)

Currently unsupported Speechmatics languages: Arabic, Bashkir, Belarusian, Welsh, Esperanto, Basque, Interlingua, Mongolian, Marathi, Tamil, Thai, Uyghur, Cantonese.
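
The supported pairs listed above can be encoded as a local pre-flight check. This is a sketch, not part of the Speechmatics API: the names ENGLISH_PAIRED and is_supported_pair are hypothetical, and the language codes are copied from the list above, so the check goes stale if that list changes.

```python
# Languages that translate to and from English, per the list above.
ENGLISH_PAIRED = {
    "bg", "ca", "cmn", "cs", "da", "de", "el", "es", "et", "fi", "fr", "gl",
    "hi", "hr", "hu", "id", "it", "ja", "ko", "lt", "lv", "ms", "nl", "no",
    "pl", "pt", "ro", "ru", "sk", "sl", "sv", "tr", "uk", "vi",
}


def is_supported_pair(source, target):
    """Return True if source -> target appears in the supported pairs above."""
    if source == "en":
        return target in ENGLISH_PAIRED
    if source in ENGLISH_PAIRED and target == "en":
        return True
    # The one non-English pair listed: Norwegian Bokmål to Norwegian Nynorsk.
    return source == "no" and target == "nn"


if __name__ == "__main__":
    for pair in [("en", "es"), ("de", "en"), ("no", "nn"), ("es", "de")]:
        print(pair, is_supported_pair(*pair))
```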

Considerations

When using Translation, there are a few things to keep in mind:

  • Accuracy of transcription: We recommend using the Enhanced operating point for the best translation results. Transcription accuracy directly impacts the accuracy of translation.
  • Punctuation: Punctuation plays a significant role in the accuracy of translation. For the best results, avoid disabling any punctuation marks or reducing the punctuation sensitivity.
  • Formatting: The translation is applied to the written form transcript.
  • Transcription time: Enabling translation increases the turnaround time of submitted jobs. The increase for a single translation is small, and each additional target language adds to it.

Limitations

  • Maximum number of translations: Each transcription can have up to five translations.
  • Output formats: At this time, only the JSON transcript format is supported for translation. Text and SRT transcript formats are only available in the native language.
  • Other transcription features: The following transcription features are only available in the native language transcript, not the translation.
    • Single word timings
    • Confidence scores
    • Word tagging
    • Speaker change
    • Output locale

Error responses

Unsupported target language

If one or more of the target languages are not supported for the source language, an HTTP 400 error response is returned.

Example bad config:

{
   "type":"transcription",
   "transcription_config":{
      "language":"en"
   },
   "translation_config":{
      "target_languages":[
         "es",
         "zz"
      ]
   }
}

Response:

{
    "code": 400,
    "detail": "Job config JSON is invalid. Error: language zz is not a supported translation target for source language en",
    "error": "Job rejected"
}

Too many target languages

At this time, each transcription can have up to five translations. If you request more than five, an HTTP 400 error response is returned.

{
    "code": 400,
    "detail": "maximum number of target languages is 5 and requested count is 6",
    "error": "Job rejected"
}
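
Both rejection responses above share the same three fields, so a caller can surface them uniformly. The following is a small sketch: the function name describe_rejection is hypothetical, and it assumes the error body always carries code, error, and detail as in the examples shown here.

```python
import json


def describe_rejection(body):
    """Summarise an error body with the code/error/detail fields shown above."""
    error = json.loads(body)
    return f"{error['error']} (HTTP {error['code']}): {error['detail']}"


if __name__ == "__main__":
    body = """
    {
        "code": 400,
        "detail": "maximum number of target languages is 5 and requested count is 6",
        "error": "Job rejected"
    }
    """
    print(describe_rejection(body))
```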

Translation failure

If the translation fails, the submitted job will fail and the status will be set to rejected.

This will either be visible in the job status (see here for more information) or sent through a notification, if configured. See here for more information on setting up notifications.

Feedback

Do you have requests or feedback on translation? If so, please send us your thoughts via our Translation Feedback Form.

If you want to report an issue or need support more urgently, raise an issue instead.