Alignment
Alignment lets you submit an audio file and a text file and get back the speech timing information. You can then determine exactly when a given word was spoken in the supplied audio file.
If you do not have access to use the alignment feature, and you would like to, please contact support@speechmatics.com or speak to your Account Manager.
Supported Formats
The input text file must be a UTF-8 encoded plain text file. If the file contains characters outside this encoding, the job is rejected.
Text Formatting
Input
During the alignment process, Speechmatics tries to extract words from the text.
Any string of characters separated by whitespace (space, tab, newline, etc.) is treated as a word.
Any markup in the text file that uses SGML-like tags with angle brackets is treated as a comment.
For example, text within the comment delimiters (<!-- and -->) or within angle brackets (< and >) is ignored.
Therefore, given this text:
Hello <markup> world <!-- comment > comment --> how are you?
The following words will be aligned with the provided audio file:
Hello world how are you?
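The extraction rule described above can be approximated in a few lines of Python. This is only a sketch of the documented behaviour, not Speechmatics' actual implementation: the function name extract_words and the two regular expressions are assumptions made for illustration.

```python
import re

# Assumed patterns. Comments are removed first so that a ">" inside a
# comment does not end the match early.
COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
TAG = re.compile(r"<[^>]*>")


def extract_words(text: str) -> list:
    """Return the whitespace-separated words left after dropping markup."""
    without_comments = COMMENT.sub(" ", text)
    without_tags = TAG.sub(" ", without_comments)
    return without_tags.split()


print(extract_words("Hello <markup> world <!-- comment > comment --> how are you?"))
# ['Hello', 'world', 'how', 'are', 'you?']
```

Running a check like this locally before submitting can help confirm which words you expect to be aligned.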
Output
The timing information (the alignment file) is available in two formats:
- Word Start and End (word_start_and_end). This is the default format:
<time=0.12>Hello<time=0.23> <markup> <time=0.34>world<time=0.45> <!-- comment > comment -->
<time=0.56>how<time=0.67> <time=0.78>are<time=0.89> <time=0.90>you?<time=1.00>
- One per Line (one_per_line). This format must be requested explicitly when you retrieve the alignment via HTTP request:
[00:00:00.1] Hello <markup> world <!-- comment > comment --> how are you?
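If you need the timings as data rather than as tagged text, the word_start_and_end output can be parsed with a regular expression. The sketch below assumes each aligned word has the shape <time=START>WORD<time=END>, as in the sample above. The function name parse_word_timings is hypothetical, and the time values are read as plain decimal numbers because their unit is not stated here.

```python
import re

# Assumed shape of one aligned word: <time=START>WORD<time=END>
ALIGNED_WORD = re.compile(r"<time=(\d+(?:\.\d+)?)>([^<\s]+)<time=(\d+(?:\.\d+)?)>")


def parse_word_timings(alignment: str) -> list:
    """Return a (word, start, end) tuple for each aligned word."""
    return [
        (word, float(start), float(end))
        for start, word, end in ALIGNED_WORD.findall(alignment)
    ]


sample = (
    "<time=0.12>Hello<time=0.23> <markup> <time=0.34>world<time=0.45> "
    "<!-- comment > comment -->\n"
    "<time=0.56>how<time=0.67> <time=0.78>are<time=0.89> <time=0.90>you?<time=1.00>"
)
for word, start, end in parse_word_timings(sample):
    print(f"{word}: {start:.2f} to {end:.2f}")
```

Markup and comments from the input text are skipped because they never match the assumed pattern.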
Submitting Alignment Jobs
Creating an alignment job is similar to creating a transcription job.
An HTTP POST request must be made to the /v2/jobs endpoint with the following form fields:
- config: The job config for alignment
- data_file: The media file containing the speech. Can be passed in via config if the file is stored in an online location
- text_file: The text file containing the transcript. Can be passed in via config if the file is stored in an online location
If you do not provide all of the above, the job will be rejected.
The job config must specify the job type (alignment) and the language of the audio and text:
{
  "type": "alignment",
  "alignment_config": {
    "language": "en"
  }
}
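As a rough illustration of the request described above, the following Python sketch builds the multipart form with the three fields (config, data_file, text_file) using only the standard library. The function names, the placeholder file names audio.wav and transcript.txt, and the use of the asr.api.speechmatics.com host (taken from the retrieval example in this document) are assumptions. The response is returned as raw text because its structure is not described here.

```python
import json
import urllib.request
import uuid


def build_alignment_form(audio: bytes, text: bytes, language: str = "en"):
    """Return (body, content_type) for a multipart/form-data POST to /v2/jobs."""
    config = {"type": "alignment", "alignment_config": {"language": language}}
    boundary = uuid.uuid4().hex
    parts = [
        ("config", None, json.dumps(config).encode("utf-8")),
        ("data_file", "audio.wav", audio),       # placeholder file name
        ("text_file", "transcript.txt", text),   # placeholder file name
    ]
    chunks = []
    for name, filename, payload in parts:
        disposition = f'form-data; name="{name}"'
        if filename is not None:
            disposition += f'; filename="{filename}"'
        header = f"--{boundary}\r\nContent-Disposition: {disposition}\r\n\r\n"
        chunks.append(header.encode("utf-8") + payload + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def submit_alignment_job(api_key: str, audio: bytes, text: bytes) -> str:
    """Send the form to the assumed endpoint and return the raw response text."""
    body, content_type = build_alignment_form(audio, text)
    request = urllib.request.Request(
        "https://asr.api.speechmatics.com/v2/jobs",
        data=body,
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Only build_alignment_form runs without network access; submit_alignment_job needs a valid API key and real files.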
Retrieving Alignment Jobs
Checking the status of alignment jobs is done in the same way as for transcription jobs. This is described on this page.
An aligned file can be retrieved from the /v2/jobs/<JOB_ID>/alignment endpoint. By default, the word_start_and_end alignment format is returned. This can be overridden using the tags query string parameter:
curl -X GET "https://asr.api.speechmatics.com/v2/jobs/${JOB_ID}/alignment?tags=one_per_line" \
-H "Authorization: Bearer ${API_KEY}"
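The same retrieval can be sketched in Python with the standard library. The helper names alignment_url and fetch_alignment are hypothetical. The host and the Bearer authorization header are copied from the curl example above, and the response is returned as plain text.

```python
import urllib.request
from typing import Optional
from urllib.parse import quote, urlencode

BASE_URL = "https://asr.api.speechmatics.com/v2"  # host taken from the curl example


def alignment_url(job_id: str, tags: Optional[str] = None) -> str:
    """Build the alignment endpoint URL, optionally selecting a format via tags."""
    url = f"{BASE_URL}/jobs/{quote(job_id, safe='')}/alignment"
    if tags is not None:
        url += "?" + urlencode({"tags": tags})
    return url


def fetch_alignment(job_id: str, api_key: str, tags: Optional[str] = None) -> str:
    """Download the aligned text for a job (requires network access and a valid key)."""
    request = urllib.request.Request(
        alignment_url(job_id, tags),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


print(alignment_url("abc123", tags="one_per_line"))
# https://asr.api.speechmatics.com/v2/jobs/abc123/alignment?tags=one_per_line
```

Omitting tags leaves the query string off, which corresponds to the default word_start_and_end format.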
Use the following endpoints to retrieve the input files used for an alignment job:
- /v2/jobs/<JOB_ID>/text: to get the text file submitted
- /v2/jobs/<JOB_ID>/data: to get the audio file submitted
Note that alignment follows Speechmatics' data retention limits.
Fetching files from an online location
Speechmatics supports retrieving files from an online location. If you store your digital media and transcripts in cloud storage (for example AWS S3 or Azure Blob Storage) you can also submit a job by providing the URL of the audio file or transcript.
To retrieve files from an online location, you must specify the location of the media and/or transcript in the configuration of your request. You can also upload a media file locally and fetch the text file from an online location (or vice versa). The following config fetches both:
{
  "type": "alignment",
  "fetch_data": { "url": "$MY_AUDIO_URL" },
  "fetch_text": { "url": "$MY_TRANSCRIPT" },
  "alignment_config": { "language": "en" }
}
Do not use fetch_data or fetch_text together with a locally uploaded copy of the same file, as this will cause the job to fail.
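One way to keep uploads and fetched files from overlapping is to derive the required form fields from the config itself. The sketch below is illustrative only: both function names are hypothetical, and it assumes that each input is supplied exactly once, either as an uploaded form field or as a fetch_data / fetch_text URL.

```python
from typing import Optional


def build_alignment_config(language: str,
                           audio_url: Optional[str] = None,
                           text_url: Optional[str] = None) -> dict:
    """Build a job config, adding fetch_data / fetch_text only when a URL is given."""
    config = {"type": "alignment", "alignment_config": {"language": language}}
    if audio_url is not None:
        config["fetch_data"] = {"url": audio_url}
    if text_url is not None:
        config["fetch_text"] = {"url": text_url}
    return config


def required_uploads(config: dict) -> list:
    """List the form fields that still have to be uploaded alongside config."""
    uploads = []
    if "fetch_data" not in config:
        uploads.append("data_file")
    if "fetch_text" not in config:
        uploads.append("text_file")
    return uploads


# Audio uploaded locally, transcript fetched from a placeholder URL.
config = build_alignment_config("en", text_url="https://example.com/transcript.txt")
print(required_uploads(config))
# ['data_file']
```

With this approach, an input is never both uploaded and fetched in the same request.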
Callback Notifications
Alignment jobs can also be used with callback notifications by including the notification_config section in the job config when submitting the job. Please ensure you have allowlisted Speechmatics' egress IPs to allow notifications.
{
  "type": "alignment",
  "alignment_config": {
    "language": "en"
  },
  "notification_config": [
    {
      "contents": ["alignment"],
      "url": "https://lorem.ipsum/"
    },
    {
      "contents": ["alignment.one_per_line", "text"],
      "method": "post",
      "url": "https://dolor.sit.amet/"
    }
  ]
}
The following outputs are supported:
- alignment, alignment.one_per_line, alignment.word_start_and_end: the aligned transcript
- text: the non-aligned transcript submitted as part of the job request
- data: the media file submitted as part of the job request
- jobinfo: the summary information about the job, to support identification and tracking
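Before submitting a job, the contents values in notification_config can be checked against the outputs listed above. This is a minimal client-side sketch, not part of the Speechmatics API: the function name unsupported_contents is hypothetical, and the set of names is copied from the list above.

```python
# Output names copied from the list of supported outputs.
SUPPORTED_CONTENTS = {
    "alignment",
    "alignment.one_per_line",
    "alignment.word_start_and_end",
    "text",
    "data",
    "jobinfo",
}


def unsupported_contents(config: dict) -> list:
    """Return every contents entry that is not in the supported list."""
    bad = []
    for notification in config.get("notification_config", []):
        for item in notification.get("contents", []):
            if item not in SUPPORTED_CONTENTS:
                bad.append(item)
    return bad


job_config = {
    "type": "alignment",
    "alignment_config": {"language": "en"},
    "notification_config": [
        {"contents": ["alignment"], "url": "https://lorem.ipsum/"},
        {"contents": ["alignment.one_per_line", "text"], "method": "post",
         "url": "https://dolor.sit.amet/"},
    ],
}
print(unsupported_contents(job_config))
# []
```

An empty list means every requested output appears in the supported list.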