Transcribe in real-time
The quickest way to try transcribing for free is by creating a Speechmatics account and using our Real-Time Demo in your browser.
This page will show you how to use the Speechmatics Real-Time SaaS WebSocket API to transcribe your voice in real-time by speaking into your microphone.
You can also learn about on-prem deployments by following our guides.
Set up
- Create an account on the Speechmatics Portal.
- Navigate to the Manage access page in the Speechmatics portal.
- Enter a name for your API key, generate it, and store it somewhere safe.
Enterprise customers should contact support at support@speechmatics.com to get their API keys.
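Rather than hard-coding the key into scripts, you can keep it in an environment variable and read it at runtime. A minimal sketch (the variable name SPEECHMATICS_API_KEY is just a convention, not something the library requires):
import os

API_KEY = os.environ["SPEECHMATICS_API_KEY"]  # set beforehand with: export SPEECHMATICS_API_KEY=...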
Real-Time Python example
The examples below will help you get started with the official Speechmatics Python library and CLI. You can, of course, integrate with Speechmatics in the programming language of your choice by referring to the Real-Time API reference.
The Speechmatics Python library and CLI can be installed using pip:
pip3 install speechmatics-python
The examples below cover four ways to supply audio: the CLI, a local file, a URL stream, and your microphone.
CLI
Configure the CLI with your API key (stored here in the API_KEY environment variable), then transcribe a file:
speechmatics config set --auth-token $API_KEY --generate-temp-token
speechmatics rt transcribe example.wav
File
import speechmatics
from httpx import HTTPStatusError

API_KEY = "YOUR_API_KEY"
PATH_TO_FILE = "example.wav"
LANGUAGE = "en"
CONNECTION_URL = f"wss://eu2.rt.speechmatics.com/v2/{LANGUAGE}"

# Create a transcription client
ws = speechmatics.client.WebsocketClient(
    speechmatics.models.ConnectionSettings(
        url=CONNECTION_URL,
        auth_token=API_KEY,
        generate_temp_token=True,  # Enterprise customers don't need to provide this parameter
    )
)

# Define an event handler to print the partial transcript
def print_partial_transcript(msg):
    print(f"[partial] {msg['metadata']['transcript']}")

# Define an event handler to print the final transcript
def print_transcript(msg):
    print(f"[ FINAL] {msg['metadata']['transcript']}")

# Register the event handler for partial transcripts
ws.add_event_handler(
    event_name=speechmatics.models.ServerMessageType.AddPartialTranscript,
    event_handler=print_partial_transcript,
)

# Register the event handler for final transcripts
ws.add_event_handler(
    event_name=speechmatics.models.ServerMessageType.AddTranscript,
    event_handler=print_transcript,
)

settings = speechmatics.models.AudioSettings()

# Define transcription parameters
# Full list of parameters described here: https://speechmatics.github.io/speechmatics-python/models
conf = speechmatics.models.TranscriptionConfig(
    language=LANGUAGE,
    enable_partials=True,
    max_delay=5,
)

print("Starting transcription (type Ctrl-C to stop):")
with open(PATH_TO_FILE, "rb") as fd:
    try:
        ws.run_synchronously(fd, conf, settings)
    except KeyboardInterrupt:
        print("\nTranscription stopped.")
    except HTTPStatusError:
        print("Invalid API key - check your API_KEY at the top of the code!")
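The handlers don't have to print: any callable that accepts the message dict can be registered. As an illustrative variation on the example above, using only the API it already demonstrates, you could accumulate Final transcripts for later use:
final_segments = []

def collect_transcript(msg):
    # Append each finalised segment; join them later for the full text
    final_segments.append(msg['metadata']['transcript'])

ws.add_event_handler(
    event_name=speechmatics.models.ServerMessageType.AddTranscript,
    event_handler=collect_transcript,
)

# After ws.run_synchronously(...) returns:
print("".join(final_segments))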
URL
import speechmatics
from httpx import HTTPStatusError
from urllib.request import urlopen

API_KEY = "YOUR_API_KEY"
LANGUAGE = "en"
CONNECTION_URL = f"wss://eu2.rt.speechmatics.com/v2/{LANGUAGE}"

# The raw audio stream will be a few seconds ahead of the radio
AUDIO_STREAM_URL = "https://media-ice.musicradio.com/LBCUKMP3"  # LBC Radio stream

audio_stream = urlopen(AUDIO_STREAM_URL)

# Create a transcription client
ws = speechmatics.client.WebsocketClient(
    speechmatics.models.ConnectionSettings(
        url=CONNECTION_URL,
        auth_token=API_KEY,
        generate_temp_token=True,  # Enterprise customers don't need to provide this parameter
    )
)

# Define an event handler to print the partial transcript
def print_partial_transcript(msg):
    print(f"[partial] {msg['metadata']['transcript']}")

# Define an event handler to print the final transcript
def print_transcript(msg):
    print(f"[ FINAL] {msg['metadata']['transcript']}")

# Register the event handler for partial transcripts
ws.add_event_handler(
    event_name=speechmatics.models.ServerMessageType.AddPartialTranscript,
    event_handler=print_partial_transcript,
)

# Register the event handler for final transcripts
ws.add_event_handler(
    event_name=speechmatics.models.ServerMessageType.AddTranscript,
    event_handler=print_transcript,
)

settings = speechmatics.models.AudioSettings()

# Define transcription parameters
# Full list of parameters described here: https://speechmatics.github.io/speechmatics-python/models
conf = speechmatics.models.TranscriptionConfig(
    language=LANGUAGE,
    enable_partials=True,
    max_delay=5,
)

print("Starting transcription (type Ctrl-C to stop):")
try:
    ws.run_synchronously(audio_stream, conf, settings)
except KeyboardInterrupt:
    print("\nTranscription stopped.")
except HTTPStatusError:
    print("Invalid API key - check your API_KEY at the top of the code!")
Microphone
To use this example, you will also need to install PyAudio:
pip3 install pyaudio
On macOS, PyAudio depends on PortAudio, which you can install with Homebrew before building PyAudio:
brew install portaudio
brew link portaudio
BREW_PREFIX=$(brew --prefix)
CFLAGS="-I$BREW_PREFIX/include -L$BREW_PREFIX/lib" python3 -m pip install pyaudio
import speechmatics
from httpx import HTTPStatusError
import asyncio
import pyaudio

API_KEY = "YOUR_API_KEY"
LANGUAGE = "en"
CONNECTION_URL = f"wss://eu2.rt.speechmatics.com/v2/{LANGUAGE}"
DEVICE_INDEX = -1
CHUNK_SIZE = 1024


class AudioProcessor:
    def __init__(self):
        self.wave_data = bytearray()
        self.read_offset = 0

    async def read(self, chunk_size):
        while self.read_offset + chunk_size > len(self.wave_data):
            await asyncio.sleep(0.001)
        new_offset = self.read_offset + chunk_size
        data = self.wave_data[self.read_offset:new_offset]
        self.read_offset = new_offset
        return data

    def write_audio(self, data):
        self.wave_data.extend(data)
        return


audio_processor = AudioProcessor()

# PyAudio callback
def stream_callback(in_data, frame_count, time_info, status):
    audio_processor.write_audio(in_data)
    return in_data, pyaudio.paContinue

# Set up PyAudio
p = pyaudio.PyAudio()
if DEVICE_INDEX == -1:
    DEVICE_INDEX = p.get_default_input_device_info()['index']
    DEF_SAMPLE_RATE = int(p.get_device_info_by_index(DEVICE_INDEX)['defaultSampleRate'])
    print("***\nIf you want to use a different microphone, update DEVICE_INDEX at the start of the code to one of the following:")
    # Filter out duplicates that are reported on some systems
    device_seen = set()
    for i in range(p.get_device_count()):
        if p.get_device_info_by_index(i)['name'] not in device_seen:
            device_seen.add(p.get_device_info_by_index(i)['name'])
            try:
                supports_input = p.is_format_supported(DEF_SAMPLE_RATE, input_device=i, input_channels=1, input_format=pyaudio.paFloat32)
            except Exception:
                supports_input = False
            if supports_input:
                print(f"-- To use << {p.get_device_info_by_index(i)['name']} >>, set DEVICE_INDEX to {i}")
    print("***\n")

SAMPLE_RATE = int(p.get_device_info_by_index(DEVICE_INDEX)['defaultSampleRate'])
device_name = p.get_device_info_by_index(DEVICE_INDEX)['name']

print(f"\nUsing << {device_name} >> which is DEVICE_INDEX {DEVICE_INDEX}")
print("Starting transcription (type Ctrl-C to stop):")

stream = p.open(
    format=pyaudio.paFloat32,
    channels=1,
    rate=SAMPLE_RATE,
    input=True,
    frames_per_buffer=CHUNK_SIZE,
    input_device_index=DEVICE_INDEX,
    stream_callback=stream_callback,
)

# Define connection parameters
conn = speechmatics.models.ConnectionSettings(
    url=CONNECTION_URL,
    auth_token=API_KEY,
    generate_temp_token=True,  # Enterprise customers don't need to provide this parameter
)

# Create a transcription client
ws = speechmatics.client.WebsocketClient(conn)

# Define transcription parameters
# Full list of parameters described here: https://speechmatics.github.io/speechmatics-python/models
conf = speechmatics.models.TranscriptionConfig(
    language=LANGUAGE,
    enable_partials=True,
    max_delay=5,
)

# Define an event handler to print the partial transcript
def print_partial_transcript(msg):
    print(f"[partial] {msg['metadata']['transcript']}")

# Define an event handler to print the final transcript
def print_transcript(msg):
    print(f"[ FINAL] {msg['metadata']['transcript']}")

# Register the event handler for partial transcripts
ws.add_event_handler(
    event_name=speechmatics.models.ServerMessageType.AddPartialTranscript,
    event_handler=print_partial_transcript,
)

# Register the event handler for final transcripts
ws.add_event_handler(
    event_name=speechmatics.models.ServerMessageType.AddTranscript,
    event_handler=print_transcript,
)

settings = speechmatics.models.AudioSettings()
settings.encoding = "pcm_f32le"
settings.sample_rate = SAMPLE_RATE
settings.chunk_size = CHUNK_SIZE

try:
    ws.run_synchronously(audio_processor, conf, settings)
except KeyboardInterrupt:
    print("\nTranscription stopped.")
except HTTPStatusError:
    print("Invalid API key - check your API_KEY at the top of the code!")
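The microphone example leaves the PyAudio stream open when the script exits. If you embed it in a longer-running program, you may want to release the audio device explicitly once transcription finishes; these are standard PyAudio calls:
stream.stop_stream()
stream.close()
p.terminate()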
Transcript outputs
The output format from the Real-Time API is JSON. Two types of transcript are provided: Final transcripts and Partial transcripts. Which one you consume depends on your use case and your latency and accuracy requirements.
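Inside the event handlers shown above, each msg is one parsed JSON message. An AddTranscript message looks roughly like this (abridged and illustrative; see the Real-Time API reference for the authoritative schema):
{
  "message": "AddTranscript",
  "metadata": {
    "start_time": 0.0,
    "end_time": 2.4,
    "transcript": "Hello world."
  },
  "results": [...]
}
The results array, elided here, carries per-word detail such as timings and confidences.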
Final transcripts
Final transcripts are sentences or phrases that are provided at irregular intervals. Once output, these transcripts are considered final and will not be updated afterwards. The timing of the output is determined automatically by the Speechmatics ASR engine and is affected by pauses in speech and other parameters, resulting in a latency between audio input and transcript output. The default latency can be adjusted using the max_delay property in transcription_config when starting the recognition session. Final transcripts are more accurate than Partial transcripts, and larger values of max_delay increase the accuracy.
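For example, a smaller max_delay asks the engine to finalise sooner, trading some accuracy for lower latency. A minimal sketch using the TranscriptionConfig from the examples above:
conf = speechmatics.models.TranscriptionConfig(
    language="en",
    enable_partials=True,
    max_delay=2,  # request Finals within ~2 seconds of the audio; smaller = faster but can be less accurate
)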
Partial transcripts
A Partial transcript, or Partial, is a transcript that can be updated at a later point in time. By default, only Final transcripts are produced; Partials must be explicitly enabled using the enable_partials property in transcription_config for the session. After a Partial is first output, the Speechmatics ASR engine can use additional audio data and context to update it. Partials are therefore available at very low latency, but with lower initial accuracy. Partials typically provide a latency (the time between audio input and initial output) of less than one second, and they can be used in conjunction with Final transcripts to provide low-latency transcripts that are refined over time.