Automatic usage reporting
Transcription: Batch, Real-Time
Deployments: Container
Status: Beta
Compatibility
To enable automatic usage reporting, you must be running one of the following ASR Container versions:
- Batch Container 9.3.0 onwards
- Real-Time Container 2.3.0 onwards
Introduction
The most convenient way of reporting usage to Speechmatics is by enabling automatic usage reporting. Once this is enabled, the transcriber will automatically connect to Speechmatics servers to send required usage analytics.
This feature works by sending periodic HTTPS requests to Speechmatics over the course of a transcription session. Information recorded includes the job configuration, the duration of transcription, and the amount of audio being transcribed. We aim to be completely transparent about exactly what data we record.
This feature is turned off by default and is currently opt-in. To turn it on, set the environment variable SM_ENABLE_USAGE_REPORTING=true (true, yes, or 1 are equally valid) when running the transcriber. For example:
docker run -i -v ~/$AUDIO_FILE:/input.audio \
-e LICENSE_TOKEN=eyJhbGciOiJ... \
-e SM_ENABLE_USAGE_REPORTING=true \
batch-asr-transcriber-en:9.3.0
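Because reporting is controlled by a single environment variable, a wrapper script can decide per run whether to pass it. The following is a minimal sketch rather than an official tool: the function name usage_reporting_args and the REPORT_USAGE switch are hypothetical names invented for this example, and reporting stays off unless the switch is set to 1, mirroring the opt-in default.

```shell
#!/bin/sh
# Hypothetical helper: REPORT_USAGE is a local switch made up for this sketch,
# not a Speechmatics setting. Only SM_ENABLE_USAGE_REPORTING comes from the docs.
usage_reporting_args() {
    # Print the docker flags that turn reporting on, or nothing to leave it off.
    if [ "${REPORT_USAGE:-0}" = "1" ]; then
        printf '%s\n' "-e SM_ENABLE_USAGE_REPORTING=true"
    fi
}

# Example usage (image name and token as in the example above). The command
# substitution is left unquoted on purpose so it splits into two arguments:
#   docker run -i -v ~/$AUDIO_FILE:/input.audio \
#     -e LICENSE_TOKEN=eyJhbGciOiJ... \
#     $(usage_reporting_args) \
#     batch-asr-transcriber-en:9.3.0
```

Running with REPORT_USAGE unset leaves the variable out of the docker command entirely, which keeps reporting disabled.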
We will never send customer audio data over the network. See "What data do we record" for a full description of the information that is recorded.
Technical Details
The batch transcriber reports one TRANSCRIBER_DONE event at the end of transcription.
The real-time transcriber reports one SESSION_ENDED event at the end of each session. During a session, the real-time transcriber also sends a SESSION_STATUS event every few minutes.
The payload size is only several KB, so it won’t have a meaningful impact on the duration of transcription or your bandwidth costs.
If usage reporting is successful then at the end of the session the following message will be visible in the transcriber logs:
2022-12-01 13:55:24.332 INFO sentryserver Usage reported to Speechmatics
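To confirm reporting from a script, one option is to search the transcriber logs for the success message shown above. This is only a sketch: the function name usage_reported is hypothetical, and it assumes the log text is supplied on standard input and that the message wording matches the sample line.

```shell
#!/bin/sh
# Hypothetical helper: exits 0 if the log text on stdin contains the success
# message from the sample log line, and non-zero otherwise.
usage_reported() {
    grep -q "Usage reported to Speechmatics"
}

# Example usage (the container name "transcriber" is illustrative):
#   docker logs transcriber 2>&1 | usage_reported && echo "usage reported"
```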
Network failure
In the event of a network failure (for example, if your internet connection is down or our usage server has a temporary outage), the transcriber will attempt to reconnect to our usage server several times, logging an error for each failed attempt:
2022-12-01 13:53:55.918 ERROR sentryserver Error 'Post "https://usage.speechmatics.com/v1/log": dial tcp: lookup usage.speechmatics.com on 192.168.4.129:53: no such host' occurred when logging EATS data: retrying
2022-12-01 13:53:56.475 ERROR sentryserver Error 'Post "https://usage.speechmatics.com/v1/log": dial tcp: lookup usage.speechmatics.com on 192.168.4.129:53: no such host' occurred when logging EATS data: retrying
2022-12-01 13:53:57.561 ERROR sentryserver Error 'Post "https://usage.speechmatics.com/v1/log": dial tcp: lookup usage.speechmatics.com on 192.168.4.129:53: no such host' occurred when logging EATS data: retrying
If the transcriber is still unable to contact our usage server after this retry period (which takes up to 10 seconds), it outputs some WARNING log messages and then stops attempting to send usage information. In this case the transcriber still exits normally with an exit code of 0.
2022-12-02 13:26:29.962 WARNING sentryserver SM Usage Reporting: Error handling item, current retry count 1
2022-12-02 13:26:29.962 WARNING sentryserver SM Usage Reporting: deactivated because max retry limit reached
2022-12-02 13:26:30.963 WARNING sentryserver SM Usage Reporting: deactivated so item will be skipped
For batch transcribers, the transcriber will exit immediately after this.
For real-time transcribers, usage reporting will be disabled for a fixed time period (currently 60 seconds) to minimize the impact on the duration of transcription jobs. The retry mechanism still causes a small hit to the speed of transcription, so in the event of a network outage you may wish to temporarily disable usage reporting by not setting the SM_ENABLE_USAGE_REPORTING variable when running the container.
We ask that you inform our finance team about the duration and timing of any such outage.
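To help establish when reporting stopped, the timestamps of the deactivation warnings can be pulled out of the transcriber logs. The sketch below assumes the log format matches the samples above, with the date and time as the first two space-separated fields; the function name reporting_deactivations is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads transcriber log text on stdin and prints the date
# and time of every line recording that usage reporting was deactivated.
reporting_deactivations() {
    awk '/SM Usage Reporting: deactivated because max retry limit reached/ { print $1, $2 }'
}

# Example usage (the container name "transcriber" is illustrative):
#   docker logs transcriber 2>&1 | reporting_deactivations
```

Each output line marks one point at which the transcriber gave up sending usage information, which can serve as a starting point for the outage timings you report.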