System requirements


Speechmatics containerized deployments are built on the Docker platform. At present, a separate Docker image is required for each language to be transcribed. Each Docker image takes about 3GB of storage.


A separate Docker image is required for each language in which transcription is needed. A single image can be used to create and run multiple containers concurrently, and each running container requires the following resources:

1 vCPU
2-5GB RAM
100MB hard disk space

If you are using the enhanced model, we recommend using the upper limit of the RAM range.

Please note: the parallel processing functionality of the batch container requires more resources because it is memory-intensive. When using parallel processing, we recommend allocating N × the per-container RAM requirement, where N is the number of cores intended to be used for parallel processing. For example, if 2 cores are used for parallel processing, the RAM requirement would be up to 10GB (2 × 5GB).
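As a rough planning aid, the RAM arithmetic above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not a Speechmatics tool: the function names `container_ram_gb` and `host_ram_gb` are hypothetical, the lower bound for the standard model is taken as the 2GB end of the range, and the requirements of containers running concurrently are assumed to simply add up.

```python
# Per-container RAM range from the requirements above, in GB.
RAM_MIN_GB = 2
RAM_MAX_GB = 5


def container_ram_gb(parallel_cores: int = 1, enhanced: bool = False) -> tuple[int, int]:
    """Return a (low, high) RAM estimate in GB for one running container (hypothetical helper)."""
    if parallel_cores < 1:
        raise ValueError("parallel_cores must be at least 1")
    # Enhanced model: plan for the upper limit of the range.
    low = RAM_MAX_GB if enhanced else RAM_MIN_GB
    # Parallel processing: multiply by N, the number of cores used (N x RAM).
    return low * parallel_cores, RAM_MAX_GB * parallel_cores


def host_ram_gb(containers: int, parallel_cores: int = 1, enhanced: bool = False) -> tuple[int, int]:
    """Assumption: concurrent containers each need their own RAM, so estimates add up."""
    low, high = container_ram_gb(parallel_cores, enhanced)
    return low * containers, high * containers


if __name__ == "__main__":
    print(container_ram_gb(parallel_cores=2))         # (4, 10)
    print(host_ram_gb(containers=3, enhanced=True))  # (15, 15)
```

The two-core case reproduces the "up to 10GB" figure from the note above; the lower value is only a floor, so sizing a host toward the upper value is the safer choice.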

Recommended host specifications

Standard operating point

The host machine requires a processor with a Broadwell-class or newer microarchitecture, with AVX2 instruction support.
If you are using a hypervisor, check that it is configured to allow VM access to the AVX2 instructions.
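On a Linux host or VM, one way to confirm that AVX2 is actually visible (including through a hypervisor) is to look for the `avx2` flag in `/proc/cpuinfo`. The sketch below assumes a Linux x86 system, where the kernel lists CPU features on lines keyed `flags`; the helper names `cpu_flags` and `has_avx2` are hypothetical.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Collect CPU feature flags from the text of Linux /proc/cpuinfo."""
    flags: set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # x86 kernels list features on lines whose key is exactly "flags"
        # (this skips related lines such as "vmx flags").
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def has_avx2(cpuinfo_text: str) -> bool:
    """True if the AVX2 flag is exposed to this (virtual) machine."""
    return "avx2" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():  # only present on Linux
        print("AVX2 available:", has_avx2(path.read_text()))
```

If the physical CPU supports AVX2 but the flag is missing inside a VM, that usually points to the hypervisor masking the instruction set.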

Enhanced operating point

The host machine should have a processor with a Cascade Lake-class or newer microarchitecture, with AVX512-VNNI instruction support. This greatly improves transcription processing speed. Support for AVX2 instructions is required.
If you are using a hypervisor, check that it is configured to allow VM access to the AVX2 and AVX512-VNNI instructions.
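Given a set of CPU feature flags, the requirements for the two operating points can be summarised as below. This is a minimal sketch: the function name `operating_point_support` is hypothetical, and it assumes the flags use the Linux `/proc/cpuinfo` spelling (`avx2`, `avx512_vnni`).

```python
def operating_point_support(flags: set[str]) -> dict[str, bool]:
    """Map CPU feature flags to the host requirements described above.

    Assumption: flag names follow Linux /proc/cpuinfo spelling.
    """
    avx2 = "avx2" in flags
    vnni = "avx512_vnni" in flags
    return {
        # Standard operating point: AVX2 support is the minimum.
        "standard": avx2,
        # Enhanced operating point: AVX2 is still required...
        "enhanced_minimum": avx2,
        # ...and AVX512-VNNI is recommended for much faster processing.
        "enhanced_recommended": avx2 and vnni,
    }


if __name__ == "__main__":
    print(operating_point_support({"avx2"}))
```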

Architecture

Batch transcription

Each container:

Processes one input file and outputs a transcript in a predefined language, in any of a number of supported output formats. These outputs and the relevant metadata are described in more detail in the Speech API guide.
Is licensed for languages and speech features that vary depending on each individual contract. Speech features are described after the Speech API guide.
Requires either a license file or a license token before transcription starts.
Can run in a mode that parallelises processing across multiple cores.
Supports input files up to 2 hours in length or 4GB in size.
Treats all data as transitory: once a container completes its transcription, it removes all record of the operation.
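The input limits listed above can be checked before a file is handed to a batch container. This sketch assumes the duration has already been measured by some other means, and reads "4GB" as decimal gigabytes (4 × 10^9 bytes), the more conservative reading since the text does not say which unit is meant. The function name `input_limit_problems` is hypothetical.

```python
# Input limits from the list above.
MAX_DURATION_S = 2 * 60 * 60
# Assumption: "4GB" read as decimal gigabytes, the more conservative
# of the two readings (4 * 1024**3 would be slightly larger).
MAX_SIZE_BYTES = 4 * 10**9


def input_limit_problems(duration_s: float, size_bytes: int) -> list[str]:
    """Return reasons a file exceeds the batch input limits (empty if none)."""
    problems: list[str] = []
    if duration_s > MAX_DURATION_S:
        problems.append(f"duration {duration_s:.0f}s exceeds 2 hours ({MAX_DURATION_S}s)")
    if size_bytes > MAX_SIZE_BYTES:
        problems.append(f"size {size_bytes} bytes exceeds 4GB ({MAX_SIZE_BYTES} bytes)")
    return problems


if __name__ == "__main__":
    print(input_limit_problems(3600, 500_000_000))    # []
    print(input_limit_problems(3 * 3600, 5 * 10**9))  # two problems
```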
