
System requirements

Transcription: Batch, Real-Time | Deployments: Container

Speechmatics containerized deployments are built on the Docker platform. At present, a separate Docker image is required for each language to be transcribed. Each Docker image takes about 3GB of storage.

System requirements

A separate Docker image is required for each language you want to transcribe. A single image can be used to create and run multiple containers concurrently. Each running container requires the following resources:

  • 1 vCPU
  • 2-5GB RAM
  • 100MB hard disk space

If you are using the enhanced model, it is recommended to use the upper limit of the RAM recommendations.

Please note: The parallel processing functionality of the batch container requires more resources because it is memory intensive. When using parallel processing, we recommend provisioning N × the RAM requirement, where N is the number of cores intended to be used for parallel processing. For example, if 2 cores are used for parallel processing, the RAM requirement would be up to 10GB.
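The N × RAM rule above can be sketched as a small calculation. This is an illustrative helper only, not part of the Speechmatics tooling; the function name and constants are assumptions, with the 2-5GB range taken from the per-container requirements listed earlier.

```python
# Per-container RAM range from the requirements above, in GB.
RAM_MIN_GB = 2
RAM_MAX_GB = 5


def parallel_ram_range_gb(cores: int) -> tuple[int, int]:
    """Return the (lower, upper) RAM estimate in GB for N parallel cores.

    Hypothetical helper: applies the "N x RAM requirements" rule, where
    N is the number of cores used for parallel processing.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return cores * RAM_MIN_GB, cores * RAM_MAX_GB


low, high = parallel_ram_range_gb(2)
print(f"2 cores: {low}-{high}GB RAM")  # 2 cores: 4-10GB RAM
```

For the enhanced model, planning around the upper figure is the safer choice, in line with the recommendation above.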

Standard operating point

  • The host machine requires a processor with at least a Broadwell class microarchitecture or newer, with AVX2 instruction support
  • If you are using a hypervisor, you should check it is configured to allow VM access to the AVX2 instructions

Enhanced operating point

  • The host machine should have a processor with at least a Cascade Lake class microarchitecture or newer, with AVX512-VNNI instruction support, which greatly improves transcription processing speed. Support for AVX2 instructions is required
  • If you are using a hypervisor, you should check it is configured to allow VM access to the AVX2 and AVX512-VNNI instructions
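One way to confirm that a Linux host or VM exposes these instructions is to inspect the CPU flags in /proc/cpuinfo, where they appear as `avx2` and `avx512_vnni`. The following is a minimal sketch, assuming a Linux x86 machine; the function names and the result keys are illustrative and not part of any Speechmatics tooling.

```python
from pathlib import Path

# Flag names as they appear in /proc/cpuinfo on Linux x86 hosts.
REQUIRED_FLAG = "avx2"
ENHANCED_FLAG = "avx512_vnni"


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Collect the CPU flags listed in /proc/cpuinfo-style text."""
    flags: set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags.update(value.split())
    return flags


def operating_point_support(cpuinfo_text: str) -> dict[str, bool]:
    """Report whether the required and recommended instructions are visible."""
    flags = cpu_flags(cpuinfo_text)
    has_avx2 = REQUIRED_FLAG in flags
    return {
        # AVX2 is required for both operating points.
        "standard": has_avx2,
        # AVX512-VNNI is the recommended extra for the enhanced operating point.
        "enhanced_accelerated": has_avx2 and ENHANCED_FLAG in flags,
    }


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print(operating_point_support(cpuinfo.read_text()))
    else:
        print("No /proc/cpuinfo found; this check only works on Linux")
```

Running the check inside the VM rather than on the physical host shows what the hypervisor actually passes through.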

Architecture

Each container:
  • Processes one input file and outputs the resulting transcript in a predefined language, in a number of supported output formats. These outputs and the relevant metadata are described in more detail in the Speech API guide
  • Is licensed for languages and speech features, which vary depending on each individual contract. Speech features are described after the Speech API guide
  • Requires either a license file or license token before transcription starts
  • Can run in a mode that parallelises processing across multiple cores
  • Supports input files up to 2 hours in length or 4GB in size
  • Treats all data as transitory. Once a container completes its transcription, it removes all record of the operation
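The input limits above lend themselves to a pre-flight check before a file is handed to a container. The sketch below is illustrative only: the function name is hypothetical, it assumes both limits apply together, it reads 4GB as 4 × 1024³ bytes, and it expects the audio duration to have been measured separately by whatever tool you already use.

```python
MAX_DURATION_SECONDS = 2 * 60 * 60  # 2 hours
MAX_SIZE_BYTES = 4 * 1024**3        # assumption: 4GB interpreted as 4 GiB


def input_limit_problems(size_bytes: int, duration_seconds: float) -> list[str]:
    """Return a list of limit violations; an empty list means the file fits."""
    problems = []
    if size_bytes > MAX_SIZE_BYTES:
        problems.append("file is larger than 4GB")
    if duration_seconds > MAX_DURATION_SECONDS:
        problems.append("audio is longer than 2 hours")
    return problems


# A 90-minute, 500MB file is within both limits.
print(input_limit_problems(500 * 1024**2, 90 * 60))  # []
```

Files that exceed either limit would need to be split before submission.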