System requirements
Speechmatics containerized deployments are built on the Docker platform. At present, a separate Docker image is required for each language to be transcribed. Each Docker image takes about 3GB of storage.
A separate Docker image is required for each language you want to transcribe. A single image can be used to create and run multiple containers concurrently. Each running container requires the following resources:
- 1 vCPU
- 2-5GB RAM
- 100MB hard disk space
If you are using the enhanced model, we recommend provisioning RAM at the upper limit of the range above.
Please note: the parallel processing functionality of the batch container requires more resources because it is memory intensive. When using parallel processing, we recommend provisioning N x the RAM requirement, where N is the number of cores intended for parallel processing. For example, if 2 cores are used for parallel processing, the RAM requirement would be up to 10GB.
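The sizing rule above is simple arithmetic, and a short sketch can make it concrete. This assumes the 2-5GB per-container range stated above; the function and constant names are illustrative only and are not part of any Speechmatics tooling.

```python
# Illustrative sizing helper; the names here are hypothetical, not a Speechmatics API.
RAM_PER_CORE_GB = (2, 5)  # lower and upper RAM recommendation per container, in GB


def ram_range_gb(parallel_cores=1):
    """Return the (minimum, maximum) RAM in GB for N cores of parallel processing."""
    if parallel_cores < 1:
        raise ValueError("parallel_cores must be at least 1")
    low, high = RAM_PER_CORE_GB
    # The guidance is N x the RAM requirement, so both ends of the range scale with N.
    return (low * parallel_cores, high * parallel_cores)


print(ram_range_gb(2))  # (4, 10)
```

With the enhanced model, the upper figure of the returned range is the one to plan for.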
Host recommended specs
Standard operating point
- The host machine requires a processor with at least a Broadwell class microarchitecture or newer, with AVX2 instruction support
- If you are using a hypervisor, you should check it is configured to allow VM access to the AVX2 instructions
Enhanced operating point
- The host machine should have a processor with at least a Cascade Lake class microarchitecture or newer, with AVX512-VNNI instruction support. This will greatly improve transcription processing speed. Support for AVX2 instructions is required
- If you are using a hypervisor, you should check it is configured to allow VM access to the AVX2 and AVX512-VNNI instructions
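One way to confirm that the required instructions are visible to the operating system, including inside a VM, is to inspect the CPU flags reported by Linux. The following is a minimal sketch, assuming a Linux x86 host where `/proc/cpuinfo` lists features on a `flags` line and where AVX512-VNNI appears as `avx512_vnni`. The helper names are illustrative and not part of the Speechmatics product.

```python
from pathlib import Path

STANDARD = ["avx2"]                 # standard operating point
ENHANCED = ["avx2", "avx512_vnni"]  # Linux reports AVX512-VNNI as avx512_vnni


def cpu_flags(cpuinfo_text):
    """Collect the CPU feature flags listed in /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags.update(value.split())
    return flags


def missing_flags(cpuinfo_text, required):
    """Return the required flags that the CPU does not report, sorted by name."""
    return sorted(set(required) - cpu_flags(cpuinfo_text))


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        text = path.read_text()
        print("standard operating point, missing:", missing_flags(text, STANDARD))
        print("enhanced operating point, missing:", missing_flags(text, ENHANCED))
```

Running this inside the VM rather than on the physical host shows what the hypervisor actually exposes to the guest.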
Architecture
Batch transcription
- Processes one input file and outputs a transcript in a predefined language in a number of supported output formats. These outputs and relevant metadata are described in more detail in the Speech API guide
- Is licensed for languages and speech features, which vary depending upon each individual contract. Speech features are described after the Speech API guide
- Requires either a license file or license token before transcription starts
- Can run in a mode that parallelises processing across multiple cores
- Supports input files up to 2 hours in length or 4GB in size
- Treats all data as transitory. Once a container completes its transcription, it removes all record of the operation
Real-Time transcription
- Provides the ability to transcribe speech data in a predefined language from a live stream or a recorded audio file
- Speech features are described in the Speech API guide
- Multiple instances of the container can be run on the same Docker host. This enables scaling of a single language or multiple languages as required
- All data is transitory. Once a container completes its transcription, it removes all record of the operation; no data is persisted
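The batch container's input limits (2 hours, 4GB) can be checked before a file is submitted. The sketch below is an illustration under stated assumptions: 4GB is interpreted as 4 x 1024^3 bytes, a file is treated as out of range if it exceeds either limit, and the audio duration is supplied by the caller because measuring it requires a media tool outside the scope of this example. The function name is hypothetical.

```python
MAX_BYTES = 4 * 1024**3    # 4GB, interpreted here as GiB (an assumption)
MAX_SECONDS = 2 * 60 * 60  # 2 hours


def within_batch_limits(size_bytes, duration_seconds):
    """Return True if a file is within both the size and the duration limit."""
    # Assumption: exceeding either limit makes the file unsuitable.
    return size_bytes <= MAX_BYTES and duration_seconds <= MAX_SECONDS


# Example: a 90-minute recording of 1.5GB is within both limits.
print(within_batch_limits(int(1.5 * 1024**3), 90 * 60))
```

The file size can be obtained with `os.path.getsize(path)` from the Python standard library.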