GPU reference implementation
Latest images
docker-public.artifacts.speechmatics.io/sm-gpu-inference-server-en:10.0.0
docker-public.artifacts.speechmatics.io/batch-asr-transcriber-en:10.0.0
docker-public.artifacts.speechmatics.io/rt-asr-transcriber-en:10.0.0
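Assuming your Docker client is already authenticated against the Speechmatics registry, the images can be pulled ahead of time:

docker pull docker-public.artifacts.speechmatics.io/sm-gpu-inference-server-en:10.0.0
docker pull docker-public.artifacts.speechmatics.io/batch-asr-transcriber-en:10.0.0
docker pull docker-public.artifacts.speechmatics.io/rt-asr-transcriber-en:10.0.0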
Note: Customers who are not using GPU hardware should continue to use transcriber version 9.X.X or below. Transcriber version 10.0.0 has not yet been optimized for CPU-only use.
This docker-compose file will create a Speechmatics GPU inference server:
---
version: '3.8'
networks:
  transcriber:
    driver: bridge
services:
  triton:
    image: docker-public.artifacts.speechmatics.io/sm-gpu-inference-server-en:10.0.0
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              ### Limit to N GPUs
              # count: 1
              ### Pick specific GPUs by device ID
              # device_ids:
              #   - 0
              #   - 3
              capabilities:
                - gpu
    container_name: triton
    networks:
      - transcriber
    expose:
      - 8000/tcp
      - 8001/tcp
      - 8002/tcp
    environment:
      - NVIDIA_REQUIRE_CUDA=cuda>=11.4
      - NVIDIA_DRIVER_CAPABILITIES=all
      - NVIDIA_VISIBLE_DEVICES=all
    volumes:
      - $PWD/license.json:/license.json:ro
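To bring the server up, the standard docker compose workflow applies. A minimal sketch, assuming the file above is saved as docker-compose.yml in the current directory and the NVIDIA Container Toolkit is installed on the host:

# Start the inference server in the background
docker compose up -d

# Follow the server logs until the models report they are ready
docker logs -f triton

# Optional readiness probe: Triton serves its HTTP API on port 8000, which is
# exposed only inside the compose network, so probe it from a sibling container.
# (curlimages/curl is used purely for illustration; substitute the actual
# network name if compose prefixed it with your project name.)
docker run --rm --network transcriber curlimages/curl \
  -s -o /dev/null -w '%{http_code}\n' http://triton:8000/v2/health/ready

A 200 response indicates the server is ready to accept inference requests.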
To run a transcription:
docker run --rm \
  --network transcriber \
  --name transcriber \
  -v $PWD/license.json:/license.json \
  -e SM_INFERENCE_ENDPOINT=triton:8001 \
  -i <speech_container_image_name> < ./example.wav
(assumes your license.json file is in the current working directory)
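To transcribe several files, the same command can be wrapped in a shell loop. A minimal sketch, assuming the container writes its transcript to stdout (the .json output extension is illustrative); the fixed --name flag is dropped so successive runs don't collide:

# Transcribe every .wav in the current directory, writing each
# transcript alongside its source file.
for f in ./*.wav; do
  docker run --rm \
    --network transcriber \
    -v $PWD/license.json:/license.json \
    -e SM_INFERENCE_ENDPOINT=triton:8001 \
    -i <speech_container_image_name> < "$f" > "${f%.wav}.json"
done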