GPU reference implementation

Latest images
  • docker-public.artifacts.speechmatics.io/sm-gpu-inference-server-en:10.0.0
  • docker-public.artifacts.speechmatics.io/batch-asr-transcriber-en:10.0.0
  • docker-public.artifacts.speechmatics.io/rt-asr-transcriber-en:10.0.0
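
These can be pulled directly from the Speechmatics registry. Depending on your account setup, you may first need to authenticate with docker login docker-public.artifacts.speechmatics.io:

docker pull docker-public.artifacts.speechmatics.io/sm-gpu-inference-server-en:10.0.0
docker pull docker-public.artifacts.speechmatics.io/batch-asr-transcriber-en:10.0.0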

Note: Customers who are not using GPU hardware should continue to use transcriber version 9.X.X and below. Transcriber version 10.0.0 has not yet been optimized for CPU-only usage.

The following docker-compose file will create a Speechmatics GPU inference server:

---
version: '3.8'

networks:
  transcriber:
    driver: bridge
    # Pin the network name so that `docker run --network transcriber`
    # (used below) resolves regardless of the compose project name
    name: transcriber

services:
  triton:
    image: docker-public.artifacts.speechmatics.io/sm-gpu-inference-server-en:10.0.0
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              ### Limit to N GPUs
              # count: 1
              ### Pick specific GPUs by device ID
              # device_ids:
              #   - 0
              #   - 3
              capabilities:
                - gpu
    container_name: triton
    networks:
      - transcriber
    expose:
      - 8000/tcp
      - 8001/tcp
      - 8002/tcp
    environment:
      - NVIDIA_REQUIRE_CUDA=cuda>=11.4
      - NVIDIA_DRIVER_CAPABILITIES=all
      - NVIDIA_VISIBLE_DEVICES=all
    volumes:
      - $PWD/license.json:/license.json:ro
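
Save the file as docker-compose.yml, then start the inference server in the background (loading the models onto the GPU can take a little while on first start):

docker-compose up -d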

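Before sending any audio, you can check that the server is ready. The ports above are exposed on the Docker network only, not published to the host, so the check has to run from a container on the same network. This sketch assumes the image serves Triton's standard HTTP readiness endpoint on port 8000; curlimages/curl is just a convenient example image:

docker run --rm --network transcriber curlimages/curl \
  -s -o /dev/null -w '%{http_code}\n' \
  http://triton:8000/v2/health/ready

A 200 status code means the server is ready to accept inference requests.
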
To run a transcription:

docker run --rm \
  --network transcriber \
  --name transcriber \
  -v $PWD/license.json:/license.json \
  -e SM_INFERENCE_ENDPOINT=triton:8001 \
  -i <speech_container_image_name> < ./example.wav

This assumes your license.json file is in the current working directory. Replace <speech_container_image_name> with one of the transcriber images listed above, for example docker-public.artifacts.speechmatics.io/batch-asr-transcriber-en:10.0.0.
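
When you are finished, stop the inference server and remove the network with:

docker-compose down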