Skip to main content

Using the Container

Transcription:Batch Real-Time Deployments:Container

Once the Docker image has been pulled into a local environment, it can be started using the Docker run command. More details about operating and managing the container are available in the Docker API documentation.

There are two different methods for passing an audio file into a container:
  • STDIN: Streams audio file into the container though the standard command line entry point
  • File Location: Pulls audio file from a file location
Here are some examples below to demonstrate these modes of operating the containers.
Example 1: passing a file using the `cat` command to the Spanish (es) container

cat ~/$AUDIO_FILE | docker run -i \
  -e LICENSE_TOKEN=eyJhbGciOiJ... \
  batch-asr-transcriber-en:9.2.0

Example 2: pulling an audio file from a mapped directory into the container

docker run -i -v ~/$AUDIO_FILE:/input.audio \
  -e LICENSE_TOKEN=eyJhbGciOiJ... \
  batch-asr-transcriber-en:9.2.0

NOTE: the audio file must be mapped into the container with :/input.audio

The Docker run options used are:

NameDescription
--env, -eSet environment variables
--interactive , -iKeep STDIN open even if not attached
--volume , -vBind mount a volume

See Docker docs for a full list of the available options.

Both the methods will produce the same transcribed outcome. STDOUT is used to provide the transcription in a JSON format. Here's an example:

{
  "format": "2.8",
  "metadata": {
    "created_at": "2020-06-30T15:43:50.871Z",
    "type": "transcription",
    "language_pack_info": {
      "adapted": false,
      "itn": true,
      "language_description": "English",
      "word_delimiter": " ",
      "writing_direction": "left-to-right"
    },
    "transcription_config": {
      "language": "en",
      "diarization": "none",
      "additional_vocab": [
        {
          "content": "Met Office"
        },
        {
          "content": "Fitzroy"
        },
        {
          "content": "Forties"
        }
      ]
    }
  },
  "results": [
    {
      "alternatives": [
        {
          "confidence": 1.0,
          "content": "Are",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "end_time": 3.61,
      "start_time": 3.49,
      "type": "word"
    },
    {
      "alternatives": [
        {
          "confidence": 1.0,
          "content": "on",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "end_time": 3.73,
      "start_time": 3.61,
      "type": "word"
    },
    {
      "alternatives": [
        {
          "confidence": 1.0,
          "content": "the",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "end_time": 3.79,
      "start_time": 3.73,
      "type": "word"
    },
    {
      "alternatives": [
        {
          "confidence": 1.0,
          "content": "rise",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "end_time": 4.27,
      "start_time": 3.79,
      "type": "word"
    }
  ]
}

Intermediate files

The intermediate files created during the transcription are stored in /home/smuser/work. This is the case whether running the container as a root or non-root user.

Determining success

The exit code of the container will determine if the transcription was successful. There are two exit code possibilities:

  • Exit Code == 0 : The transcript was a success; the output will contain a JSON output defining the transcript (more info below)
  • Exit Code != 0 : the output will contain a stack trace and other useful information. This output should be used in any communication with Speechmatics support to aid understanding and resolution of any problems that may occur

Modifying the Image

Building an Image

Using STDIN to pass files in and obtain the transcription may not be sufficient for all use cases. It is possible to build a new Docker Image that will use the Speechmatics Image as a layer if required for your specific workflow. To include the Speechmatics Docker Image inside another image, ensure to add the pulled Docker image into the Dockerfile for the new application.

Requirements for a custom image

To ensure the Speechmatics Docker image works as expected inside the custom image, please consider the following:

  • Any audio that needs to be transcribed must to be copied to a file called /input.audio inside the running container
  • To initiate transcription, call the application pipeline. The pipeline will start the transcription service and use /input.audio as the audio source.
  • When running pipeline, the working directory must be set to /opt/orchestrator, using either the Dockerfile WORKDIR directive, the cd command or similar means.
  • Once pipeline finishes transcribing, ensure you move the transcription data outside the container

Dockerfile

To add a Speechmatics Docker image into a custom one, the Dockerfile must be modified to include the full image name of the locally available image.

Example: Adding Global English (en) with tag 9.2.0 to the Dockerfile

FROM batch-asr-transcriber-en:9.2.0
ADD download_audio.sh /usr/local/bin/download_audio.sh
RUN chmod +x /usr/local/bin/download_audio.sh
CMD ["/usr/local/bin/download_audio.sh"]

Once the above image is built, and a container instantiated from it, a script called download_audio.sh will be executed (this could do something like pulling a file from a webserver and copying it to /input.audio before starting the pipeline application). This is a very basic Dockerfile to demonstrate a way of orchestrating the Speechmatics Docker Image.

NOTE: For support purposes, it is assumed the Docker Image provided by Speechmatics has been unmodified. If you experience issues, Speechmatics support will require you to replicate the issues with the unmodified Docker image e.g. batch-asr-transcriber-en:9.2.0