Additional Security Features
This section documents additional measures you can take to run the Batch Container where there are restrictive requirements on data storage or user access.
Mapping Custom Temporary Directories to Run the Batch Container
Users may wish to run the Batch Container in an environment where they cannot or do not want to write anything to disk, and instead use temporary storage such as tmpfs or ramfs to ensure regulatory compliance. The Batch Container supports mounting temporary directories for the storage of all intermediate files created during transcription, as well as mounting the directories where input, output and job configuration files are placed. Files can also be retrieved locally by using the fetch_url functionality in the configuration object.
Speechmatics also supports the --job-config option to specify the location of the configuration object. The value must be the path inside the container at which the config file can be found. If the config file needs to be in a temporary directory (e.g. /tmp) rather than in tmpfs, that directory must be a volume mounted from the host machine that contains the configuration object.
Below is an example where the intermediate files and configuration object are in temporary storage. Please note that the --job-config argument must come after the image name.
docker run --rm -i \
--read-only --tmpfs /home/smuser \
-v <path/to/dir/in/host/containing/config.json>:/tmp \
-e LICENSE_TOKEN=$TOKEN_VALUE \
batch-asr-transcriber-en:9.2.0 \
--job-config /tmp/config.json
This example sets up a tmpfs for intermediate files created by transcription, so that all such files are written to transient storage rather than to disk. The configuration object is mounted from the host into /tmp, where the container can retrieve it.
An alternative is to use /tmp as tmpfs and then mount an additional read-only volume on a path inside the container in which the config can be found.
docker run --rm -i \
--read-only --tmpfs /home/smuser --tmpfs /tmp \
-v <path/to/dir/in/host/containing/config.json>:/configs_dir:ro \
-e LICENSE_TOKEN=$TOKEN_VALUE \
batch-asr-transcriber-en:9.2.0 \
--job-config /configs_dir/config.json
If the Container is run using Kubernetes, users can use emptyDir volumes to mount tmpfs in the needed directories (/home/smuser and /tmp). Configuration files can also be stored in an emptyDir if any of the containers in the pod is able to put them there. This could be achieved by using an initContainer or the sidecar pattern to fetch the configuration from its original location and store it in the emptyDir volume. The transcriber should then be called with the --job-config argument pointing to the path in the emptyDir volume in which the config was stored.
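The Kubernetes setup described above might look like the following sketch. It writes a Pod manifest in which /home/smuser and /tmp are memory-backed emptyDir volumes (medium: Memory, which Kubernetes backs with tmpfs) and an init container places config.json in a third emptyDir. The Pod name, the busybox init image, the CONFIG_URL value, the speechmatics-license Secret and the /config mount path are assumptions made for illustration. How the input audio reaches the container is left out.

```shell
#!/bin/sh
# Sketch: write a Pod manifest that keeps all transcriber scratch space in memory.
# Assumed/hypothetical names: the pod name, the busybox init image, the
# CONFIG_URL value, the "speechmatics-license" Secret and the /config mount path.
MANIFEST="${MANIFEST:-batch-transcriber-pod.yaml}"

# Quoted heredoc delimiter: nothing inside is expanded by the local shell.
cat > "$MANIFEST" <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: batch-transcriber
spec:
  restartPolicy: Never
  initContainers:
    - name: fetch-config
      image: busybox:1.36
      # Fetch the job config from its original location into the shared volume.
      command: ["sh", "-c", "wget -O /config/config.json \"$CONFIG_URL\""]
      env:
        - name: CONFIG_URL
          value: "http://config-server.example/config.json"
      volumeMounts:
        - name: job-config
          mountPath: /config
  containers:
    - name: transcriber
      image: batch-asr-transcriber-en:9.2.0
      args: ["--job-config", "/config/config.json"]
      env:
        - name: LICENSE_TOKEN
          valueFrom:
            secretKeyRef:
              name: speechmatics-license
              key: token
      securityContext:
        # Counterpart of "docker run --read-only".
        readOnlyRootFilesystem: true
      volumeMounts:
        - name: smuser-home
          mountPath: /home/smuser
        - name: tmp
          mountPath: /tmp
        - name: job-config
          mountPath: /config
          readOnly: true
  volumes:
    - name: smuser-home
      emptyDir:
        medium: Memory
    - name: tmp
      emptyDir:
        medium: Memory
    - name: job-config
      emptyDir:
        medium: Memory
EOF

echo "Wrote $MANIFEST"
```

The generated file could then be applied with kubectl apply -f batch-transcriber-pod.yaml.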
Users can also pull files from temporary locations using the fetch_url functionality. Below is a configuration example:
{
  "type": "transcription",
  "transcription_config": {
    "language": "en"
  },
  "fetch_data": {
    "url": "file:///tmp/$FILENAME.wav"
  }
}
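The configuration above contains a $FILENAME placeholder. Assuming this is meant to be substituted by the user before the job runs, rather than expanded by the container, a minimal sketch for generating the file could look like this. The FILENAME and CONFIG_PATH variables are hypothetical names chosen for the example.

```shell
#!/bin/sh
# Sketch: generate config.json for an audio file that is expected to sit in
# /tmp inside the container. FILENAME and CONFIG_PATH are hypothetical names.
FILENAME="example"
CONFIG_PATH="${CONFIG_PATH:-config.json}"

# Unquoted heredoc delimiter so that ${FILENAME} is expanded by the shell.
cat > "$CONFIG_PATH" <<EOF
{
  "type": "transcription",
  "transcription_config": {
    "language": "en"
  },
  "fetch_data": {
    "url": "file:///tmp/${FILENAME}.wav"
  }
}
EOF

echo "Wrote $CONFIG_PATH"
```

The resulting file can be mounted into the container and passed with --job-config as in the earlier docker run examples.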
Running a batch container as a non-root user
There are some use cases where you may not be able to run the batch container as a root user. This may be because you are working in a hosting environment that mandates the use of a named user rather than root.
You must start the container with the flag --user $USERNUMBER:$GROUPID. The user number and group ID are non-zero numerical values from 1 up to 65535. Here is a working example:
docker run --user 1000:3000 ubuntu echo hello world
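Because both values must fall between 1 and 65535, it can be useful to check them before starting the container. The following is a minimal sketch of such a check. valid_id and user_flag are hypothetical helper functions written for this example and are not part of Docker or the Batch Container.

```shell
#!/bin/sh
# Sketch: validate a UID/GID pair before passing it to "docker run --user".
# valid_id and user_flag are hypothetical helpers, not part of any CLI.

# Succeeds only for whole numbers in the range 1..65535.
valid_id() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
  esac
  [ "$1" -ge 1 ] && [ "$1" -le 65535 ]
}

# Prints the --user flag for docker run, or fails with a message on stderr.
user_flag() {
  if valid_id "$1" && valid_id "$2"; then
    printf '%s %s:%s\n' '--user' "$1" "$2"
  else
    echo "UID and GID must be between 1 and 65535" >&2
    return 1
  fi
}

user_flag 1000 3000
```

The output can be spliced into the command line, for example docker run $(user_flag 1000 3000) ubuntu echo hello world.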
Getting Transcription Output as a non-root user
If you take transcription output via the default STDOUT, this does not change when running as a non-root user. Below is an example:
docker run -u 1020:4000 \
-v /Users/$USER/work/pipeline/mydev/config.json:/config.json \
-v /Users/$USER/work/pipeline/mydev/input.audio:/input.audio \
${IMAGE_NAME}
If you want to write the output to a specific directory, you must volume map a directory to which the non-root user would have access.
Running a Batch Container as a non-root user on Kubernetes
Please note: the examples below do not constitute an explicit recommendation to run as a non-root user; they are merely a guideline on how to do so with Kubernetes when this is an unavoidable requirement.
If you require named users to be deployed on Kubernetes Pods, you must set the following security configuration. The user and group must correspond to the user and group you use when starting the container:
securityContext:
  runAsUser: { non-zero numerical value between 1 and 65535 }
  runAsGroup: { non-zero numerical value between 1 and 65535 }
There is more information on how to configure security settings on Kubernetes pods in the Kubernetes documentation.
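To show where the securityContext fragment sits in a complete manifest, here is a minimal sketch that writes a Pod definition running the transcriber as user 1000 and group 3000, mirroring the docker run --user example earlier. The Pod name, the file name and the 1000:3000 values are illustrative assumptions. runAsNonRoot and allowPrivilegeEscalation are standard Kubernetes fields added here as optional hardening, not requirements stated by this guide.

```shell
#!/bin/sh
# Sketch: a Pod manifest that runs the transcriber as UID 1000 / GID 3000.
# The pod name, file name and the 1000:3000 values are illustrative assumptions.
NONROOT_MANIFEST="${NONROOT_MANIFEST:-nonroot-pod.yaml}"

cat > "$NONROOT_MANIFEST" <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: batch-transcriber-nonroot
spec:
  restartPolicy: Never
  # Pod-level settings apply to every container in the pod.
  securityContext:
    runAsUser: 1000
    runAsGroup: 3000
    # Make the kubelet refuse to start the container as UID 0.
    runAsNonRoot: true
  containers:
    - name: transcriber
      image: batch-asr-transcriber-en:9.2.0
      securityContext:
        allowPrivilegeEscalation: false
EOF

echo "Wrote $NONROOT_MANIFEST"
```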
Some Kubernetes deployments may mandate the use of Pod Security admission controllers, which impose stricter security requirements. More information on them can be found in the Kubernetes documentation. If your deployment requires this setup, the following example PodSecurityPolicy allows you to carry out transcription as a non-root user.
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restricted
  annotations:
    seccomp.security.alpha.kubernetes.io/allowedProfileNames: "docker/default,runtime/default"
    apparmor.security.beta.kubernetes.io/allowedProfileNames: "runtime/default"
    seccomp.security.alpha.kubernetes.io/defaultProfileName: "runtime/default"
    apparmor.security.beta.kubernetes.io/defaultProfileName: "runtime/default"
spec:
  privileged: false
  # Required to prevent escalations to root.
  allowPrivilegeEscalation: false
  requiredDropCapabilities:
    - ALL
  # Allow core volume types.
  volumes:
    - "configMap"
    - "emptyDir"
    - "projected"
    - "secret"
    - "downwardAPI"
    # Assume that persistentVolumes set up by the cluster admin are safe to use.
    - "persistentVolumeClaim"
  hostNetwork: false
  hostIPC: false
  hostPID: false
  runAsUser:
    # Require the container to run without root privileges.
    rule: "MustRunAsNonRoot"
  seLinux:
    # This policy assumes the nodes are using AppArmor rather than SELinux.
    rule: "RunAsAny"
  supplementalGroups:
    rule: "MustRunAs"
    ranges:
      # Forbid adding the root group.
      - min: 1
        max: 65535
  fsGroup:
    rule: "MustRunAs"
    ranges:
      # Forbid adding the root group.
      - min: 1
        max: 65535
  readOnlyRootFilesystem: false
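The PodSecurityPolicy API (policy/v1beta1) was removed in Kubernetes 1.25. On clusters where it is unavailable, a broadly comparable restriction may be achievable with the built-in Pod Security admission controller, which is configured through namespace labels rather than a policy object. The sketch below writes such a namespace manifest. The namespace name transcription and the output file name are hypothetical, and whether the restricted profile suits your deployment is an assumption to verify.

```shell
#!/bin/sh
# Sketch: namespace labels for the built-in Pod Security admission controller.
# The namespace name "transcription" and the output file name are hypothetical.
NAMESPACE_MANIFEST="${NAMESPACE_MANIFEST:-transcription-namespace.yaml}"

cat > "$NAMESPACE_MANIFEST" <<'EOF'
apiVersion: v1
kind: Namespace
metadata:
  name: transcription
  labels:
    # Reject pods that violate the "restricted" Pod Security Standard.
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: latest
    # Also surface warnings to the client at admission time.
    pod-security.kubernetes.io/warn: restricted
EOF

echo "Wrote $NAMESPACE_MANIFEST"
```

Pods created in that namespace would then need to declare the non-root and privilege settings themselves, since the admission controller only validates pods and does not modify them.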