Virtual Appliance Scaling
Real-Time Virtual Appliance Scaling
This section explains how to scale the Real-Time Virtual Appliance, and gives advice on how to make sure you've allocated enough resources for your workload.
Worker Limits
The number of concurrent workers can be restricted using the Management API. This can be used to ensure that the system resources do not get exhausted by clients starting more sessions than expected. The maximum number of concurrent workers is set for the entire system, irrespective of which language packs are being used. The default number of maximum concurrent workers is 1.
View Maximum Workers
Use a GET request to the maxworkers endpoint to view the maximum number of workers:
curl -L -X GET "http://${APPLIANCE_HOST}:8080/v1/management/maxworkers" \
-H 'Accept: application/json' \
| jq
This shows the maximum number of workers that can run concurrently on the appliance. If clients using the Speech API open more sessions than this, you will receive the job error: "No worker can be scheduled because the service is at capacity".
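For scripted checks, the GET request can be wrapped in a small shell function. This is a minimal sketch, not part of the appliance tooling. The function name get_max_workers is hypothetical, and the sketch assumes bash, curl, and an APPLIANCE_HOST variable holding the appliance's hostname or IP address. The URL is in double quotes so the shell expands ${APPLIANCE_HOST` (single quotes would pass it to curl literally), and --fail makes HTTP errors produce a non-zero exit status instead of printing an error body.

```shell
#!/usr/bin/env bash
# Sketch: query the maxworkers endpoint and fail loudly on errors.
# Assumption: APPLIANCE_HOST holds the appliance hostname or IP address.
# get_max_workers is a hypothetical helper name, not part of the appliance.

get_max_workers() {
  # Refuse to run if the host variable is unset or empty.
  if [ -z "${APPLIANCE_HOST:-}" ]; then
    echo "APPLIANCE_HOST must be set" >&2
    return 1
  fi
  # Double quotes allow the shell to expand APPLIANCE_HOST.
  # --fail returns a non-zero status on HTTP 4xx/5xx responses.
  curl --fail --silent --show-error -L \
    -H 'Accept: application/json' \
    "http://${APPLIANCE_HOST}:8080/v1/management/maxworkers"
}

# Example usage (requires a reachable appliance):
# APPLIANCE_HOST=192.0.2.10 get_max_workers | jq
```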
Setting Maximum Workers
Before changing the maximum number of concurrent workers for real-time transcription, make sure the virtual appliance has enough system resources (CPU and RAM) to support the new limit (see the virtual appliance system requirements). This example sets the maximum number of concurrent workers to 5:
curl -L -X POST "http://${APPLIANCE_HOST}:8080/v1/management/maxworkers" \
-H 'Accept: application/json' \
-H 'Content-Type: application/json' \
-d '{ "count": "5" }'
As a rule of thumb, each concurrent worker requires 1 vCPU and up to 2 GB of RAM, so 5 concurrent workers need about 5 vCPUs and up to 10 GB of RAM.
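The rule of thumb can be turned into a quick capacity estimate. The worker count is limited both by vCPUs (one per worker) and by RAM (up to 2 GB per worker), and the tighter limit wins. This is a minimal bash sketch: the function name max_workers_for is hypothetical, and planning against the full 2 GB per worker gives a conservative answer that reserves no extra headroom for the operating system.

```shell
#!/usr/bin/env bash
# Sketch: estimate how many concurrent workers a machine can support,
# using the rule of thumb of 1 vCPU and up to 2 GB RAM per worker.
# max_workers_for is a hypothetical helper name.

max_workers_for() {
  local vcpus="$1" ram_gb="$2"
  local by_cpu="$vcpus"            # one vCPU per worker
  local by_ram=$(( ram_gb / 2 ))   # 2 GB per worker, rounded down
  # The tighter of the two limits wins.
  if (( by_cpu < by_ram )); then
    echo "$by_cpu"
  else
    echo "$by_ram"
  fi
}

# Example usage on Linux: read the local vCPU count and total RAM (GiB).
# vcpus=$(nproc)
# ram_gb=$(awk '/MemTotal/ {print int($2 / 1024 / 1024)}' /proc/meminfo)
# echo "Suggested maxworkers: $(max_workers_for "$vcpus" "$ram_gb")"
```

For example, a machine with 8 vCPUs and 10 GB of RAM gives min(8, 10 / 2) = 5 workers, so 5 would be a reasonable upper bound for maxworkers under this rule of thumb.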
Batch Virtual Appliance Scaling
This section explains how to scale the Batch Virtual Appliance, and gives advice on how to make sure you've allocated enough resources for your workload.
Worker Limits
The number of concurrent workers (jobs) can be restricted using the Management API. This can be used to ensure that the system resources do not get exhausted by clients starting more transcriptions than expected. The maximum number of concurrent workers is set for the entire system, irrespective of which language packs are being used. The default number of maximum concurrent workers is 1.
View Maximum Workers
Use a GET request to the maxworkers endpoint to view the maximum number of workers:
curl -L -X GET "http://${APPLIANCE_HOST}:8080/v1/management/maxworkers" \
-H 'Accept: application/json' \
| jq
The response shows the maximum number of workers that can run concurrently on the appliance. If clients using the Speech API submit more jobs than this, the extra jobs are queued and processed once there is spare capacity on the appliance.
Setting Maximum Workers
Before changing the maximum number of concurrent workers, make sure the virtual appliance has enough system resources (CPU and RAM) to support the new limit (see the Batch Virtual Appliance system requirements).
This example shows how to set the maximum number of concurrent workers to 5:
curl -L -X POST "http://${APPLIANCE_HOST}:8080/v1/management/maxworkers" \
-H 'Accept: application/json' \
-H 'Content-Type: application/json' \
-d '{ "count": "5" }'
Increasing the number of concurrent workers increases the CPU and RAM the appliance needs. See the Batch Virtual Appliance system requirements for details.
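When changing the limit from a script, it can help to validate the value before sending it. This is a minimal bash sketch assuming an APPLIANCE_HOST variable that holds the appliance's hostname or IP address. The function name set_max_workers is hypothetical, and the request body sends count as a JSON string to match the curl example in this section. Only positive integers are accepted, and --fail makes HTTP errors surface as a non-zero exit status.

```shell
#!/usr/bin/env bash
# Sketch: set maxworkers with basic input validation.
# Assumption: APPLIANCE_HOST holds the appliance hostname or IP address.
# set_max_workers is a hypothetical helper name, not part of the appliance.

set_max_workers() {
  local count="$1"
  # Accept only positive integers (no sign, no leading zero, no spaces).
  if ! [[ "$count" =~ ^[1-9][0-9]*$ ]]; then
    echo "count must be a positive integer, got: '$count'" >&2
    return 1
  fi
  if [ -z "${APPLIANCE_HOST:-}" ]; then
    echo "APPLIANCE_HOST must be set" >&2
    return 1
  fi
  # count is sent as a JSON string, mirroring the documented example.
  curl --fail --silent --show-error -L -X POST \
    -H 'Accept: application/json' \
    -H 'Content-Type: application/json' \
    -d "{ \"count\": \"${count}\" }" \
    "http://${APPLIANCE_HOST}:8080/v1/management/maxworkers"
}

# Example usage (requires a reachable appliance):
# APPLIANCE_HOST=192.0.2.10 set_max_workers 5
```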
If the number of jobs submitted exceeds the maximum number of concurrent workers, further jobs are queued and the real-time factor (RTF) increases, so you will wait longer for your transcripts to become available.
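A simplified model shows why queuing stretches turnaround time. Assuming every job takes the same time to process (real jobs vary in length), jobs run in rounds of at most maxworkers. The number of rounds is therefore the job count divided by the worker limit, rounded up. This is a rough illustration only, and the function name queue_rounds is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: simplified batch queuing model.
# Assumption: all jobs take the same time, so they run in "rounds"
# of at most maxworkers jobs. queue_rounds is a hypothetical helper name.

queue_rounds() {
  local jobs="$1" workers="$2"
  # Ceiling division: a partial final round still takes a full round.
  echo $(( (jobs + workers - 1) / workers ))
}

# queue_rounds 12 5
```

With 12 equal-length jobs and maxworkers set to 5, queue_rounds 12 5 prints 3. The last 2 jobs start only after two full rounds have finished.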