Farsava API

Farsava API: Speech Recognition and Text to Speech powered by deep neural network models.

This is the documentation for version 1.0.7 of the API. It was last updated on Jul 7, 2019.

Base URL
https://api.amerandish.com/v1

Speech

GET /speech/healthcheck

Speech health check endpoint.

This endpoint returns a simple JSON object containing the service status and the API version.

Responses
  • 200 object

    OK.

    • status string

      Values are Running, Warnings, and Critical.

    • message string

Health check message. Returns OK if the service is running without problems.

    • version string

      API version.

  • 400 object

This response means that the server could not understand the request due to invalid syntax.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 401 object

Authentication is needed to get the requested response. This is similar to 403, but in this case, authentication is possible.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 403 object

The client does not have access rights to the content, so the server is refusing to return a proper response.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 405 object

    The request method is known by the server but has been disabled and cannot be used.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 429 object

    The user has sent too many requests in a given amount of time ("rate limiting").

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 500 object

    The server has encountered a situation it doesn't know how to handle.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

Definition
GET https://api.amerandish.com/v1/speech/healthcheck
Example request
$ curl -X GET https://api.amerandish.com/v1/speech/healthcheck \
    -H "Content-Type: application/json"
Example response (200)
{ "status": "Running", "message": "string", "version": "string" }
Example response (400)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (401)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (403)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (405)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (429)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (500)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
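A minimal client-side sketch of interpreting the health-check body. The helper name and the healthy/unhealthy split below are illustrative, not part of the API:

```python
import json

def parse_healthcheck(body: str):
    """Interpret a GET /speech/healthcheck response body.

    The service is treated as healthy only when status is "Running";
    "Warnings" and "Critical" are reported as unhealthy.
    """
    payload = json.loads(body)
    healthy = payload.get("status") == "Running"
    summary = f'{payload.get("message", "")} (API {payload.get("version", "?")})'
    return healthy, summary

# Example 200 body, shaped like the schema above:
healthy, summary = parse_healthcheck(
    '{"status": "Running", "message": "OK", "version": "1.0.7"}'
)
```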

POST /speech/asr

Performs synchronous speech recognition


This resource receives audio data in several formats and transcribes it using state-of-the-art deep neural networks. Recognition is synchronous: the result becomes available after all audio has been sent and processed. This endpoint is designed for transcribing short audio files of up to one minute.


Using the config object you can specify audio settings such as audioEncoding and sampleRateHertz. Multiple languages are supported; choose one via languageCode. The asrModel and languageModel fields in config let you use customized models. Refer to asrLongRunning and the WebSocket API for transcribing longer audio.

Body
  • config Required / object

    Provides information to the recognizer that specifies how to process the request.

    • config.audioEncoding Required / string

Encoding of the audio data sent in all RecognitionAudio messages. In the case of voice synthesis, this is the format of the requested audio byte stream. This field is required for all audio formats.

      Values are LINEAR16, FLAC, and MP3.

    • config.sampleRateHertz Required / integer

Sample rate in hertz of the audio data sent in all RecognitionAudio messages. Valid values are 8000-48000; 16000 is optimal. For best results, set the sampling rate of the audio source to 16000 Hz. If that is not possible, use the native sample rate of the audio source (instead of re-sampling). This field is required for all audio formats. In the Text to Speech endpoint this field is optional and specifies the synthesis sample rate (in hertz) for the audio. If it differs from the voice's natural sample rate, the synthesizer will honor the request by converting to the desired sample rate (which might result in worse audio quality), unless the specified sample rate is not supported for the chosen encoding.

    • config.languageCode Required / string

The language of the supplied audio as a language tag, for example en for English. See Language Support for a list of the currently supported language codes.

      Values are fa and en.

    • config.maxAlternatives integer

      Optional Maximum number of recognition hypotheses to be returned. Specifically, the maximum number of SpeechRecognitionAlternative messages within each SpeechRecognitionResult. The server may return fewer than maxAlternatives. Valid values are 1-5. A value of 0 or 1 will return a maximum of one. If omitted, will return a maximum of one.

      Minimum value is 0, maximum value is 5.

    • config.profanityFilter boolean

      Optional If set to true, the server will attempt to filter out profanities, replacing all but the initial character in each filtered word with asterisks, e.g. "s***". If set to false or omitted, profanities will not be filtered out.

    • config.asrModel string

      Optional Which model to select for the given request. Select the model best suited to your domain to get best results. If a model is not explicitly specified, then we auto-select a model based on the parameters in the RecognitionConfig.

Model Description
default Best for audio that does not fit one of the specific models below. For example, long-form audio. Ideally the audio is high-fidelity, recorded at a 16 kHz or greater sampling rate.
video Best for audio that originated from video or includes multiple speakers. Ideally the audio is recorded at a 16 kHz or greater sampling rate.
command_and_search Best for short queries such as voice commands or voice search. To be released.
phone_call Best for audio that originated from a phone call (typically recorded at an 8 kHz sampling rate). To be released.

Values are default, video, command_and_search, and phone_call.

    • config.languageModel string

This is the language model id of a custom-trained language model. You can train your own language models and then use them to recognize speech. Refer to languagemodel/train for more info.

      There are some pretrained language models which you can use.

Model Description
general Best for audio content that does not fit one of the specific language models below. This is the default language model; if you are not sure which one to use, simply use general.
numbers Best for audio content that contains only spoken numbers. For example, this language model can be used for speech-enabled number input fields.
yesno Best for audio content that contains yes or no. For example, this language model can be used to receive a confirmation from the user.
country Best for audio content that contains only spoken country names. For example, this language model can be used for speech-enabled input fields.
city Best for audio content that contains only spoken city names. For example, this language model can be used for speech-enabled input fields.
career Best for audio content that contains only spoken career names. For example, this language model can be used for speech-enabled input fields.

  • audio Required / object

    Contains audio data in the encoding specified in the RecognitionConfig.

A base64-encoded string.
For the asr endpoint, only inline binary audio data is accepted.

    Property Description
    data The audio data bytes encoded as specified in RecognitionConfig. A base64-encoded string.
    • audio.data Required / string(byte)

      The audio data bytes encoded as specified in RecognitionConfig.
      A base64-encoded string.
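The base64 requirement above can be sketched in Python. The helper name and the default config values here are illustrative:

```python
import base64
import json

def build_asr_body(audio_bytes: bytes, sample_rate: int = 16000) -> str:
    """Build a JSON body for POST /speech/asr: a config object plus
    the raw audio bytes base64-encoded into audio.data."""
    return json.dumps({
        "config": {
            "audioEncoding": "LINEAR16",
            "sampleRateHertz": sample_rate,
            "languageCode": "fa",
            "maxAlternatives": 1,
        },
        "audio": {"data": base64.b64encode(audio_bytes).decode("ascii")},
    })

body = build_asr_body(b"\x00\x01\x02\x03")
```

The resulting string can be sent as the request body with Content-Type: application/json.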

Responses
  • 201 object

    OK. Transcription Generated.

    • transcriptionId string(uuid)

A UUID string identifying a unique pair of audio and recognitionResult. It can be used to retrieve this recognitionResult via the transcription endpoint. asrLongRunning recognition results are only available through the transcription endpoint and this transcriptionId.

    • duration number(double)

      File duration in seconds.

    • inferenceTime number(double)

      Total inference time in seconds.

    • status string

      Status of the recognition process. USE THE RECOGNITION RESULT ONLY WHEN STATUS IS DONE.

      Values are queued, processing, done, and partial.

    • results array[object]

      Sequential list of transcription results corresponding to sequential portions of audio. May contain one or more recognition hypotheses (up to the maximum specified in maxAlternatives). These alternatives are ordered in terms of accuracy, with the top (first) alternative being the most probable, as ranked by the recognizer.

      • results.transcript string

A UTF-8 encoded string. Transcript text representing the words that the user spoke.

      • results.confidence number(double)

        The confidence of ASR engine for generated output. The confidence estimate between 0.0 and 1.0. A higher number indicates an estimated greater likelihood that the recognized words are correct. It is the total confidence of recognition in transcript level and each word confidence in word info object. This field is not guaranteed to be accurate and users should not rely on it to be always provided. The default of 0.0 is a sentinel value indicating confidence was not set.

        Minimum value is 0, maximum value is 1.

      • results.words array[object]
        • results.words.startTime number(double)

          Time offset relative to the beginning of the audio, and corresponding to the start of the spoken word. This is an experimental feature and the accuracy of the time offset can vary. This field is not guaranteed to be accurate and users should not rely on it to be always provided. The default of 0.0 is a sentinel value indicating confidence was not set.

        • results.words.endTime number(double)

          Time offset relative to the beginning of the audio, and corresponding to the end of the spoken word. This is an experimental feature and the accuracy of the time offset can vary. This field is not guaranteed to be accurate and users should not rely on it to be always provided. The default of 0.0 is a sentinel value indicating confidence was not set.

        • results.words.word string

          The word corresponding to this set of information.

        • results.words.confidence number(double)

          The confidence of ASR engine for generated output. The confidence estimate between 0.0 and 1.0. A higher number indicates an estimated greater likelihood that the recognized words are correct. It is the total confidence of recognition in transcript level and each word confidence in word info object. This field is not guaranteed to be accurate and users should not rely on it to be always provided. The default of 0.0 is a sentinel value indicating confidence was not set.

          Minimum value is 0, maximum value is 1.

  • 400 object

This response means that the server could not understand the request due to invalid syntax.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 401 object

Authentication is needed to get the requested response. This is similar to 403, but in this case, authentication is possible.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 403 object

The client does not have access rights to the content, so the server is refusing to return a proper response.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 405 object

    The request method is known by the server but has been disabled and cannot be used.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 415 object

    The media format of the requested data is not supported by the server, so the server is rejecting the request.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 429 object

    The user has sent too many requests in a given amount of time ("rate limiting").

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 500 object

    The server has encountered a situation it doesn't know how to handle.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

Definition
POST https://api.amerandish.com/v1/speech/asr
Example request
$ curl -X POST https://api.amerandish.com/v1/speech/asr \
    -H "Content-Type: application/json" \
    -d '{"config":{"audioEncoding":"LINEAR16","sampleRateHertz":42,"languageCode":"fa","maxAlternatives"...}'
Example response (201)
{ "transcriptionId": "string", "duration": 42.0, "inferenceTime": 42.0, "status": "queued", "results": [ { "transcript": "string", "confidence": 42.0, "words": [ { "startTime": 42.0, "endTime": 42.0, "word": "string", "confidence": 42.0 } ] } ] }
Example response (400)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (401)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (403)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (405)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (415)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (429)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (500)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
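Since alternatives are ordered best-first and the docs warn to use the recognition result only when status is done, a 201 response body can be consumed with a small helper like this (names are illustrative):

```python
def top_transcript(response: dict):
    """Return the most probable transcript from an ASR response body,
    or None when the recognition is not finished yet."""
    if response.get("status") != "done":
        return None
    results = response.get("results", [])
    # The first alternative is the most probable, as ranked by the recognizer.
    return results[0]["transcript"] if results else None

sample = {
    "status": "done",
    "results": [
        {"transcript": "salam donya", "confidence": 0.93, "words": []},
        {"transcript": "salam donia", "confidence": 0.41, "words": []},
    ],
}
best = top_transcript(sample)
```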

POST /speech/asrlongrunning

Performs asynchronous speech recognition


This resource receives a URI pointing to the audio resource, downloads it, and transcribes the audio using state-of-the-art deep neural networks. Recognition is asynchronous: the result becomes available via the transcription endpoint. This endpoint is designed for transcribing long audio files of up to 240 minutes.


Using the config object you can specify audio settings such as audioEncoding and sampleRateHertz. Multiple languages are supported; choose one via languageCode. The asrModel and languageModel fields in config let you use customized models.
Refer to the WebSocket API for speech recognition on streams.
Refer to the ASR API for fast recognition of short audio files.

Body
  • config Required / object

    Provides information to the recognizer that specifies how to process the request.

    • config.audioEncoding Required / string

Encoding of the audio data sent in all RecognitionAudio messages. In the case of voice synthesis, this is the format of the requested audio byte stream. This field is required for all audio formats.

      Values are LINEAR16, FLAC, and MP3.

    • config.sampleRateHertz Required / integer

Sample rate in hertz of the audio data sent in all RecognitionAudio messages. Valid values are 8000-48000; 16000 is optimal. For best results, set the sampling rate of the audio source to 16000 Hz. If that is not possible, use the native sample rate of the audio source (instead of re-sampling). This field is required for all audio formats. In the Text to Speech endpoint this field is optional and specifies the synthesis sample rate (in hertz) for the audio. If it differs from the voice's natural sample rate, the synthesizer will honor the request by converting to the desired sample rate (which might result in worse audio quality), unless the specified sample rate is not supported for the chosen encoding.

    • config.languageCode Required / string

The language of the supplied audio as a language tag, for example en for English. See Language Support for a list of the currently supported language codes.

      Values are fa and en.

    • config.maxAlternatives integer

      Optional Maximum number of recognition hypotheses to be returned. Specifically, the maximum number of SpeechRecognitionAlternative messages within each SpeechRecognitionResult. The server may return fewer than maxAlternatives. Valid values are 1-5. A value of 0 or 1 will return a maximum of one. If omitted, will return a maximum of one.

      Minimum value is 0, maximum value is 5.

    • config.profanityFilter boolean

      Optional If set to true, the server will attempt to filter out profanities, replacing all but the initial character in each filtered word with asterisks, e.g. "s***". If set to false or omitted, profanities will not be filtered out.

    • config.asrModel string

      Optional Which model to select for the given request. Select the model best suited to your domain to get best results. If a model is not explicitly specified, then we auto-select a model based on the parameters in the RecognitionConfig.

Model Description
default Best for audio that does not fit one of the specific models below. For example, long-form audio. Ideally the audio is high-fidelity, recorded at a 16 kHz or greater sampling rate.
video Best for audio that originated from video or includes multiple speakers. Ideally the audio is recorded at a 16 kHz or greater sampling rate.
command_and_search Best for short queries such as voice commands or voice search. To be released.
phone_call Best for audio that originated from a phone call (typically recorded at an 8 kHz sampling rate). To be released.

Values are default, video, command_and_search, and phone_call.

    • config.languageModel string

This is the language model id of a custom-trained language model. You can train your own language models and then use them to recognize speech. Refer to languagemodel/train for more info.

      There are some pretrained language models which you can use.

Model Description
general Best for audio content that does not fit one of the specific language models below. This is the default language model; if you are not sure which one to use, simply use general.
numbers Best for audio content that contains only spoken numbers. For example, this language model can be used for speech-enabled number input fields.
yesno Best for audio content that contains yes or no. For example, this language model can be used to receive a confirmation from the user.
country Best for audio content that contains only spoken country names. For example, this language model can be used for speech-enabled input fields.
city Best for audio content that contains only spoken city names. For example, this language model can be used for speech-enabled input fields.
career Best for audio content that contains only spoken career names. For example, this language model can be used for speech-enabled input fields.

  • audio Required / object

    Contains audio source URI with the encoding specified in the RecognitionConfig.

For the asrlongrunning endpoint, only a uri is accepted.

    Property Description
    uri URI that points to a file that contains audio data bytes as specified in RecognitionConfig. The file must not be compressed (for example, gzip).
    • audio.uri Required / string

      URI that points to a file that contains audio data bytes as specified in RecognitionConfig. The file must not be compressed (for example, gzip).

Responses
  • 201 object

    OK. Transcription Generated.

    • transcriptionId string(uuid)

A UUID string identifying a unique pair of audio and recognitionResult. It can be used to retrieve this recognitionResult via the transcription endpoint. asrLongRunning recognition results are only available through the transcription endpoint and this transcriptionId.

    • duration number(double)

      File duration in seconds.

    • inferenceTime number(double)

      Total inference time in seconds.

    • status string

      Status of the recognition process. USE THE RECOGNITION RESULT ONLY WHEN STATUS IS DONE.

      Values are queued, processing, done, and partial.

    • results array[object]

      Sequential list of transcription results corresponding to sequential portions of audio. May contain one or more recognition hypotheses (up to the maximum specified in maxAlternatives). These alternatives are ordered in terms of accuracy, with the top (first) alternative being the most probable, as ranked by the recognizer.

      • results.transcript string

A UTF-8 encoded string. Transcript text representing the words that the user spoke.

      • results.confidence number(double)

        The confidence of ASR engine for generated output. The confidence estimate between 0.0 and 1.0. A higher number indicates an estimated greater likelihood that the recognized words are correct. It is the total confidence of recognition in transcript level and each word confidence in word info object. This field is not guaranteed to be accurate and users should not rely on it to be always provided. The default of 0.0 is a sentinel value indicating confidence was not set.

        Minimum value is 0, maximum value is 1.

      • results.words array[object]
        • results.words.startTime number(double)

          Time offset relative to the beginning of the audio, and corresponding to the start of the spoken word. This is an experimental feature and the accuracy of the time offset can vary. This field is not guaranteed to be accurate and users should not rely on it to be always provided. The default of 0.0 is a sentinel value indicating confidence was not set.

        • results.words.endTime number(double)

          Time offset relative to the beginning of the audio, and corresponding to the end of the spoken word. This is an experimental feature and the accuracy of the time offset can vary. This field is not guaranteed to be accurate and users should not rely on it to be always provided. The default of 0.0 is a sentinel value indicating confidence was not set.

        • results.words.word string

          The word corresponding to this set of information.

        • results.words.confidence number(double)

          The confidence of ASR engine for generated output. The confidence estimate between 0.0 and 1.0. A higher number indicates an estimated greater likelihood that the recognized words are correct. It is the total confidence of recognition in transcript level and each word confidence in word info object. This field is not guaranteed to be accurate and users should not rely on it to be always provided. The default of 0.0 is a sentinel value indicating confidence was not set.

          Minimum value is 0, maximum value is 1.

  • 400 object

This response means that the server could not understand the request due to invalid syntax.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 401 object

Authentication is needed to get the requested response. This is similar to 403, but in this case, authentication is possible.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 403 object

The client does not have access rights to the content, so the server is refusing to return a proper response.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 405 object

    The request method is known by the server but has been disabled and cannot be used.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 415 object

    The media format of the requested data is not supported by the server, so the server is rejecting the request.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 429 object

    The user has sent too many requests in a given amount of time ("rate limiting").

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 500 object

    The server has encountered a situation it doesn't know how to handle.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

Definition
POST https://api.amerandish.com/v1/speech/asrlongrunning
Example request
$ curl -X POST https://api.amerandish.com/v1/speech/asrlongrunning \
    -H "Content-Type: application/json" \
    -d '{"config":{"audioEncoding":"LINEAR16","sampleRateHertz":42,"languageCode":"fa","maxAlternatives"...}'
Example response (201)
{ "transcriptionId": "string", "duration": 42.0, "inferenceTime": 42.0, "status": "queued", "results": [ { "transcript": "string", "confidence": 42.0, "words": [ { "startTime": 42.0, "endTime": 42.0, "word": "string", "confidence": 42.0 } ] } ] }
Example response (400)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (401)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (403)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (405)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (415)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (429)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (500)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
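Because asrlongrunning is asynchronous, the returned transcriptionId has to be polled via the transcription endpoint until status becomes done. A sketch of that loop; the actual HTTP call is left to an injected fetch callable, since the transcription endpoint's URL is documented elsewhere and the interval/retry limits below are illustrative:

```python
import time

def wait_for_transcription(transcription_id: str, fetch,
                           interval: float = 2.0, max_tries: int = 150) -> dict:
    """Poll until the recognition status is "done".

    fetch is any callable taking a transcriptionId and returning the
    decoded JSON body of the transcription endpoint's response.
    """
    for _ in range(max_tries):
        body = fetch(transcription_id)
        if body.get("status") == "done":
            return body
        time.sleep(interval)
    raise TimeoutError(f"transcription {transcription_id} still not done")
```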

GET /speech/asrlive

Performs asynchronous live speech recognition over a WebSocket


This resource establishes a WebSocket with the client and receives audio data over it. It starts transcribing the audio using state-of-the-art deep neural networks and returns partial results on the WebSocket.
This endpoint is designed for transcribing streaming audio data of up to 15 minutes. It sends back a partial result (status=partial) each time it detects an utterance endpoint. After the client sends the close signal, it receives an ASRResponseBody with status=done.
The token should be passed in the query string as jwt.


Using the config object you can specify audio settings such as audioEncoding and sampleRateHertz. Multiple languages are supported; choose one via languageCode. The asrModel and languageModel fields in config let you use customized models.
Refer to the ASRLongRunning API for long audio speech recognition.
Refer to the ASR API for fast recognition of short audio files.
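Messages arriving on the WebSocket can be dispatched on their status field. This handler is illustrative and independent of any particular WebSocket library; it accumulates partial transcripts and signals when the final status=done body arrives:

```python
def handle_asrlive_message(message: dict, partials: list) -> bool:
    """Process one ASRResponseBody received over /speech/asrlive.

    Appends the transcripts of a status=partial message to partials
    and returns True once the final status=done message arrives.
    """
    transcripts = [r["transcript"] for r in message.get("results", [])]
    if message.get("status") == "partial":
        partials.extend(transcripts)
        return False
    return message.get("status") == "done"

partials = []
done = handle_asrlive_message(
    {"status": "partial", "results": [{"transcript": "salam"}]}, partials
)
```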

Responses
  • 200 object

    OK.

    • transcriptionId string(uuid)

A UUID string identifying a unique pair of audio and recognitionResult. It can be used to retrieve this recognitionResult via the transcription endpoint. asrLongRunning recognition results are only available through the transcription endpoint and this transcriptionId.

    • duration number(double)

      File duration in seconds.

    • inferenceTime number(double)

      Total inference time in seconds.

    • status string

      Status of the recognition process. USE THE RECOGNITION RESULT ONLY WHEN STATUS IS DONE.

      Values are queued, processing, done, and partial.

    • results array[object]

      Sequential list of transcription results corresponding to sequential portions of audio. May contain one or more recognition hypotheses (up to the maximum specified in maxAlternatives). These alternatives are ordered in terms of accuracy, with the top (first) alternative being the most probable, as ranked by the recognizer.

      • results.transcript string

A UTF-8 encoded string. Transcript text representing the words that the user spoke.

      • results.confidence number(double)

        The confidence of ASR engine for generated output. The confidence estimate between 0.0 and 1.0. A higher number indicates an estimated greater likelihood that the recognized words are correct. It is the total confidence of recognition in transcript level and each word confidence in word info object. This field is not guaranteed to be accurate and users should not rely on it to be always provided. The default of 0.0 is a sentinel value indicating confidence was not set.

        Minimum value is 0, maximum value is 1.

      • results.words array[object]
        • results.words.startTime number(double)

          Time offset relative to the beginning of the audio, and corresponding to the start of the spoken word. This is an experimental feature and the accuracy of the time offset can vary. This field is not guaranteed to be accurate and users should not rely on it to be always provided. The default of 0.0 is a sentinel value indicating confidence was not set.

        • results.words.endTime number(double)

          Time offset relative to the beginning of the audio, and corresponding to the end of the spoken word. This is an experimental feature and the accuracy of the time offset can vary. This field is not guaranteed to be accurate and users should not rely on it to be always provided. The default of 0.0 is a sentinel value indicating confidence was not set.

        • results.words.word string

          The word corresponding to this set of information.

        • results.words.confidence number(double)

          The ASR engine's confidence in the generated output, estimated between 0.0 and 1.0. A higher number indicates a greater estimated likelihood that the recognized words are correct. At the transcript level it is the overall recognition confidence; in a word info object it is the per-word confidence. This field is not guaranteed to be accurate, and users should not rely on it always being provided. The default of 0.0 is a sentinel value indicating that confidence was not set.

          Minimum value is 0, maximum value is 1.

  • 400 object

    This response means that server could not understand the request due to invalid syntax.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 401 object

    Authentication is needed to get requested response. This is similar to 403, but in this case, authentication is possible.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 403 object

    Client does not have access rights to the content so server is rejecting to give proper response.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 405 object

    The request method is known by the server but has been disabled and cannot be used.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 415 object

    The media format of the requested data is not supported by the server, so the server is rejecting the request.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 429 object

    The user has sent too many requests in a given amount of time ("rate limiting").

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 500 object

    The server has encountered a situation it doesn't know how to handle.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

Definition
GET https://api.amerandish.com/v1/speech/asrlive
Example request
$ curl \
    -X GET https://api.amerandish.com/v1/speech/asrlive \
    -H "Content-Type: application/json"
Example response (200)
{
  "transcriptionId": "string",
  "duration": 42.0,
  "inferenceTime": 42.0,
  "status": "queued",
  "results": [
    {
      "transcript": "string",
      "confidence": 42.0,
      "words": [
        {
          "startTime": 42.0,
          "endTime": 42.0,
          "word": "string",
          "confidence": 42.0
        }
      ]
    }
  ]
}
Example response (400)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (401)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (403)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (405)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (415)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (429)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (500)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
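A recognition result like the 200 example above is consumed by taking the top-ranked alternative, since alternatives in results are ordered by accuracy. A minimal Python sketch (not an official client; the payload values below are illustrative):

```python
def best_transcript(recognition_result):
    """Return the top-ranked transcript from a recognition result payload."""
    # Per the schema above, the result should only be used once status is done.
    if recognition_result.get("status") != "done":
        raise ValueError("use the recognition result only when status is done")
    results = recognition_result.get("results") or []
    # The first alternative is the most probable, as ranked by the recognizer.
    return results[0]["transcript"] if results else ""

# Illustrative payload following the documented 200 schema.
sample = {
    "transcriptionId": "string",
    "status": "done",
    "results": [
        {"transcript": "some transcript", "confidence": 0.95, "words": []},
        {"transcript": "sum transcript", "confidence": 0.61, "words": []},
    ],
}
print(best_transcript(sample))  # -> some transcript
```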

GET /speech/transcriptions/{transcriptionId}

The transcription endpoint lets you retrieve a previous speech recognition result or check the status of a long-running recognition. To access a speech recognition result, the transcriptionId must be provided.


URL parameters
  • transcriptionId Required / string(uuid)

    Id of the transcribed audio. It is a UUID string provided in the speech recognition result.

Responses
  • 200 object

    OK. Transcription Retrieved.

    • transcriptionId string(uuid)

      A UUID string identifying a unique pair of audio and recognitionResult. It can be used to retrieve this recognitionResult via the transcription endpoint. An asrLongRunning recognitionResult is only available through the transcription endpoint using this transcriptionId.

    • duration number(double)

      File duration in seconds.

    • inferenceTime number(double)

      Total inference time in seconds.

    • status string

      Status of the recognition process. Use the recognition result only when status is done.

      Values are queued, processing, done, and partial.

    • results array[object]

      Sequential list of transcription results corresponding to sequential portions of audio. May contain one or more recognition hypotheses (up to the maximum specified in maxAlternatives). These alternatives are ordered in terms of accuracy, with the top (first) alternative being the most probable, as ranked by the recognizer.

      • results.transcript string

        A UTF-8 encoded string. Transcript text representing the words that the user spoke.

      • results.confidence number(double)

        The ASR engine's confidence in the generated output, estimated between 0.0 and 1.0. A higher number indicates a greater estimated likelihood that the recognized words are correct. At the transcript level it is the overall recognition confidence; in a word info object it is the per-word confidence. This field is not guaranteed to be accurate, and users should not rely on it always being provided. The default of 0.0 is a sentinel value indicating that confidence was not set.

        Minimum value is 0, maximum value is 1.

      • results.words array[object]
        • results.words.startTime number(double)

          Time offset relative to the beginning of the audio, corresponding to the start of the spoken word. This is an experimental feature and the accuracy of the time offset can vary. This field is not guaranteed to be accurate, and users should not rely on it always being provided. The default of 0.0 is a sentinel value indicating that the time offset was not set.

        • results.words.endTime number(double)

          Time offset relative to the beginning of the audio, corresponding to the end of the spoken word. This is an experimental feature and the accuracy of the time offset can vary. This field is not guaranteed to be accurate, and users should not rely on it always being provided. The default of 0.0 is a sentinel value indicating that the time offset was not set.

        • results.words.word string

          The word corresponding to this set of information.

        • results.words.confidence number(double)

          The ASR engine's confidence in the generated output, estimated between 0.0 and 1.0. A higher number indicates a greater estimated likelihood that the recognized words are correct. At the transcript level it is the overall recognition confidence; in a word info object it is the per-word confidence. This field is not guaranteed to be accurate, and users should not rely on it always being provided. The default of 0.0 is a sentinel value indicating that confidence was not set.

          Minimum value is 0, maximum value is 1.

  • 400 object

    This response means that server could not understand the request due to invalid syntax.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 401 object

    Authentication is needed to get requested response. This is similar to 403, but in this case, authentication is possible.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 403 object

    Client does not have access rights to the content so server is rejecting to give proper response.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 404 object

    Server can not find requested resource.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 405 object

    The request method is known by the server but has been disabled and cannot be used.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 429 object

    The user has sent too many requests in a given amount of time ("rate limiting").

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 500 object

    The server has encountered a situation it doesn't know how to handle.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 501 object

    The request method is not supported by the server and cannot be handled.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

Definition
GET https://api.amerandish.com/v1/speech/transcriptions/{transcriptionId}
Example request
$ curl \
    -X GET https://api.amerandish.com/v1/speech/transcriptions/{transcriptionId} \
    -H "Content-Type: application/json"
Example response (200)
{
  "transcriptionId": "string",
  "duration": 42.0,
  "inferenceTime": 42.0,
  "status": "queued",
  "results": [
    {
      "transcript": "string",
      "confidence": 42.0,
      "words": [
        {
          "startTime": 42.0,
          "endTime": 42.0,
          "word": "string",
          "confidence": 42.0
        }
      ]
    }
  ]
}
Example response (400)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (401)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (403)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (404)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (405)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (429)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (500)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (501)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
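Because an asrLongRunning result only becomes available once status reaches done, clients typically poll this endpoint. A hedged sketch of the polling loop, with the HTTP call injected as a function so the loop itself makes no network assumptions (in practice, fetch would GET /speech/transcriptions/{transcriptionId} and return the parsed JSON):

```python
import time

def wait_for_transcription(fetch, transcription_id, poll_interval=2.0, max_attempts=30):
    """Poll until the recognition status is 'done', then return the payload."""
    for _ in range(max_attempts):
        payload = fetch(transcription_id)
        if payload["status"] == "done":
            return payload
        # Still queued, processing, or partial: wait and retry.
        time.sleep(poll_interval)
    raise TimeoutError("transcription %s did not finish in time" % transcription_id)
```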

DELETE /speech/transcriptions/{transcriptionId}

Deletes the transcription of a previously processed file, identified by its transcriptionId.


URL parameters
  • transcriptionId Required / string(uuid)

    Id of the transcribed audio. It is a UUID string provided in the speech recognition result.

Responses
  • 204

    The resource was deleted successfully.

  • 400 object

    This response means that server could not understand the request due to invalid syntax.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 401 object

    Authentication is needed to get requested response. This is similar to 403, but in this case, authentication is possible.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 403 object

    Client does not have access rights to the content so server is rejecting to give proper response.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 404 object

    Server can not find requested resource.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 405 object

    The request method is known by the server but has been disabled and cannot be used.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 429 object

    The user has sent too many requests in a given amount of time ("rate limiting").

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 500 object

    The server has encountered a situation it doesn't know how to handle.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 501 object

    The request method is not supported by the server and cannot be handled.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

Definition
DELETE https://api.amerandish.com/v1/speech/transcriptions/{transcriptionId}
Example request
$ curl \
    -X DELETE https://api.amerandish.com/v1/speech/transcriptions/{transcriptionId} \
    -H "Content-Type: application/json"
Example response (204)
No content
Example response (400)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (401)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (403)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (404)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (405)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (429)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (500)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (501)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
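Since the success response (204) carries no body, a client detects deletion from the status code alone. A minimal sketch (illustrative, not an official client):

```python
def was_deleted(status_code):
    """Interpret the documented DELETE responses for a transcription."""
    if status_code == 204:
        return True   # the resource was deleted successfully
    if status_code == 404:
        return False  # no transcription exists for that transcriptionId
    # Other documented errors (400, 401, 403, ...) include a JSON error body.
    raise RuntimeError("unexpected status %d; inspect the error body" % status_code)
```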

Languagemodel

GET /speech/languagemodels

Returns the list of language models available to the user. Each user can access the general pretrained language models plus their own custom-trained language models.


Responses
  • 200 array[object]

    OK. List of Language Models Retrieved.

    • languageModelId string

      This is the language model id of a custom-trained language model. You can train your own language models and then use them to recognize speech. Refer to POST /speech/languagemodels for more info.

      There are some pretrained language models which you can use:

      Model Description
      general Best for audio content that does not match one of the specific language models. This is the default language model; if you are not sure which one to use, simply use 'general'.
      numbers Best for audio content that contains only spoken numbers. For example, this language model can be used for speech-enabled number input fields.
      yesno Best for audio content that contains only yes or no. For example, this language model can be used to receive a confirmation from the user.
      country Best for audio content that contains only a spoken country name. For example, this language model can be used for speech-enabled input fields.
      city Best for audio content that contains only a spoken city name. For example, this language model can be used for speech-enabled input fields.
      career Best for audio content that contains only spoken career names. For example, this language model can be used for speech-enabled input fields.

    • name string

      The name of the custom language model.

    • status string

      Status of the language model training process. After training finishes, the language model can be used for speech recognition via its languageModelId.

      Values are queued, training, and trained.

  • 400 object

    This response means that server could not understand the request due to invalid syntax.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 401 object

    Authentication is needed to get requested response. This is similar to 403, but in this case, authentication is possible.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 403 object

    Client does not have access rights to the content so server is rejecting to give proper response.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 404 object

    Server can not find requested resource.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 429 object

    The user has sent too many requests in a given amount of time ("rate limiting").

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 500 object

    The server has encountered a situation it doesn't know how to handle.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 501 object

    The request method is not supported by the server and cannot be handled.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

Definition
GET https://api.amerandish.com/v1/speech/languagemodels
Example request
$ curl \
    -X GET https://api.amerandish.com/v1/speech/languagemodels \
    -H "Content-Type: application/json"
Example response (200)
[ { "languageModelId": "string", "name": "string", "status": "queued" } ]
Example response (400)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (401)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (403)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (404)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (429)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (500)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (501)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
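Only models whose training has finished can be used for recognition, so a client will often filter the returned list by status. A sketch over the documented 200 payload (the entries below are illustrative):

```python
def trained_model_ids(models):
    """Return the languageModelIds of models that are ready for recognition."""
    return [m["languageModelId"] for m in models if m["status"] == "trained"]

# Illustrative list following the documented 200 schema.
models = [
    {"languageModelId": "general", "name": "general", "status": "trained"},
    {"languageModelId": "string", "name": "my-model", "status": "training"},
]
print(trained_model_ids(models))  # -> ['general']
```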

POST /speech/languagemodels

Trains a custom language model using phrases provided by the user. Returns a languageModelId that can be used later to access the language model and to transcribe audio with it. Using custom language models boosts accuracy for specific keywords/phrases and enables domain-specific speech recognition.


Body
  • corpora Required / array[string]

    List of phrases used to train the custom language model.
  • name string

    The name of the custom language model being created.

Responses
  • 201 object

    OK. Processing and Training Language Model.

    • languageModelId string

      This is the language model id of a custom-trained language model. You can train your own language models and then use them to recognize speech. Refer to POST /speech/languagemodels for more info.

      There are some pretrained language models which you can use:

      Model Description
      general Best for audio content that does not match one of the specific language models. This is the default language model; if you are not sure which one to use, simply use 'general'.
      numbers Best for audio content that contains only spoken numbers. For example, this language model can be used for speech-enabled number input fields.
      yesno Best for audio content that contains only yes or no. For example, this language model can be used to receive a confirmation from the user.
      country Best for audio content that contains only a spoken country name. For example, this language model can be used for speech-enabled input fields.
      city Best for audio content that contains only a spoken city name. For example, this language model can be used for speech-enabled input fields.
      career Best for audio content that contains only spoken career names. For example, this language model can be used for speech-enabled input fields.

    • name string

      The name of the custom language model.

    • status string

      Status of the language model training process. After training finishes, the language model can be used for speech recognition via its languageModelId.

      Values are queued, training, and trained.

  • 400 object

    This response means that server could not understand the request due to invalid syntax.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 401 object

    Authentication is needed to get requested response. This is similar to 403, but in this case, authentication is possible.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 403 object

    Client does not have access rights to the content so server is rejecting to give proper response.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 405 object

    The request method is known by the server but has been disabled and cannot be used.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 429 object

    The user has sent too many requests in a given amount of time ("rate limiting").

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 500 object

    The server has encountered a situation it doesn't know how to handle.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 501 object

    The request method is not supported by the server and cannot be handled.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

Definition
POST https://api.amerandish.com/v1/speech/languagemodels
Example request
$ curl \
    -X POST https://api.amerandish.com/v1/speech/languagemodels \
    -H "Content-Type: application/json" \
    -d '{"corpora":["string"],"name":"string"}'
Example response (201)
{ "languageModelId": "string", "name": "string", "status": "queued" }
Example response (400)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (401)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (403)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (405)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (429)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (500)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (501)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
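The request body is a JSON object with a required corpora array and an optional name. A small sketch that validates and serializes the body before sending it (the client-side check for an empty corpora list is an assumption; the server may apply further validation):

```python
import json

def build_training_request(corpora, name=None):
    """Build the JSON body for POST /speech/languagemodels."""
    corpora = list(corpora)
    if not corpora:
        raise ValueError("corpora must contain at least one phrase")
    body = {"corpora": corpora}
    if name is not None:
        body["name"] = name  # name is optional per the schema above
    return json.dumps(body)

print(build_training_request(["phrase one", "phrase two"], name="my-model"))
```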

GET /speech/languagemodels/{languageModelId}

Retrieves the status of the language model with the specified languageModelId. A language model is ready to use when its status is trained.


URL parameters
  • languageModelId Required / string

    Id of the language model.

Responses
  • 200 object

    OK. Language Model Retrieved.

    • languageModelId string

      This is the language model id of a custom-trained language model. You can train your own language models and then use them to recognize speech. Refer to POST /speech/languagemodels for more info.

      There are some pretrained language models which you can use:

      Model Description
      general Best for audio content that does not match one of the specific language models. This is the default language model; if you are not sure which one to use, simply use 'general'.
      numbers Best for audio content that contains only spoken numbers. For example, this language model can be used for speech-enabled number input fields.
      yesno Best for audio content that contains only yes or no. For example, this language model can be used to receive a confirmation from the user.
      country Best for audio content that contains only a spoken country name. For example, this language model can be used for speech-enabled input fields.
      city Best for audio content that contains only a spoken city name. For example, this language model can be used for speech-enabled input fields.
      career Best for audio content that contains only spoken career names. For example, this language model can be used for speech-enabled input fields.

    • name string

      The name of the custom language model.

    • status string

      Status of the language model training process. After training finishes, the language model can be used for speech recognition via its languageModelId.

      Values are queued, training, and trained.

  • 400 object

    This response means that server could not understand the request due to invalid syntax.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 401 object

    Authentication is needed to get requested response. This is similar to 403, but in this case, authentication is possible.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 403 object

    Client does not have access rights to the content so server is rejecting to give proper response.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 404 object

    Server can not find requested resource.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 405 object

    The request method is known by the server but has been disabled and cannot be used.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 429 object

    The user has sent too many requests in a given amount of time ("rate limiting").

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 500 object

    The server has encountered a situation it doesn't know how to handle.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 501 object

    The request method is not supported by the server and cannot be handled.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

Definition
GET https://api.amerandish.com/v1/speech/languagemodels/{languageModelId}
Example request
$ curl \
    -X GET https://api.amerandish.com/v1/speech/languagemodels/{languageModelId} \
    -H "Content-Type: application/json"
Example response (200)
{ "languageModelId": "string", "name": "string", "status": "queued" }
Example response (400)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (401)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (403)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (404)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (405)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (429)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (500)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (501)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
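A client can decide readiness from the documented status enum (queued, training, trained). A minimal sketch over the 200 payload (illustrative, not an official client):

```python
LANGUAGE_MODEL_STATUSES = ("queued", "training", "trained")

def is_ready(model):
    """True once the language model can be used for speech recognition."""
    status = model["status"]
    if status not in LANGUAGE_MODEL_STATUSES:
        raise ValueError("unknown language model status: %r" % status)
    return status == "trained"

print(is_ready({"languageModelId": "string", "name": "string", "status": "queued"}))  # -> False
```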

Voice

GET /voice/healthcheck

voice health check endpoint.


This endpoint returns a simple JSON object containing the service status and the API version.

Responses
  • 200 object

    OK.

    • status string

      Values are Running, Warnings, and Critical.

    • message string

      Health check message. Returns OK if running without problem.

    • version string

      API version.

  • 400 object

    This response means that the server could not understand the request due to invalid syntax.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 401 object

    Authentication is needed to get the requested response. This is similar to 403, but in this case, authentication is possible.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 403 object

    The client does not have access rights to the content, so the server is refusing to give a proper response.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 405 object

    The request method is known by the server but has been disabled and cannot be used.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 429 object

    The user has sent too many requests in a given amount of time ("rate limiting").

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 500 object

    The server has encountered a situation it doesn't know how to handle.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

Definition
GET https://api.amerandish.com/v1/voice/healthcheck
Example request
$ curl \
    -X GET https://api.amerandish.com/v1/voice/healthcheck \
    -H "Content-Type: application/json"
Example response (200)
{ "status": "Running", "message": "string", "version": "string" }
Example response (400)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (401)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (403)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (405)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (429)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (500)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
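The health check lends itself to a simple liveness probe. A minimal Python sketch using only the standard library (the treatment of Warnings as unhealthy is our own choice, not something the API mandates; add authentication headers if your account requires them):

```python
import json
import urllib.request

BASE_URL = "https://api.amerandish.com/v1"

def is_healthy(body: dict) -> bool:
    """Interpret a health-check response body.

    The documented status values are Running, Warnings, and Critical.
    Here only Running counts as healthy; a caller may prefer to
    tolerate Warnings as well.
    """
    return body.get("status") == "Running"

def fetch_healthcheck() -> dict:
    # Plain GET against the documented endpoint.
    req = urllib.request.Request(
        BASE_URL + "/voice/healthcheck",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example (requires network access):
#     body = fetch_healthcheck()
#     print(body.get("status"), body.get("message"), body.get("version"))
```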

POST /voice/tts

Synthesizes speech synchronously


Receives text and configuration data and synthesizes speech in different voices and formats using state-of-the-art deep neural networks. This service synthesizes speech synchronously; the results become available only after all text input has been processed.


Using the config object you can specify audio settings such as audioEncoding and sampleRateHertz. We will support different languages, so you can choose the languageCode. Using voiceSelectionParams you can choose between the supported voices by specifying a voiceId. Voices differ in gender, tone, and style.

Body
  • synthesisInput Required / object

    The synthesizer requires either plain text or SSML as input. Provide only text OR ssml; providing both will result in a bad request response.

    • synthesisInput.text string

      The raw text to be synthesized.

    • synthesisInput.ssml string

      The SSML document to be synthesized. The SSML document must be valid and well-formed.

  • voiceConfig Required / object

    The desired voice of the synthesized audio.

    • voiceConfig.voiceId Required / string(uuid)

      ID of the desired voice to use for synthesis.

    • voiceConfig.languageCode string

      The language of the supplied audio as a language tag; for example, en for English. See Language Support for a list of the currently supported language codes.

      Values are fa and en.

    • voiceConfig.name string

      Name of the desired voice.

    • voiceConfig.gender string

      The gender of the requested voice to synthesize.

      Values are male and female.

  • audioConfig Required / object

    The configuration of the synthesized audio.

    • audioConfig.audioEncoding Required / string

      Encoding of the audio data sent in all RecognitionAudio messages. In the case of voice synthesis, this is the format of the requested audio byte stream. This field is required for all audio formats.

      Values are LINEAR16, FLAC, and MP3.

    • audioConfig.sampleRateHertz Required / integer

      Sample rate in hertz of the audio data sent in all RecognitionAudio messages. Valid values are 8000-48000; 16000 is optimal. For best results, set the sampling rate of the audio source to 16000 Hz. If that is not possible, use the native sample rate of the audio source (instead of re-sampling). This field is required for all audio formats. For the Text to Speech endpoint, this is the synthesis sample rate (in hertz) of the audio and is optional: if it differs from the voice's natural sample rate, the synthesizer will honor the request by converting to the desired sample rate (which might reduce audio quality), unless the specified sample rate is not supported for the chosen encoding.

    • audioConfig.speakingRate number

      Optional speaking rate/speed, in the range [0.25, 4.0]. 1.0 is the normal native speed supported by the specific voice; 2.0 is twice as fast, and 0.5 is half as fast. If unset (0.0), it defaults to the native 1.0 speed. Any value < 0.25 or > 4.0 will return an error.

      Minimum value is 0.25, maximum value is 4.

    • audioConfig.pitch number

      Optional speaking pitch, in the range [-20.0, 20.0]. 20 means an increase of 20 semitones from the original pitch; -20 means a decrease of 20 semitones.

      Minimum value is -20.0, maximum value is 20.0.

    • audioConfig.volumeGainDb number

      Optional volume gain (in dB) relative to the normal native volume supported by the specific voice, in the range [-96.0, 16.0]. If unset, or set to a value of 0.0 (dB), the audio will play at normal native signal amplitude. A value of -6.0 (dB) will play at approximately half the amplitude of the normal native signal; a value of +6.0 (dB) will play at approximately twice that amplitude. We strongly recommend not exceeding +10 dB, as there is usually no effective increase in loudness for any value greater than that.

      Minimum value is -96.0, maximum value is 16.0.
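The numeric ranges above can be checked client-side before a request is sent, avoiding avoidable 400 responses. A minimal validation sketch (field names follow the documented schema; the error messages are our own):

```python
def validate_audio_config(cfg: dict) -> list:
    """Check an audioConfig dict against the documented ranges.

    Returns a list of problems; an empty list means the config passes.
    """
    errors = []
    # audioEncoding and sampleRateHertz are required for all formats.
    if cfg.get("audioEncoding") not in ("LINEAR16", "FLAC", "MP3"):
        errors.append("audioEncoding must be LINEAR16, FLAC, or MP3")
    rate = cfg.get("sampleRateHertz")
    if not isinstance(rate, int) or not 8000 <= rate <= 48000:
        errors.append("sampleRateHertz must be an integer in 8000-48000")
    # The remaining fields are optional: validate only when present.
    if "speakingRate" in cfg and not 0.25 <= cfg["speakingRate"] <= 4.0:
        errors.append("speakingRate must be in [0.25, 4.0]")
    if "pitch" in cfg and not -20.0 <= cfg["pitch"] <= 20.0:
        errors.append("pitch must be in [-20.0, 20.0]")
    if "volumeGainDb" in cfg and not -96.0 <= cfg["volumeGainDb"] <= 16.0:
        errors.append("volumeGainDb must be in [-96.0, 16.0]")
    return errors
```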

Responses
  • 201 object

    OK. Speech Synthesized.

    • audioContent string(byte)

      The audio data bytes encoded as specified in the request, including the header (For LINEAR16 audio, we include the WAV header).
      A base64-encoded string.

  • 400 object

    This response means that the server could not understand the request due to invalid syntax.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 401 object

    Authentication is needed to get the requested response. This is similar to 403, but in this case, authentication is possible.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 403 object

    The client does not have access rights to the content, so the server is refusing to give a proper response.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 405 object

    The request method is known by the server but has been disabled and cannot be used.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 415 object

    The media format of the requested data is not supported by the server, so the server is rejecting the request.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 429 object

    The user has sent too many requests in a given amount of time ("rate limiting").

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 500 object

    The server has encountered a situation it doesn't know how to handle.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 501 object

    The request method is not supported by the server and cannot be handled.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

Definition
POST https://api.amerandish.com/v1/voice/tts
Example request
$ curl \
    -X POST https://api.amerandish.com/v1/voice/tts \
    -H "Content-Type: application/json" \
    -d '{"synthesisInput":{"text":"string","ssml":"string"},"voiceConfig":{"voiceId":"string","languageC...}'
Example response (201)
{ "audioContent": "string" }
Example response (400)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (401)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (403)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (405)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (415)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (429)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (500)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (501)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
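Putting the pieces together, here is a minimal Python sketch that builds a TTS request body, enforces the text-XOR-ssml rule, and decodes the base64 audioContent from a 201 response. The voiceId value is a placeholder, and any authentication headers your account needs must be added:

```python
import base64
import json
import urllib.request

BASE_URL = "https://api.amerandish.com/v1"

def build_tts_payload(voice_id, text=None, ssml=None,
                      encoding="LINEAR16", sample_rate=16000):
    # The API accepts exactly one of text / ssml; sending both is a 400.
    if (text is None) == (ssml is None):
        raise ValueError("provide exactly one of text or ssml")
    synthesis_input = {"text": text} if text is not None else {"ssml": ssml}
    return {
        "synthesisInput": synthesis_input,
        "voiceConfig": {"voiceId": voice_id},
        "audioConfig": {"audioEncoding": encoding,
                        "sampleRateHertz": sample_rate},
    }

def decode_audio(response_body: dict) -> bytes:
    # audioContent is a base64-encoded string; for LINEAR16 the decoded
    # bytes already include the WAV header, so they can be saved as-is.
    return base64.b64decode(response_body["audioContent"])

def synthesize(payload: dict) -> bytes:
    req = urllib.request.Request(
        BASE_URL + "/voice/tts",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return decode_audio(json.load(resp))

# Example (requires network access and a real voiceId):
#     payload = build_tts_payload("YOUR-VOICE-UUID", text="سلام دنیا")
#     open("out.wav", "wb").write(synthesize(payload))
```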

GET /voice/speakers

This endpoint retrieves the list of available speakers for speech synthesis. Each speaker has a unique voiceId which can be used to synthesize speech. The result also includes each speaker's language, gender, and name.


Responses
  • 200 array[object]

    OK. TTS Voices List Retrieved.

    • voiceId string(uuid)

      ID of the desired voice to use for synthesis.

    • languageCode string

      The language of the supplied audio as a language tag; for example, en for English. See Language Support for a list of the currently supported language codes.

      Values are fa and en.

    • name string

      Name of the desired voice.

    • gender string

      The gender of the requested voice to synthesize.

      Values are male and female.

  • 400 object

    This response means that the server could not understand the request due to invalid syntax.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 401 object

    Authentication is needed to get the requested response. This is similar to 403, but in this case, authentication is possible.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 403 object

    The client does not have access rights to the content, so the server is refusing to give a proper response.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 405 object

    The request method is known by the server but has been disabled and cannot be used.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 429 object

    The user has sent too many requests in a given amount of time ("rate limiting").

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 500 object

    The server has encountered a situation it doesn't know how to handle.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 501 object

    The request method is not supported by the server and cannot be handled.

    • status Required / string

      HTTP response status code.

    • detail Required / string

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

Definition
GET https://api.amerandish.com/v1/voice/speakers
Example request
$ curl \
    -X GET https://api.amerandish.com/v1/voice/speakers \
    -H "Content-Type: application/json"
Example response (200)
[ { "voiceId": "string", "languageCode": "fa", "name": "string", "gender": "male" } ]
Example response (400)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (401)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (403)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (405)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (429)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (500)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
Example response (501)
{ "status": "string", "detail": "string", "title": "string", "type": "string" }
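The speaker list pairs naturally with the TTS endpoint: fetch it once, then pick a voiceId by language and gender. A small filtering helper, with sample data mirroring the 200 schema above (the voiceId and name values are made up for illustration):

```python
def pick_voice(speakers: list, language_code: str, gender: str = None) -> str:
    """Return the voiceId of the first speaker matching the filters.

    gender is optional; when omitted, any gender matches.
    Raises LookupError when no speaker matches.
    """
    for speaker in speakers:
        if speaker.get("languageCode") != language_code:
            continue
        if gender is not None and speaker.get("gender") != gender:
            continue
        return speaker["voiceId"]
    raise LookupError("no matching voice for " + language_code)

# In practice this list comes from GET /voice/speakers; a sample:
speakers = [
    {"voiceId": "a1b2", "languageCode": "fa", "name": "Ava", "gender": "female"},
    {"voiceId": "c3d4", "languageCode": "en", "name": "Ben", "gender": "male"},
]
```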