Farsava API
1.0.7

Farsava API. Speech Recognition and Text to Speech powered by deep neural network models.

This is the documentation for version 1.0.7 of the API. Last update on Jul 7, 2019.

Base URL
https://api.amerandish.com/v1

POST /voice/tts

Synthesizes speech synchronously


Receives text and configuration data and synthesizes speech in different voices and formats using state-of-the-art deep neural networks. This service synthesizes speech synchronously; the results are available after all text input has been processed.


Using the config objects you can specify audio settings such as audioEncoding and sampleRateHertz. Multiple languages are supported, so you can choose the languageCode. Using voiceConfig (the voice selection parameters) you can choose between the supported voices by specifying voiceId. Voices differ in gender, tone, and style.
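
For illustration only, here is a minimal Python sketch of the same synchronous call shown in the curl example at the end of this page, using the third-party requests library. The access token and the voiceId are placeholders that you must replace with your own values; requests sets the Content-Type: application/json header automatically when the json argument is used.

import os
import requests

# Placeholders: supply your own access token and a voiceId available to your account.
ACCESS_TOKEN = os.environ["ACCESS_TOKEN"]
VOICE_ID = "b2d8dfca-7d78-47f8-b976-c85b15bbc134"  # example id taken from this page

response = requests.post(
    "https://api.amerandish.com/v1/voice/tts",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    json={
        "synthesisInput": {"text": "Speak This."},
        "voiceConfig": {
            "languageCode": "fa",
            "voiceId": VOICE_ID,
            "name": "speaker-2",
            "gender": "female",
        },
        "audioConfig": {
            "audioEncoding": "MP3",
            "sampleRateHertz": 16000,
            "speakingRate": 1.2,
            "pitch": 0.0,
            "volumeGainDb": -2,
        },
    },
)
print(response.status_code)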

application/json

Body Required

Receives a JSON object including the input text, voice parameters, and audio config. A client-side validation sketch follows the attribute list below.

  • synthesisInput object Required

    The Synthesizer requires either plain text or SSML as input. Only provide text OR ssml. Providing both will result in a bad request response.

    • text string

      The raw text to be synthesized.

    • ssml string

      The SSML document to be synthesized. The SSML document must be valid and well-formed.

  • voiceConfig object Required

    The desired voice of the synthesized audio.

    • voiceId string(uuid) Required

      ID of the desired voice to use for synthesis.

    • languageCode string

      The language of the synthesized speech as a language tag, e.g. en for English. See Language Support for a list of the currently supported language codes.

      Values are fa or en.

    • name string

      Name of the desired voice.

    • gender string

      The gender of the requested voice to synthesize.

      Values are male or female. Default value is female.

  • audioConfig object Required

    The configuration of the synthesized audio.

    • audioEncoding string Required

      Encoding of audio data sent in all RecognitionAudio messages. For speech synthesis, this is the format of the requested audio byte stream. This field is required for all audio formats.

      Values are LINEAR16, FLAC, or MP3. Default value is LINEAR16.

    • sampleRateHertz integer Required

      Sample rate in Hertz of the audio data sent in all RecognitionAudio messages. Valid values are 8000-48000; 16000 is optimal. For best results, set the sampling rate of the audio source to 16000 Hz. If that is not possible, use the native sample rate of the audio source instead of re-sampling. This field is required for all audio formats. In the Text to Speech endpoint this is the synthesis sample rate (in hertz) for the audio, and it is optional: if it differs from the voice's natural sample rate, the synthesizer will honor the request by converting to the desired sample rate (which might result in worse audio quality), unless the specified sample rate is not supported for the chosen encoding.

      Default value is 16000.

    • speakingRate number

      Optional speaking rate/speed, in the range [0.25, 4.0]. 1.0 is the normal native speed supported by the specific voice, 2.0 is twice as fast, and 0.5 is half as fast. If unset (0.0), defaults to the native 1.0 speed. Any other value < 0.25 or > 4.0 will return an error.

      Minimum value is 0.25, maximum value is 4.

    • pitch number

      Optional speaking pitch, in the range [-20.0, 20.0]. 20 means increase 20 semitones from the original pitch. -20 means decrease 20 semitones from the original pitch.

      Minimum value is -20.0, maximum value is 20.0.

    • volumeGainDb number

      Optional volume gain (in dB) relative to the normal native volume supported by the specific voice, in the range [-96.0, 16.0]. If unset, or set to 0.0 dB, playback is at the normal native signal amplitude. A value of -6.0 dB plays at approximately half, and +6.0 dB at approximately twice, the normal native amplitude. We strongly recommend not exceeding +10 dB, as there is usually no effective increase in loudness beyond that.

      Minimum value is -96.0, maximum value is 16.0.
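
The range constraints above can be checked on the client before a request is sent. The following Python sketch is an assumed, illustrative validator based only on the documented limits; it is not part of the API. The amplitude_factor helper just restates the dB relationship described for volumeGainDb.

def validate_tts_body(body: dict) -> None:
    """Client-side sanity checks mirroring the documented request constraints."""
    synthesis = body["synthesisInput"]
    # Provide exactly one of text / ssml; sending both results in a 400 Bad Request.
    if ("text" in synthesis) == ("ssml" in synthesis):
        raise ValueError("synthesisInput must contain exactly one of 'text' or 'ssml'")

    audio = body["audioConfig"]
    if audio["audioEncoding"] not in ("LINEAR16", "FLAC", "MP3"):
        raise ValueError("audioEncoding must be LINEAR16, FLAC, or MP3")
    if not 8000 <= audio["sampleRateHertz"] <= 48000:
        raise ValueError("sampleRateHertz must be in 8000-48000 (16000 is optimal)")

    rate = audio.get("speakingRate", 0.0)
    if rate != 0.0 and not 0.25 <= rate <= 4.0:
        raise ValueError("speakingRate must be in [0.25, 4.0]; 0.0 means native speed")
    if not -20.0 <= audio.get("pitch", 0.0) <= 20.0:
        raise ValueError("pitch must be in [-20.0, 20.0] semitones")
    if not -96.0 <= audio.get("volumeGainDb", 0.0) <= 16.0:
        raise ValueError("volumeGainDb must be in [-96.0, 16.0]")

def amplitude_factor(gain_db: float) -> float:
    """Linear amplitude multiplier for a gain in dB; -6.0 dB gives roughly 0.5."""
    return 10 ** (gain_db / 20)

For example, amplitude_factor(-6.0) returns about 0.5, matching the "approximately half the amplitude" statement for volumeGainDb above.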

Responses

  • 201

    OK. Speech Synthesized.

    • audioContent string(byte)

      The audio data bytes encoded as specified in the request, including the header (for LINEAR16 audio, we include the WAV header). A base64-encoded string; a decoding sketch follows the response list.

  • 400

    The server could not understand the request due to invalid syntax.

    • status string Required

      HTTP response status code.

    • detail string Required

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 401

    Authentication is needed to get the requested response. This is similar to 403, but in this case authentication is possible.

    • status string Required

      HTTP response status code.

    • detail string Required

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 403

    The client does not have access rights to the content, so the server is refusing to give a proper response.

    • status string Required

      HTTP response status code.

    • detail string Required

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 405

    The request method is known by the server but has been disabled and cannot be used.

    • status string Required

      HTTP response status code.

    • detail string Required

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 415

    The media format of the requested data is not supported by the server, so the server is rejecting the request.

    • status string Required

      HTTP response status code.

    • detail string Required

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 429

    The user has sent too many requests in a given amount of time ("rate limiting").

    • status string Required

      HTTP response status code.

    • detail string Required

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 500

    The server has encountered a situation it doesn't know how to handle.

    • status string Required

      HTTP response status code.

    • detail string Required

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.

  • 501

    The request method is not supported by the server and cannot be handled.

    • status string Required

      HTTP response status code.

    • detail string Required

      Message explaining the issue.

    • title string

      Error message title.

    • type string

      Error type.
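
As an illustrative sketch of consuming these responses: on a 201 the audioContent field is a base64-encoded string that can be decoded straight to an audio file, and on any error status the body carries the fields documented above (status and detail; note that the literal examples below show code and message). The helper assumes a requests-style response object; the output path is an arbitrary placeholder.

import base64

def handle_tts_response(response, out_path="output.mp3"):
    """Save a successful synthesis result or raise with the server's error details."""
    if response.status_code == 201:
        payload = response.json()
        # audioContent is base64; for LINEAR16 the decoded bytes already include the WAV header.
        with open(out_path, "wb") as f:
            f.write(base64.b64decode(payload["audioContent"]))
        return out_path
    # Error bodies are documented with status/detail; the literal examples show code/message.
    error = response.json()
    detail = error.get("detail") or error.get("message", "unknown error")
    raise RuntimeError(f"TTS request failed ({response.status_code}): {detail}")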

POST /voice/tts
curl \
 -X POST https://api.amerandish.com/v1/voice/tts \
 -H "Authorization: Bearer $ACCESS_TOKEN" \
 -H "Content-Type: application/json" \
 -d '{"synthesisInput":{"text":"Speak This."},"voiceConfig":{"languageCode":"fa","voiceId":"b2d8dfca-7d78-47f8-b976-c85b15bbc134","name":"speaker-2","gender":"female"},"audioConfig":{"audioEncoding":"MP3","speakingRate":1.2,"pitch":0.0,"volumeGainDb":-2,"sampleRateHertz":16000}}'
Request example
{
  "synthesisInput": {
    "text": "Speak This."
  },
  "voiceConfig": {
    "languageCode": "fa",
    "voiceId": "b2d8dfca-7d78-47f8-b976-c85b15bbc134",
    "name": "speaker-2",
    "gender": "female"
  },
  "audioConfig": {
    "audioEncoding": "MP3",
    "speakingRate": 1.2,
    "pitch": 0.0,
    "volumeGainDb": -2,
    "sampleRateHertz": 16000
  }
}
Response examples (201)
{
  "audioContent": "string"
}
Response examples (400)
{
  "code": 400,
  "message": "Bad Request. Invalid JSON object."
}
Response examples (401)
{
  "code": 401,
  "message": "Unautherized. Invalid Authorization Token."
}
Response examples (403)
{
  "code": 403,
  "message": "Forbidden. Do not have access right to resource."
}
Response examples (405)
{
  "code": 405,
  "message": "Method Not Allowed."
}
Response examples (415)
{
  "code": 415,
  "message": "Unsupported Media Type. Please change requested media type."
}
Response examples (429)
{
  "code": 429,
  "message": "Too Many Requests. Your request is blocked due to exceeding rate limiting."
}
Response examples (500)
{
  "code": 500,
  "message": "Internal Server Error. Please retry later."
}
Response examples (501)
{
  "code": 501,
  "message": "Not Implemented. This functionality is not implemented yet."
}