Translates audio into English.

POST /audio/translations

API key auth

multipart/form-data

Body Required

file string(binary) Required

The audio file object (not file name) to translate, in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm.
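
A client can reject obviously unsupported files before uploading. The following is a minimal sketch, assuming the file extension reflects the actual format; `has_supported_extension` is a hypothetical helper, not part of the API, and it checks only the name, not the audio encoding.

```python
from pathlib import Path

# Formats listed for the `file` parameter.
SUPPORTED_FORMATS = {"flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm"}


def has_supported_extension(path: str) -> bool:
    """Return True if the file extension is one of the listed formats."""
    # Path.suffix includes the leading dot; compare case-insensitively.
    suffix = Path(path).suffix.lower().lstrip(".")
    return suffix in SUPPORTED_FORMATS
```
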
model string Required

ID of the model to use. Only whisper-1 (which is powered by our open source Whisper V2 model) is currently available.

Value is whisper-1.
prompt string

Optional text to guide the model's style or continue a previous audio segment. The prompt should be in English.
response_format string

The format of the output.

Values are json, text, srt, verbose_json, or vtt. Default value is json.
temperature number

The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model will use log probability to automatically increase the temperature until certain thresholds are hit.

Default value is 0.
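
The constraints on the non-file body fields can be checked client-side before a request is sent. This is a sketch only: `build_translation_fields` is a hypothetical helper, and the validation mirrors the values and ranges listed above rather than any documented server behavior.

```python
from typing import Dict, Optional

# Allowed values for `response_format`, as listed above.
RESPONSE_FORMATS = ("json", "text", "srt", "verbose_json", "vtt")


def build_translation_fields(
    model: str = "whisper-1",
    prompt: Optional[str] = None,
    response_format: str = "json",
    temperature: float = 0.0,
) -> Dict[str, str]:
    """Validate the non-file form fields and return them as strings."""
    if not model:
        raise ValueError("model is required")
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"response_format must be one of {RESPONSE_FORMATS}")
    if not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")

    fields = {
        "model": model,
        "response_format": response_format,
        "temperature": str(temperature),
    }
    # `prompt` is optional, so it is only sent when provided.
    if prompt is not None:
        fields["prompt"] = prompt
    return fields
```
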

Responses

200 application/json

OK
One of: CreateTranslationResponseJson object or CreateTranslationResponseVerboseJson object.

CreateTranslationResponseJson object

text string Required

CreateTranslationResponseVerboseJson object

language string Required

The language of the output translation (always english).

duration number Required

The duration of the input audio.

text string Required

The translated text.

segments array[object]

Segments of the translated text and their corresponding details.

Each segment object has the following attributes:

id integer Required

Unique identifier of the segment.

seek integer Required

Seek offset of the segment.

start number(float) Required

Start time of the segment in seconds.

end number(float) Required

End time of the segment in seconds.

text string Required

Text content of the segment.

tokens array[integer] Required

Array of token IDs for the text content.

temperature number(float) Required

Temperature parameter used for generating the segment.

avg_logprob number(float) Required

Average logprob of the segment. If the value is lower than -1, consider the logprobs failed.

compression_ratio number(float) Required

Compression ratio of the segment. If the value is greater than 2.4, consider the compression failed.

no_speech_prob number(float) Required

Probability of no speech in the segment, between 0 and 1. If the value is high (the open source Whisper implementation uses a default threshold of 0.6) and the avg_logprob is below -1, consider this segment silent.
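
The quality checks described for avg_logprob, compression_ratio, and no_speech_prob can be applied to each segment of a verbose_json response. A minimal sketch: `flag_segment` and its flag names are hypothetical, and the no-speech cutoff is left as a parameter whose default of 0.6 is an assumption borrowed from the open source Whisper defaults, not a value confirmed for this endpoint.

```python
from typing import Any, Dict, List


def flag_segment(segment: Dict[str, Any], no_speech_threshold: float = 0.6) -> List[str]:
    """Return quality flags for one segment of a verbose_json response."""
    flags = []
    low_logprob = segment["avg_logprob"] < -1
    if low_logprob:
        flags.append("logprob_failed")
    if segment["compression_ratio"] > 2.4:
        flags.append("compression_failed")
    # A segment counts as silent only when both conditions hold.
    if segment["no_speech_prob"] > no_speech_threshold and low_logprob:
        flags.append("silent")
    return flags
```

Segments with an empty flag list pass all three checks; how to treat flagged segments (drop, retry, or review) is left to the caller.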

Request example

curl \
 --request POST 'https://api.openai.com/v1/audio/translations' \
 --header "Authorization: Bearer $ACCESS_TOKEN" \
 --header "Content-Type: multipart/form-data" \
 --form "file=@file" \
 --form "model=whisper-1" \
 --form "prompt=string" \
 --form "response_format=json" \
 --form "temperature=0"
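
The same request can be assembled in Python with the standard library only. This is a sketch, not an official client: `encode_multipart` and `build_request` are hypothetical helpers, the file part is labeled `application/octet-stream` as an assumption, and the request is built but not sent.

```python
import urllib.request
import uuid
from typing import Dict, Tuple

API_URL = "https://api.openai.com/v1/audio/translations"


def encode_multipart(fields: Dict[str, str], filename: str, data: bytes) -> Tuple[bytes, str]:
    """Encode text fields plus one `file` part as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(f"--{boundary}\r\n".encode())
        parts.append(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        parts.append(value.encode("utf-8") + b"\r\n")
    # The audio goes in the part named `file`, as in the curl example.
    parts.append(f"--{boundary}\r\n".encode())
    parts.append(
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode()
    )
    parts.append(b"Content-Type: application/octet-stream\r\n\r\n")
    parts.append(data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_request(
    filename: str, audio: bytes, token: str, fields: Dict[str, str]
) -> urllib.request.Request:
    """Build (but do not send) the POST request for the translation endpoint."""
    body, content_type = encode_multipart(fields, filename, audio)
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {token}", "Content-Type": content_type},
    )
```

To send it, read the audio bytes from disk, take the token from the environment (for example `os.environ["ACCESS_TOKEN"]`, matching the curl example), and pass the result to `urllib.request.urlopen`.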

Response examples (200)

{
  "text": "string"
}

{
  "language": "string",
  "duration": 42.0,
  "text": "string",
  "segments": [
    {
      "id": 42,
      "seek": 42,
      "start": 42.0,
      "end": 42.0,
      "text": "string",
      "tokens": [
        42
      ],
      "temperature": 42.0,
      "avg_logprob": 42.0,
      "compression_ratio": 42.0,
      "no_speech_prob": 42.0
    }
  ]
}
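
The two JSON response shapes can be told apart by the fields that only the verbose form carries. A minimal sketch, assuming the body is one of the two JSON objects shown above; `parse_translation` and the `verbose` key in its result are hypothetical, and the text, srt, and vtt formats are not handled here.

```python
import json
from typing import Any, Dict


def parse_translation(body: str) -> Dict[str, Any]:
    """Normalize a json or verbose_json response body into one dictionary."""
    payload = json.loads(body)
    if "text" not in payload:
        raise ValueError("response is missing the required 'text' field")
    # Only the verbose form includes language and duration.
    verbose = "language" in payload and "duration" in payload
    return {
        "verbose": verbose,
        "text": payload["text"],
        "language": payload.get("language"),
        "duration": payload.get("duration"),
        "segments": payload.get("segments", []),
    }
```
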
