Create transcription

POST /audio/transcriptions

Transcribes audio into the input language.

multipart/form-data

Body Required

file string(binary) Required

The audio file object (not file name) to transcribe, in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm.
model string Required

ID of the model to use. Only whisper-1 (which is powered by our open source Whisper V2 model) is currently available.

Value is whisper-1.
language string

The language of the input audio. Supplying the input language in ISO-639-1 format will improve accuracy and latency.
prompt string

An optional text to guide the model's style or continue a previous audio segment. The prompt should match the audio language.
response_format string

The format of the transcript output, in one of these options: json, text, srt, verbose_json, or vtt.

Values are json, text, srt, verbose_json, or vtt. Default value is json.
temperature number

The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model will use log probability to automatically increase the temperature until certain thresholds are hit.

Default value is 0.
timestamp_granularities[] array[string]

The timestamp granularities to populate for this transcription. response_format must be set to verbose_json to use timestamp granularities. Either or both of these options are supported: word or segment. Note: Segment timestamps add no latency, but generating word timestamps incurs additional latency.

Values are word or segment. Default value is ["segment"].
Additional properties are NOT allowed
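
The constraints listed above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: build_form_fields is a hypothetical helper name, and sending each array value as a repeated timestamp_granularities[] field is an assumption about the form encoding.

```python
ALLOWED_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}
ALLOWED_GRANULARITIES = {"word", "segment"}


def build_form_fields(model="whisper-1", language=None, prompt=None,
                      response_format="json", temperature=0.0,
                      timestamp_granularities=None):
    """Return (name, value) pairs for the non-file form fields.

    Hypothetical helper: it only mirrors the constraints stated in the
    parameter descriptions and does not contact the API.
    """
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")

    granularities = list(timestamp_granularities or [])
    unknown = set(granularities) - ALLOWED_GRANULARITIES
    if unknown:
        raise ValueError(f"unsupported granularities: {sorted(unknown)}")
    if granularities and response_format != "verbose_json":
        raise ValueError(
            "timestamp_granularities requires response_format='verbose_json'")

    fields = [
        ("model", model),
        ("response_format", response_format),
        ("temperature", str(temperature)),
    ]
    if language is not None:
        fields.append(("language", language))
    if prompt is not None:
        fields.append(("prompt", prompt))
    # Assumption: array values are sent as one repeated field per value.
    fields.extend(("timestamp_granularities[]", g) for g in granularities)
    return fields
```

Calling build_form_fields(response_format="verbose_json", timestamp_granularities=["word"]) returns the field list, while passing timestamp_granularities together with the default json format raises ValueError.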

Responses

200 application/json

OK
One of:
CreateTranscriptionResponseJson object CreateTranscriptionResponseVerboseJson object
Represents a transcription response returned by the model, based on the provided input.

text string Required

The transcribed text.
Represents a verbose JSON transcription response returned by the model, based on the provided input.

language string Required

The language of the input audio.

duration string Required

The duration of the input audio.

text string Required

The transcribed text.

words array[object]

Extracted words and their corresponding timestamps.

word string Required

The text content of the word.

start number(float) Required

Start time of the word in seconds.

end number(float) Required

End time of the word in seconds.

segments array[object]

Segments of the transcribed text and their corresponding details.

id integer Required

Unique identifier of the segment.

seek integer Required

Seek offset of the segment.

start number(float) Required

Start time of the segment in seconds.

end number(float) Required

End time of the segment in seconds.

text string Required

Text content of the segment.

tokens array[integer] Required

Array of token IDs for the text content.

temperature number(float) Required

Temperature parameter used for generating the segment.

avg_logprob number(float) Required

Average logprob of the segment. If the value is lower than -1, consider the logprobs failed.

compression_ratio number(float) Required

Compression ratio of the segment. If the value is greater than 2.4, consider the compression failed.

no_speech_prob number(float) Required

Probability of no speech in the segment. If the value is higher than 1.0 and the avg_logprob is below -1, consider this segment silent.
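
The per-segment checks described above can be applied when post-processing a verbose_json response. This is a minimal sketch under stated assumptions: segment_flags and its flag names are hypothetical, and the no-speech threshold is passed in by the caller rather than hard-coded, so it can be tuned.

```python
def segment_flags(segment, no_speech_threshold):
    """Return quality flags for one segment object.

    Hypothetical helper. The -1 and 2.4 cut-offs come from the attribute
    descriptions; no_speech_threshold is chosen by the caller.
    """
    flags = []
    low_logprob = segment["avg_logprob"] < -1
    if low_logprob:
        flags.append("logprob_failed")
    if segment["compression_ratio"] > 2.4:
        flags.append("compression_failed")
    # A segment counts as silent only when both conditions hold.
    if segment["no_speech_prob"] > no_speech_threshold and low_logprob:
        flags.append("silent")
    return flags


def keep_clean_segments(segments, no_speech_threshold):
    """Keep only the segments that raise no flags."""
    return [s for s in segments if not segment_flags(s, no_speech_threshold)]
```

A caller might drop flagged segments before rendering captions, or log them for manual review instead of discarding them.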

POST /audio/transcriptions

curl \
 -X POST https://api.openai.com/v1/audio/transcriptions \
 -H "Authorization: Bearer $ACCESS_TOKEN" \
 -H "Content-Type: multipart/form-data" \
 -F "file=@file" \
 -F "model=whisper-1" \
 -F "language=string" \
 -F "prompt=string" \
 -F "response_format=verbose_json" \
 -F "temperature=0" \
 -F "timestamp_granularities[]=segment"
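
The same request can be assembled with Python's standard library. The sketch below encodes the multipart/form-data body by hand and builds (but does not send) the request; encode_multipart and build_request are hypothetical helper names, and the application/octet-stream part type for the file is an assumption.

```python
import urllib.request
import uuid

API_URL = "https://api.openai.com/v1/audio/transcriptions"


def encode_multipart(fields, file_name, file_bytes, boundary=None):
    """Encode text fields plus one `file` part as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields:
        text_part = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
        parts.append(text_part.encode("utf-8"))
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    parts.append(file_header.encode("utf-8"))
    parts.append(file_bytes)
    # Closing delimiter ends the multipart body.
    parts.append(f"\r\n--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_request(access_token, fields, file_name, file_bytes):
    """Build a POST request object for the transcription endpoint."""
    body, content_type = encode_multipart(fields, file_name, file_bytes)
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": content_type,
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it; the file bytes would typically come from open(path, "rb").read().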

Response examples (200)

{
  "text": "string"
}

{
  "language": "string",
  "duration": "string",
  "text": "string",
  "words": [
    {
      "word": "string",
      "start": 42.0,
      "end": 42.0
    }
  ],
  "segments": [
    {
      "id": 42,
      "seek": 42,
      "start": 42.0,
      "end": 42.0,
      "text": "string",
      "tokens": [
        42
      ],
      "temperature": 42.0,
      "avg_logprob": 42.0,
      "compression_ratio": 42.0,
      "no_speech_prob": 42.0
    }
  ]
}
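
Both response shapes share the text field, so a client can read them with one code path. A minimal sketch, assuming the body arrives as a JSON string; parse_transcription and words_in_range are hypothetical helper names.

```python
import json


def parse_transcription(raw):
    """Return (text, words, segments) from either 200 response shape.

    The plain JSON shape carries only `text`, so the word and segment
    lists default to empty.
    """
    data = json.loads(raw)
    return data["text"], data.get("words", []), data.get("segments", [])


def words_in_range(words, start, end):
    """Return the words whose time span lies within [start, end] seconds."""
    return [w["word"] for w in words
            if w["start"] >= start and w["end"] <= end]
```

The word list is only populated when the request asked for the word granularity, so callers should expect an empty list otherwise.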
