Emitted when there is an additional text delta. This is also the first event emitted when the transcription starts. Only emitted when you create a transcription with the stream parameter set to true.
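As a rough sketch of consuming these delta events client-side: assuming each event arrives as a server-sent-events "data:" line carrying JSON with a "type" and a "delta" field (field names here are assumptions based on the description above, not taken from this document), accumulating the streamed text might look like:

```python
import json

def accumulate_deltas(sse_lines):
    """Collect text deltas from an iterable of SSE 'data:' lines.

    Assumes each event is JSON with a 'type' and, for delta events,
    a 'delta' string -- the exact field names are illustrative.
    """
    text = []
    for line in sse_lines:
        if not line.startswith("data:"):
            continue  # skip blank keep-alives and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        event = json.loads(payload)
        if event.get("type") == "transcript.text.delta":
            text.append(event["delta"])
    return "".join(text)

# Fabricated sample events for illustration:
lines = [
    'data: {"type": "transcript.text.delta", "delta": "Hello"}',
    'data: {"type": "transcript.text.delta", "delta": " world"}',
    "data: [DONE]",
]
print(accumulate_deltas(lines))  # -> Hello world
```

A real client would read these lines from the HTTP response body as they arrive rather than from a list.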
      
  
    
  
Body

- file (Required)
  The audio file object (not file name) to transcribe, in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm.
- language
  The language of the input audio. Supplying the input language in ISO-639-1 format (e.g. en) will improve accuracy and latency.
- prompt
  An optional text to guide the model's style or continue a previous audio segment. The prompt should match the audio language.
- response_format
  The format of the output: json, text, srt, verbose_json, or vtt. For gpt-4o-transcribe and gpt-4o-mini-transcribe, the only supported format is json. Default value is json.
- temperature
  The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model will use log probability to automatically increase the temperature until certain thresholds are hit. Default value is 0.
- include
  Additional information to include in the transcription response. logprobs will return the log probabilities of the tokens in the response, to indicate the model's confidence in the transcription. logprobs only works with response_format set to json and only with the models gpt-4o-transcribe and gpt-4o-mini-transcribe. Value is logprobs. Default value is [] (empty).
- timestamp_granularities
  The timestamp granularities to populate for this transcription. response_format must be set to verbose_json to use timestamp granularities. Either or both of these options are supported: word or segment. Note: There is no additional latency for segment timestamps, but generating word timestamps incurs additional latency. Default value is ["segment"].
- stream
  If set to true, the model response data will be streamed to the client as it is generated using server-sent events. See the Streaming section of the Speech-to-Text guide for more information. Note: Streaming is not supported for the whisper-1 model and will be ignored. Default value is false.
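Since the logprob values returned via include=logprobs are natural-log probabilities, exp(logprob) recovers each token's probability. As a sketch (this scoring is not part of the API, just a common way to read the numbers), an overall confidence can be taken as the geometric-mean token probability:

```python
import math

def token_confidences(logprobs):
    """Map each returned token to exp(logprob), i.e. its probability."""
    return {item["token"]: math.exp(item["logprob"]) for item in logprobs}

def mean_confidence(logprobs):
    """Geometric-mean token probability: exp(mean of the logprobs)."""
    if not logprobs:
        return 0.0
    avg = sum(item["logprob"] for item in logprobs) / len(logprobs)
    return math.exp(avg)

# Fabricated logprobs array in the shape of the json response below:
sample = [
    {"token": "Hello", "logprob": -0.10},
    {"token": "world", "logprob": -0.50},
]
print(mean_confidence(sample))  # exp(-0.3), roughly 0.741
```

Averaging in log space before exponentiating avoids underflow on long transcripts, where multiplying many small probabilities directly would round to zero.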
Responses
- OK

Any of:
curl \
 --request POST 'https://api.openai.com/v1/audio/transcriptions' \
 --header "Authorization: Bearer $ACCESS_TOKEN" \
 --header "Content-Type: multipart/form-data" \
 --form "file=@file" \
 --form "model=gpt-4o-transcribe" \
 --form "language=string" \
 --form "prompt=string" \
 --form "response_format=json" \
 --form "temperature=0" \
 --form "timestamp_granularities[]=segment" \
 --form "stream=false"
    {
  "text": "string",
  "logprobs": [
    {
      "token": "string",
      "logprob": 42.0,
      "bytes": [
        42
      ]
    }
  ]
}
{
  "language": "string",
  "duration": 42.0,
  "text": "string",
  "words": [
    {
      "word": "string",
      "start": 42.0,
      "end": 42.0
    }
  ],
  "segments": [
    {
      "id": 42,
      "seek": 42,
      "start": 42.0,
      "end": 42.0,
      "text": "string",
      "tokens": [
        42
      ],
      "temperature": 42.0,
      "avg_logprob": 42.0,
      "compression_ratio": 42.0,
      "no_speech_prob": 42.0
    }
  ]
}
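As a hedged illustration of the verbose_json segment fields above: the following sketch renders segments as SRT cues client-side. (The API can also return SRT directly via response_format=srt; this just shows what the start/end/text fields contain.)

```python
def srt_timestamp(seconds):
    """Format a float seconds value as an SRT timestamp HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments):
    """Render verbose_json-style segments as a numbered SRT cue list."""
    cues = []
    for i, seg in enumerate(segments, start=1):
        cues.append(
            f"{i}\n{srt_timestamp(seg['start'])} --> "
            f"{srt_timestamp(seg['end'])}\n{seg['text'].strip()}\n"
        )
    return "\n".join(cues)

# Fabricated segment in the shape of the verbose_json response above:
demo = [{"start": 0.0, "end": 2.5, "text": " Hello world."}]
print(segments_to_srt(demo))
```

Only the start, end, and text fields are needed for subtitles; the remaining segment fields (avg_logprob, no_speech_prob, compression_ratio) are diagnostic.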