# WebSocket Streaming

> WebSocket channel for streaming audio and receiving transcriptions and translations. Messages are exchanged in either JSON or MessagePack format: JSON messages must be sent in TEXT frames and MessagePack messages in BINARY frames. Sending the wrong frame type results in connection errors.

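As a rough illustration of the connection address and the frame-type rule, the Python sketch below builds the `wss://` URL from the host and path given in the specification and handles messages for a JSON session. The helper names (`build_connect_url`, `to_text_frame`, `decode_json_frame`) are hypothetical and not part of any DeepL SDK, and percent-encoding the token is a precaution assumed here rather than a documented requirement. The sketch relies on the convention in common Python WebSocket clients (for example the `websockets` package) that a `str` is sent as a TEXT frame and `bytes` as a BINARY frame.

```python
import json
from urllib.parse import quote

# Host and path as listed in the specification on this page.
HOST = "api.deepl.com"
PATH = "/v3/voice/realtime/connect"


def build_connect_url(token: str) -> str:
    """Return the wss:// URL with the one-time token as a query parameter."""
    # Assumption: the token is percent-encoded so that characters such as
    # '+', '/' or '=' cannot be misread as query-string syntax.
    return f"wss://{HOST}{PATH}?token={quote(token, safe='')}"


def to_text_frame(message: dict) -> str:
    """Serialize a message for a JSON session.

    Returning str (not bytes) lets typical WebSocket clients send it
    as a TEXT frame.
    """
    return json.dumps(message, ensure_ascii=False)


def decode_json_frame(frame) -> dict:
    """Parse a frame received in a JSON session."""
    if not isinstance(frame, str):
        # A bytes payload means a BINARY frame, which belongs to
        # MessagePack sessions rather than JSON sessions.
        raise TypeError("expected a TEXT frame (str) in a JSON session")
    return json.loads(frame)
```

Because the token is valid for one-time use only, a fresh token would be needed for every new connection attempt, including reconnects.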

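The specification asks clients to merge concluded segments into a final transcript and to update tentative segments as new `source_transcript_update` or `target_transcript_update` messages arrive. A minimal sketch of one way to do that follows. The `Transcript` class is hypothetical, and it makes two assumptions that the specification does not state outright: each update's `tentative` array fully replaces the previous tentative segments, and segment texts carry their own whitespace so they can be concatenated directly (as the example messages suggest).

```python
from dataclasses import dataclass, field


@dataclass
class Transcript:
    """Accumulates one transcript (source, or a single target language)."""

    concluded: list = field(default_factory=list)
    tentative: list = field(default_factory=list)

    def apply(self, update: dict) -> None:
        """Apply the inner object of a *_transcript_update message."""
        # Concluded segments are fixed and appear only once: append them.
        self.concluded.extend(update["concluded"])
        # Assumption: the newest tentative list supersedes the old one.
        self.tentative = list(update["tentative"])

    @property
    def text(self) -> str:
        """Current best transcript: concluded text plus tentative tail."""
        segments = self.concluded + self.tentative
        return "".join(segment["text"] for segment in segments)


source = Transcript()
message = {
    "source_transcript_update": {
        "concluded": [
            {"language": "en", "text": "Hello, how are you",
             "start_time": 0, "end_time": 1500}
        ],
        "tentative": [
            {"language": "en", "text": " today?",
             "start_time": 1500, "end_time": 2000}
        ],
    }
}
source.apply(message["source_transcript_update"])
print(source.text)
```

For target transcripts, one `Transcript` instance per `language` value would keep the streams separate.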

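For `target_media_chunk` messages (closed beta), the `headers` count separates decoder initialization packets from audio packets, and `duration` values are accumulated per subtitle. The sketch below shows both steps under stated assumptions: the function names `split_packets` and `subtitle_durations` are hypothetical, the input is the inner `target_media_chunk` object, and chunks arriving before any `text` value are ignored for subtitle timing. Actual decoding and playback are out of scope.

```python
import base64


def split_packets(chunk: dict) -> tuple:
    """Return (header_packets, audio_packets) as lists of bytes."""
    # JSON sessions deliver base64 strings; MessagePack sessions deliver
    # raw binary, which is passed through unchanged.
    packets = [
        p if isinstance(p, (bytes, bytearray)) else base64.b64decode(p)
        for p in chunk["data"]
    ]
    # `headers` may be null or absent, meaning every packet is audio data.
    n_headers = chunk.get("headers") or 0
    return packets[:n_headers], packets[n_headers:]


def subtitle_durations(chunks: list) -> list:
    """Return (text, total_duration_ms) pairs in playback order."""
    subtitles = []
    for chunk in chunks:
        # A non-null `text` marks the first chunk of a new segment.
        if chunk.get("text") is not None:
            subtitles.append([chunk["text"], 0])
        # Later chunks of the same segment only add playback time.
        if subtitles:
            subtitles[-1][1] += chunk["duration"]
    return [(text, total) for text, total in subtitles]
```

For containerized formats, the specification notes that all packets can go straight to the demuxer, so the split matters mainly for raw codec formats.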

## AsyncAPI

````yaml voiceStream
id: voiceStream
title: Voice stream
description: >
  WebSocket channel for streaming audio and receiving transcriptions and
  translations. Messages are exchanged in JSON or MessagePack format. WebSocket
  messages are exchanged in TEXT frames when using JSON format and in BINARY
  frames when using MessagePack format. Sending the wrong frame type will result
  in connection errors.
servers:
  - id: production
    protocol: wss
    host: api.deepl.com
    bindings: []
    variables:
      - id: token
        description: This is the ephemeral authentication token obtained from the REST API
        allowedValues: []
        examples:
          - VGhpcyBpcyBhIGZha2UgdG9rZW4K
address: /v3/voice/realtime/connect?token={token}
parameters:
  - id: token
    jsonSchema:
      type: string
      description: >
        This is the ephemeral authentication token obtained from the [Request
        Session](/api-reference/voice/request-session) endpoint. The token is
        valid for one-time use only and must be passed as a query parameter when
        establishing the WebSocket connection.
      examples:
        - VGhpcyBpcyBhIGZha2UgdG9rZW4K
    description: >
      This is the ephemeral authentication token obtained from the [Request
      Session](/api-reference/voice/request-session) endpoint. The token is
      valid for one-time use only and must be passed as a query parameter when
      establishing the WebSocket connection.
    type: string
    required: true
    deprecated: false
bindings: []
operations:
  - &ref_4
    id: sendAudioData
    title: Send audio data
    description: Send audio data to the server
    type: send
    messages:
      - &ref_7
        id: SourceTranscriptUpdate
        contentType: application/json
        payload:
          - name: Source Transcript Update
            description: |2-
               The message contains an update to the transcription of the supplied media in the *source* language. 

               Each message is an incremental addition to the already received updates of the *source* transcript with concluded and tentative text segments. Concluded segments are fixed and will only appear once, while tentative segments may be updated in subsequent messages as more audio is processed. 

               Clients should merge the concluded segments into a final transcript and update the tentative segments as new updates arrive.
            type: object
            properties:
              - name: source_transcript_update
                type: object
                required: true
                properties:
                  - name: concluded
                    type: array
                    description: >
                      Array of fixed transcript segments that will no longer
                      change. Each array object contains a `language` property
                      of type `string` (IETF BCP 47 language tag of the detected
                      source language), a `text` property of type `string`
                      (source transcript text), a `start_time` property of type
                      `integer` (estimated start time of the segment in the
                      input stream, in milliseconds), and an `end_time` property
                      of type `integer` (estimated end time of the segment in
                      the input stream, in milliseconds).
                    required: true
                  - name: tentative
                    type: array
                    description: >
                      Array of preliminary transcript segments that are subject
                      to change. Each array object contains a `language`
                      property of type `string` (IETF BCP 47 language tag of the
                      detected source language), a `text` property of type
                      `string` (source transcript text), a `start_time` property
                      of type `integer` (estimated start time of the segment in
                      the input stream, in milliseconds), and an `end_time`
                      property of type `integer` (estimated end time of the
                      segment in the input stream, in milliseconds).
                    required: true
        headers: []
        jsonPayloadSchema:
          type: object
          required:
            - source_transcript_update
          properties:
            source_transcript_update:
              type: object
              required:
                - concluded
                - tentative
              properties:
                concluded:
                  type: array
                  description: >
                    Array of fixed transcript segments that will no longer
                    change. Each array object contains a `language` property of
                    type `string` (IETF BCP 47 language tag of the detected
                    source language), a `text` property of type `string` (source
                    transcript text), a `start_time` property of type `integer`
                    (estimated start time of the segment in the input stream, in
                    milliseconds), and an `end_time` property of type `integer`
                    (estimated end time of the segment in the input stream, in
                    milliseconds).
                  items: &ref_0
                    allOf:
                      - &ref_1
                        type: object
                        required:
                          - text
                          - start_time
                          - end_time
                        properties:
                          text:
                            type: string
                            description: Source or target transcript text
                            x-parser-schema-id: <anonymous-schema-7>
                          start_time:
                            type: integer
                            description: >-
                              Estimated start time of the segment in the input
                              stream in milliseconds
                            examples:
                              - 1250
                            x-parser-schema-id: <anonymous-schema-8>
                          end_time:
                            type: integer
                            description: >-
                              Estimated end time of the segment in the input
                              stream in milliseconds
                            examples:
                              - 1570
                            x-parser-schema-id: <anonymous-schema-9>
                        x-parser-schema-id: TranscriptSegment
                      - type: object
                        required:
                          - language
                        properties:
                          language:
                            type: string
                            description: >-
                              IETF BCP 47 language tag of the detected source
                              language
                            examples:
                              - en
                            x-parser-schema-id: <anonymous-schema-11>
                        x-parser-schema-id: <anonymous-schema-10>
                    x-parser-schema-id: SourceTranscriptSegment
                  x-parser-schema-id: <anonymous-schema-6>
                tentative:
                  type: array
                  description: >
                    Array of preliminary transcript segments that are subject to
                    change. Each array object contains a `language` property of
                    type `string` (IETF BCP 47 language tag of the detected
                    source language), a `text` property of type `string` (source
                    transcript text), a `start_time` property of type `integer`
                    (estimated start time of the segment in the input stream, in
                    milliseconds), and an `end_time` property of type `integer`
                    (estimated end time of the segment in the input stream, in
                    milliseconds).
                  items: *ref_0
                  x-parser-schema-id: <anonymous-schema-12>
              x-parser-schema-id: <anonymous-schema-5>
          x-parser-schema-id: SourceTranscriptUpdatePayload
        title: Source Transcript Update
        description: |2-
           The message contains an update to the transcription of the supplied media in the *source* language. 

           Each message is an incremental addition to the already received updates of the *source* transcript with concluded and tentative text segments. Concluded segments are fixed and will only appear once, while tentative segments may be updated in subsequent messages as more audio is processed. 

           Clients should merge the concluded segments into a final transcript and update the tentative segments as new updates arrive.
        example: |-
          {
            "source_transcript_update": {
              "concluded": [
                {
                  "language": "en",
                  "text": "Hello, how are you",
                  "start_time": 0,
                  "end_time": 1500
                }
              ],
              "tentative": [
                {
                  "language": "en",
                  "text": " today?",
                  "start_time": 1500,
                  "end_time": 2000
                }
              ]
            }
          }
        bindings: []
        extensions:
          - id: x-parser-unique-object-id
            value: SourceTranscriptUpdate
      - &ref_8
        id: TargetTranscriptUpdate
        contentType: application/json
        payload:
          - name: Target Transcript Update
            description: |2-
               The message contains an update to the transcription of the supplied media in the *target* language. 

               Each message is an incremental addition to the already received updates of the *target* transcript with concluded and tentative text segments. Concluded segments are fixed and will only appear once, while tentative segments may be updated in subsequent messages as more audio is processed. 

               Clients should merge the concluded segments into a final transcript and update the tentative segments as new updates arrive.
            type: object
            properties:
              - name: target_transcript_update
                type: object
                required: true
                properties:
                  - name: language
                    type: string
                    description: IETF BCP 47 language tag of the target language
                    required: true
                  - name: concluded
                    type: array
                    description: >
                      Array of fixed transcript segments that will no longer
                      change. Each array object contains a `text` property of
                      type `string` (target transcript text), a `start_time`
                      property of type `integer` (estimated start time of the
                      segment in the input stream, in milliseconds), and an
                      `end_time` property of type `integer` (estimated end time
                      of the segment in the input stream, in milliseconds).
                    required: true
                  - name: tentative
                    type: array
                    description: >
                      Array of preliminary transcript segments that are subject
                      to change. Each array object contains a `text` property of
                      type `string` (target transcript text), a `start_time`
                      property of type `integer` (estimated start time of the
                      segment in the input stream, in milliseconds), and an
                      `end_time` property of type `integer` (estimated end time
                      of the segment in the input stream, in milliseconds).
                    required: true
        headers: []
        jsonPayloadSchema:
          type: object
          required:
            - target_transcript_update
          properties:
            target_transcript_update:
              type: object
              required:
                - language
                - concluded
                - tentative
              properties:
                language:
                  type: string
                  description: IETF BCP 47 language tag of the target language
                  examples:
                    - es
                  x-parser-schema-id: <anonymous-schema-14>
                concluded:
                  type: array
                  description: >
                    Array of fixed transcript segments that will no longer
                    change. Each array object contains a `text` property of type
                    `string` (target transcript text), a `start_time` property
                    of type `integer` (estimated start time of the segment in
                    the input stream, in milliseconds), and an `end_time`
                    property of type `integer` (estimated end time of the
                    segment in the input stream, in milliseconds).
                  items: *ref_1
                  x-parser-schema-id: <anonymous-schema-15>
                tentative:
                  type: array
                  description: >
                    Array of preliminary transcript segments that are subject to
                    change. Each array object contains a `text` property of type
                    `string` (target transcript text), a `start_time` property
                    of type `integer` (estimated start time of the segment in
                    the input stream, in milliseconds), and an `end_time`
                    property of type `integer` (estimated end time of the
                    segment in the input stream, in milliseconds).
                  items: *ref_1
                  x-parser-schema-id: <anonymous-schema-16>
              x-parser-schema-id: <anonymous-schema-13>
          x-parser-schema-id: TargetTranscriptUpdatePayload
        title: Target Transcript Update
        description: |2-
           The message contains an update to the transcription of the supplied media in the *target* language. 

           Each message is an incremental addition to the already received updates of the *target* transcript with concluded and tentative text segments. Concluded segments are fixed and will only appear once, while tentative segments may be updated in subsequent messages as more audio is processed. 

           Clients should merge the concluded segments into a final transcript and update the tentative segments as new updates arrive.
        example: |-
          {
            "target_transcript_update": {
              "language": "es",
              "concluded": [
                {
                  "text": "Hola, ¿cómo estás",
                  "start_time": 0,
                  "end_time": 1500
                }
              ],
              "tentative": [
                {
                  "text": " hoy?",
                  "start_time": 1500,
                  "end_time": 2000
                }
              ]
            }
          }
        bindings: []
        extensions:
          - id: x-parser-unique-object-id
            value: TargetTranscriptUpdate
      - &ref_9
        id: TargetMediaChunk
        contentType: application/json
        payload:
          - name: Target Media Chunk
            description: |2-
               (closed beta) The message contains translated audio data in the target language. 

               The audio data is provided as an array of base64-encoded indivisible chunks (e.g., codec packets or container pages/clusters). The first message of this type includes the content type and optional headers field. The `headers` field (when present) indicates how many packets at the start of the `data` array contain initialization/header data required by the decoder. For containerized formats, all packets can be passed directly to the demuxer. For raw codec formats with headers, the header packets must be used to initialize the decoder before processing subsequent audio packets. When `headers` is `null` or absent, all packets in the `data` array are audio data. 

               The audio stream contains only synthesized speech segments, without silence or padding. 

               Clients should decode and play back the audio chunks in the order received and sequence given in `data`. For subtitle synchronization, use the `text` field to identify subtitle segments and accumulate `duration` values to calculate total playback time for each subtitle.
            type: object
            properties:
              - name: target_media_chunk
                type: object
                required: true
                properties:
                  - name: language
                    type: string
                    description: IETF BCP 47 language tag of the target media language
                    required: true
                  - name: content_type
                    type: string
                    description: >-
                      (Optional) MIME type of the audio stream. Only present in
                      the first message of this type.
                    required: false
                  - name: headers
                    type: integer
                    description: >-
                      (Optional) Number of packets at the start of the data
                      array that contain initialization/header data. Only
                      present in the first message of this type. When present,
                      the first N elements in the data array (where N equals the
                      headers value) contain header/initialization data required
                      by the decoder, and subsequent elements contain audio
                      packets. For containerized formats, all packets can be
                      passed directly to the demuxer. For raw codec formats,
                      header packets must be used to initialize the decoder
                      before processing audio packets. When null or absent, all
                      packets are audio data.
                    required: false
                  - name: data
                    type: array
                    description: >-
                      Array of indivisible chunks of audio data (e.g., codec
                      packets or container pages/clusters). Each element is
                      encoded as a base64 string when using the JSON message
                      format, or as raw binary data when using the MessagePack
                      message format. When the headers field is present, the
                      first N elements contain header data, and subsequent
                      elements contain audio packets.
                    required: true
                  - name: duration
                    type: integer
                    description: >-
                      The total playback duration of all audio data in this
                      chunk, measured in milliseconds. Accumulate duration
                      values across chunks belonging to the same text segment to
                      determine the total playback time for that subtitle. Also
                      useful for synchronization, buffering calculations, and
                      determining the timing of subsequent chunks.
                    required: true
                  - name: text
                    type: string
                    description: >-
                      (Optional) The target transcript segment from which this
                      audio was synthesized. Present only in the first audio
                      chunk belonging to a new transcript segment. Subsequent
                      audio chunks for the same transcript segment will have
                      this field set to null. Multiple audio chunks can belong
                      to the same text segment. The cumulative content of this
                      field across all chunks matches the cumulative target
                      transcript received via target transcript updates. This
                      allows clients to associate audio chunks with their
                      corresponding transcript segments and display synchronized
                      captions or subtitles during playback.
                    required: false
        headers: []
        jsonPayloadSchema:
          type: object
          required:
            - target_media_chunk
          properties:
            target_media_chunk:
              type: object
              required:
                - language
                - data
                - duration
              properties:
                language:
                  type: string
                  description: IETF BCP 47 language tag of the target media language
                  examples:
                    - de
                  x-parser-schema-id: <anonymous-schema-18>
                content_type:
                  type: string
                  description: >-
                    (Optional) MIME type of the audio stream. Only present in
                    the first message of this type.
                  examples:
                    - audio/webm;codecs=opus;
                  x-parser-schema-id: <anonymous-schema-19>
                headers:
                  type: integer
                  description: >-
                    (Optional) Number of packets at the start of the data array
                    that contain initialization/header data. Only present in the
                    first message of this type. When present, the first N
                    elements in the data array (where N equals the headers
                    value) contain header/initialization data required by the
                    decoder, and subsequent elements contain audio packets. For
                    containerized formats, all packets can be passed directly to
                    the demuxer. For raw codec formats, header packets must be
                    used to initialize the decoder before processing audio
                    packets. When null or absent, all packets are audio data.
                  examples:
                    - 1
                  x-parser-schema-id: <anonymous-schema-20>
                data:
                  type: array
                  description: >-
                    Array of indivisible chunks of audio data (e.g., codec
                    packets or container pages/clusters). Each element is
                    encoded as a base64 string when using the JSON message
                    format, or as raw binary data when using the MessagePack
                    message format. When the headers field is present, the first
                    N elements contain header data, and subsequent elements
                    contain audio packets.
                  items:
                    type: string
                    format: byte
                    x-parser-schema-id: <anonymous-schema-22>
                  x-parser-schema-id: <anonymous-schema-21>
                duration:
                  type: integer
                  description: >-
                    The total playback duration of all audio data in this chunk,
                    measured in milliseconds. Accumulate duration values across
                    chunks belonging to the same text segment to determine the
                    total playback time for that subtitle. Also useful for
                    synchronization, buffering calculations, and determining the
                    timing of subsequent chunks.
                  examples:
                    - 2400
                  x-parser-schema-id: <anonymous-schema-23>
                text:
                  type: string
                  description: >-
                    (Optional) The target transcript segment from which this
                    audio was synthesized. Present only in the first audio chunk
                    belonging to a new transcript segment. Subsequent audio
                    chunks for the same transcript segment will have this field
                    set to null. Multiple audio chunks can belong to the same
                    text segment. The cumulative content of this field across
                    all chunks matches the cumulative target transcript received
                    via target transcript updates. This allows clients to
                    associate audio chunks with their corresponding transcript
                    segments and display synchronized captions or subtitles
                    during playback.
                  x-parser-schema-id: <anonymous-schema-24>
              x-parser-schema-id: <anonymous-schema-17>
          x-parser-schema-id: TargetMediaChunkPayload
        title: Target Media Chunk
        description: |2-
           (closed beta) The message contains translated audio data in the target language. 

           The audio data is provided as an array of base64-encoded indivisible chunks (e.g., codec packets or container pages/clusters). The first message of this type includes the content type and optional headers field. The `headers` field (when present) indicates how many packets at the start of the `data` array contain initialization/header data required by the decoder. For containerized formats, all packets can be passed directly to the demuxer. For raw codec formats with headers, the header packets must be used to initialize the decoder before processing subsequent audio packets. When `headers` is `null` or absent, all packets in the `data` array are audio data. 

           The audio stream contains only synthesized speech segments, without silence or padding. 

           Clients should decode and play back the audio chunks in the order received and sequence given in `data`. For subtitle synchronization, use the `text` field to identify subtitle segments and accumulate `duration` values to calculate total playback time for each subtitle.
        example: |-
          {
            "target_media_chunk": {
              "language": "de",
              "content_type": "audio/webm;codecs=opus;",
              "headers": 1,
              "data": [
                "GkXfo59ChoEBQveBAULygQRC84EIQoKEd0VFSUgBU0WIQo17hEgBc0WjgQBBxYWIAvLhEKBjYEfA",
                "H0O2dBUMRQyBElkIBE9nZ1MAAgAAAAAAAAAAtJhTXAAAAAAAoyC5AQAAAAA=",
                "H0O2dBUMRQyBElkIBE9nZ1MAAAAAAAAAAAAAtJhTXAIAAAAAamZ0BwE="
              ],
              "duration": 2400,
              "text": "Hallo, wie geht es dir heute?"
            }
          }
        bindings: []
        extensions:
          - id: x-parser-unique-object-id
            value: TargetMediaChunk
      - &ref_10
        id: EndOfSourceTranscript
        contentType: application/json
        payload:
          - name: End of Source Transcript
            description: >
              This message indicates that the *source* transcript is complete
              and no further updates will be sent. It is emitted after the
              client sends End of Source Media.
            type: object
            properties:
              - name: end_of_source_transcript
                type: object
                description: Empty object indicating source transcript is complete
                required: true
        headers: []
        jsonPayloadSchema:
          type: object
          required:
            - end_of_source_transcript
          properties:
            end_of_source_transcript:
              type: object
              description: Empty object indicating source transcript is complete
              x-parser-schema-id: <anonymous-schema-25>
          x-parser-schema-id: EndOfSourceTranscriptPayload
        title: End of Source Transcript
        description: >
          This message indicates that the *source* transcript is complete and no
          further updates will be sent. It is emitted after the client sends End
          of Source Media.
        example: |-
          {
            "end_of_source_transcript": {}
          }
        bindings: []
        extensions:
          - id: x-parser-unique-object-id
            value: EndOfSourceTranscript
      - &ref_11
        id: EndOfTargetTranscript
        contentType: application/json
        payload:
          - name: End of Target Transcript
            description: >
              This message indicates that the *target* transcript is complete
              and no further updates will be sent. It is emitted after the
              client sends End of Source Media.
            type: object
            properties:
              - name: end_of_target_transcript
                type: object
                required: true
                properties:
                  - name: language
                    type: string
                    description: >-
                      IETF BCP 47 language tag indicating which target
                      transcript has ended
                    required: true
        headers: []
        jsonPayloadSchema:
          type: object
          required:
            - end_of_target_transcript
          properties:
            end_of_target_transcript:
              type: object
              required:
                - language
              properties:
                language:
                  type: string
                  description: >-
                    IETF BCP 47 language tag indicating which target transcript
                    has ended
                  examples:
                    - fr
                  x-parser-schema-id: <anonymous-schema-27>
              x-parser-schema-id: <anonymous-schema-26>
          x-parser-schema-id: EndOfTargetTranscriptPayload
        title: End of Target Transcript
        description: >
          This message indicates that the *target* transcript is complete and no
          further updates will be sent. It is emitted after the client sends End
          of Source Media.
        example: |-
          {
            "end_of_target_transcript": {
              "language": "fr"
            }
          }
        bindings: []
        extensions:
          - id: x-parser-unique-object-id
            value: EndOfTargetTranscript
      - &ref_12
        id: EndOfTargetMedia
        contentType: application/json
        payload:
          - name: End of Target Media
            description: >
              (closed beta) This message indicates that the *target* media
              stream is complete and no further audio chunks will be sent for
              this target language. It is emitted after the client sends End of
              Source Media and all target audio has been sent.
            type: object
            properties:
              - name: end_of_target_media
                type: object
                required: true
                properties:
                  - name: language
                    type: string
                    description: >-
                      IETF BCP 47 language tag indicating which target media
                      stream has ended
                    required: true
        headers: []
        jsonPayloadSchema:
          type: object
          required:
            - end_of_target_media
          properties:
            end_of_target_media:
              type: object
              required:
                - language
              properties:
                language:
                  type: string
                  description: >-
                    IETF BCP 47 language tag indicating which target media
                    stream has ended
                  examples:
                    - es
                  x-parser-schema-id: <anonymous-schema-29>
              x-parser-schema-id: <anonymous-schema-28>
          x-parser-schema-id: EndOfTargetMediaPayload
        title: End of Target Media
        description: >
          (closed beta) This message indicates that the *target* media stream is
          complete and no further audio chunks will be sent for this target
          language. It is emitted after the client sends End of Source Media
          and all target audio has been sent.
        example: |-
          {
            "end_of_target_media": {
              "language": "es"
            }
          }
        bindings: []
        extensions:
          - id: x-parser-unique-object-id
            value: EndOfTargetMedia
      - &ref_13
        id: EndOfStream
        contentType: application/json
        payload:
          - name: End of Stream
            description: >
              This message indicates that all outputs are complete and the
              stream has ended. It is the last message the client receives
              after it sends End of Source Media. You can safely close the
              connection once you have received this message.
            type: object
            properties:
              - name: end_of_stream
                type: object
                description: Empty object indicating all outputs are complete
                required: true
        headers: []
        jsonPayloadSchema:
          type: object
          required:
            - end_of_stream
          properties:
            end_of_stream:
              type: object
              description: Empty object indicating all outputs are complete
              x-parser-schema-id: <anonymous-schema-30>
          x-parser-schema-id: EndOfStreamPayload
        title: End of Stream
        description: >
          This message indicates that all outputs are complete and the stream
          has ended. It is the last message the client receives after it sends
          End of Source Media. You can safely close the connection once you
          have received this message.
        example: |-
          {
            "end_of_stream": {}
          }
        bindings: []
        extensions:
          - id: x-parser-unique-object-id
            value: EndOfStream
      - &ref_14
        id: Error
        contentType: application/json
        payload:
          - name: Error
            description: >
              This message reports errors encountered during audio processing or
              streaming. It includes an error code, reason code, and a
              human-readable message. After an error, the session is terminated
              and reconnection is not possible. You need to request a new
              session.
            type: object
            properties:
              - name: error
                type: object
                required: true
                properties:
                  - name: request_type
                    type: string
                    description: The type of request that caused the error
                    required: true
                  - name: error_code
                    type: integer
                    description: HTTP-style error code
                    required: true
                  - name: reason_code
                    type: integer
                    description: Detailed reason code for debugging
                    required: true
                  - name: error_message
                    type: string
                    description: Human-readable error description
                    required: true
        headers: []
        jsonPayloadSchema:
          type: object
          required:
            - error
          properties:
            error:
              type: object
              required:
                - request_type
                - error_code
                - reason_code
                - error_message
              properties:
                request_type:
                  type: string
                  description: The type of request that caused the error
                  examples:
                    - source_media_chunk
                  x-parser-schema-id: <anonymous-schema-32>
                error_code:
                  type: integer
                  description: HTTP-style error code
                  examples:
                    - 400
                  x-parser-schema-id: <anonymous-schema-33>
                reason_code:
                  type: integer
                  description: Detailed reason code for debugging
                  examples:
                    - 4000403
                  x-parser-schema-id: <anonymous-schema-34>
                error_message:
                  type: string
                  description: Human-readable error description
                  examples:
                    - Audio format not supported
                  x-parser-schema-id: <anonymous-schema-35>
              x-parser-schema-id: <anonymous-schema-31>
          x-parser-schema-id: ErrorPayload
        title: Error
        description: >
          This message reports errors encountered during audio processing or
          streaming. It includes an error code, reason code, and a
          human-readable message. After an error, the session is terminated and
          reconnection is not possible. You need to request a new session.
        example: |-
          {
            "error": {
              "request_type": "source_media_chunk",
              "error_code": 400,
              "reason_code": 4000403,
              "error_message": "Audio format not supported"
            }
          }
        bindings: []
        extensions:
          - id: x-parser-unique-object-id
            value: Error
    bindings: []
    extensions: &ref_2
      - id: x-parser-unique-object-id
        value: voiceStream
  - &ref_3
    id: receiveTranscriptions
    title: Receive transcriptions
    description: Receive transcriptions and translations
    type: receive
    messages:
      - &ref_5
        id: SourceMediaChunk
        contentType: application/json
        payload:
          - name: Source Media Chunk
            description: |2-
               The message contains a chunk of audio data. The audio encoding must match the one specified in the [Request Session](/api-reference/voice/request-session) request.

               When using JSON format, the audio data is base64-encoded. When using MessagePack format, the audio data is raw binary data.

               A chunk must not exceed 100 kilobytes in size or one second in duration. The recommended duration is 50 to 250 milliseconds, which gives the best tradeoff between latency and quality. The interval between chunks must be at least half the duration of the preceding chunk and must not exceed 30 seconds. Otherwise you will run into rate limits, or the session will time out and the connection will be closed.

               For PCM data, the chunk size must be a multiple of the frame size (the encoding unit).
            type: object
            properties:
              - name: source_media_chunk
                type: object
                required: true
                properties:
                  - name: data
                    type: string
                    description: >
                      Audio data in the audio format specified during session
                      initialization. Encoded as base64 string when using JSON.
                      Raw binary data when using MessagePack.
                    required: true
        headers: []
        jsonPayloadSchema:
          type: object
          required:
            - source_media_chunk
          properties:
            source_media_chunk:
              type: object
              required:
                - data
              properties:
                data:
                  type: string
                  format: binary
                  description: >
                    Audio data in the audio format specified during session
                    initialization. Encoded as base64 string when using JSON.
                    Raw binary data when using MessagePack.
                  x-parser-schema-id: <anonymous-schema-3>
              x-parser-schema-id: <anonymous-schema-2>
          x-parser-schema-id: SourceMediaChunkPayload
        title: Source Media Chunk
        description: |2-
           The message contains a chunk of audio data. The audio encoding must match the one specified in the [Request Session](/api-reference/voice/request-session) request.

           When using JSON format, the audio data is base64-encoded. When using MessagePack format, the audio data is raw binary data.

           A chunk must not exceed 100 kilobytes in size or one second in duration. The recommended duration is 50 to 250 milliseconds, which gives the best tradeoff between latency and quality. The interval between chunks must be at least half the duration of the preceding chunk and must not exceed 30 seconds. Otherwise you will run into rate limits, or the session will time out and the connection will be closed.

           For PCM data, the chunk size must be a multiple of the frame size (the encoding unit).
        example: |-
          {
            "source_media_chunk": {
              "data": "VGhpcyBpcyBhIGZha2UgYXVkaW8gY2h1bmsK"
            }
          }
        bindings: []
        extensions:
          - id: x-parser-unique-object-id
            value: SourceMediaChunk
      - &ref_6
        id: EndOfSourceMedia
        contentType: application/json
        payload:
          - name: End of Source Media
            description: >
              The message indicates the end of source media data. It causes the
              finalization of tentative transcript segments and triggers the
              emission of final transcript updates, end of transcript messages
              and the end of stream message. No more data chunks can be sent
              afterwards. It marks the end of the stream input.
            type: object
            properties:
              - name: end_of_source_media
                type: object
                description: Empty object signaling end of media stream
                required: true
        headers: []
        jsonPayloadSchema:
          type: object
          required:
            - end_of_source_media
          properties:
            end_of_source_media:
              type: object
              description: Empty object signaling end of media stream
              x-parser-schema-id: <anonymous-schema-4>
          x-parser-schema-id: EndOfSourceMediaPayload
        title: End of Source Media
        description: >
          The message indicates the end of source media data. It causes the
          finalization of tentative transcript segments and triggers the
          emission of final transcript updates, end of transcript messages and
          the end of stream message. No more data chunks can be sent afterwards.
          It marks the end of the stream input.
        example: |-
          {
            "end_of_source_media": {}
          }
        bindings: []
        extensions:
          - id: x-parser-unique-object-id
            value: EndOfSourceMedia
    bindings: []
    extensions: *ref_2
sendOperations:
  - *ref_3
receiveOperations:
  - *ref_4
sendMessages:
  - *ref_5
  - *ref_6
receiveMessages:
  - *ref_7
  - *ref_8
  - *ref_9
  - *ref_10
  - *ref_11
  - *ref_12
  - *ref_13
  - *ref_14
extensions:
  - id: x-parser-unique-object-id
    value: voiceStream
securitySchemes: []

````
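
The chunking rules and the terminal messages described in the schema above can be sketched in Python. This is a minimal illustration of the JSON format only, not an official client. The helper names (`chunk_pcm`, `source_media_chunk`, `end_of_source_media`, `classify`) are hypothetical, "100 kilobytes" is read conservatively as 100,000 bytes, and each server message is assumed to carry exactly one top-level key, as in the examples above.

```python
import base64
import json

# Assumption: "100 kilobytes" is interpreted as 100,000 bytes.
MAX_CHUNK_BYTES = 100_000
# Per the schema, End of Stream is the last message and Error terminates the session.
TERMINAL_TYPES = ("end_of_stream", "error")


def chunk_pcm(audio: bytes, sample_rate: int, frame_size: int, chunk_ms: int = 100):
    """Split raw PCM into chunks that are whole multiples of frame_size bytes."""
    if not 0 < chunk_ms <= 1000:
        raise ValueError("a chunk must not be longer than one second")
    chunk_bytes = (sample_rate * chunk_ms // 1000) * frame_size
    if not 0 < chunk_bytes <= MAX_CHUNK_BYTES:
        raise ValueError("chunk size outside the allowed range")
    # Drop a trailing partial frame so every chunk stays frame-aligned.
    usable = len(audio) - len(audio) % frame_size
    return [audio[i:i + chunk_bytes] for i in range(0, usable, chunk_bytes)]


def source_media_chunk(data: bytes) -> str:
    """Build a Source Media Chunk message (JSON format, base64-encoded audio)."""
    encoded = base64.b64encode(data).decode("ascii")
    return json.dumps({"source_media_chunk": {"data": encoded}})


def end_of_source_media() -> str:
    """Build the End of Source Media message (an empty object)."""
    return json.dumps({"end_of_source_media": {}})


def classify(raw: str):
    """Return (message_type, body, session_finished) for a received JSON message."""
    message = json.loads(raw)
    kind = next(iter(message))  # assumption: one top-level key per message
    return kind, message[kind], kind in TERMINAL_TYPES
```

A client would send each string returned by `source_media_chunk` in a TEXT frame, paced so that the interval between chunks is at least half the duration of the preceding chunk, then send `end_of_source_media()` and keep reading until `classify` reports that the session has finished.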