FEAT: Realtime streaming session support and server-side barge-in attack #1766

Open
adrian-gavrila wants to merge 22 commits into microsoft:main from adrian-gavrila:adrian-gavrila/realtime-server-vad

@adrian-gavrila
Contributor

Description

Adds persistent streaming session support to OpenAIRealtimeTarget and introduces BargeInAttack, a streaming attack that leverages server-side VAD to detect and exploit barge-in (interruption) behavior. Previously the target only supported single-turn fire-and-forget audio exchanges; this PR adds the transport primitives needed for multi-turn streaming sessions with incremental audio push, event subscription, and mid-session response requests.

When the server detects new user speech while the assistant is still responding, the in-flight response is automatically interrupted and the conversation history is truncated to match what was actually delivered.
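
As a rough illustration of the truncation step, the sketch below converts the number of assistant audio bytes actually delivered into milliseconds and builds a `conversation.item.truncate` event in the shape used by the OpenAI Realtime API. The helper names, the 24 kHz mono PCM16 output format, and the item ID are assumptions made for illustration; this is not the code in this PR.

```python
import json

SAMPLE_RATE_HZ = 24_000  # assumption: 24 kHz mono PCM16 output audio
BYTES_PER_SAMPLE = 2     # 16-bit samples


def delivered_audio_ms(delivered_bytes: int) -> int:
    """Convert a count of delivered PCM16 bytes into whole milliseconds."""
    samples = delivered_bytes // BYTES_PER_SAMPLE
    return samples * 1000 // SAMPLE_RATE_HZ


def build_truncate_event(item_id: str, delivered_bytes: int) -> str:
    """Build a truncate event for an interrupted assistant item (hypothetical helper)."""
    return json.dumps(
        {
            "type": "conversation.item.truncate",
            "item_id": item_id,
            "content_index": 0,
            "audio_end_ms": delivered_audio_ms(delivered_bytes),
        }
    )
```

For example, 48,000 delivered bytes correspond to 24,000 samples, which is 1,000 ms of audio at 24 kHz.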

Key additions:

  • OpenAIRealtimeTarget streaming primitives: connect_async, push_audio_chunk_async, insert_user_audio_async, subscribe_events_async, request_response_async, and send_streaming_session_config_async. These expose transport-level operations over a persistent WebSocket connection.
  • _RealtimeEventDispatcher — ABC that owns a realtime connection's event stream, routes provider-specific events to the active turn, and fires an on_user_audio_committed callback when server VAD finalizes a turn. Provider-specific routing is isolated to _route_event / _cancel abstract methods.
  • BargeInAttack — streaming attack that pushes audio chunks into a persistent session, applies configured converters on each server-committed turn (convert-on-commit), requests responses, and tracks interruptions. Per-turn Message pairs are persisted to CentralMemory with prompt_metadata["interrupted"] = True on interrupted turns.
  • ServerVadConfig / RealtimeTargetResult — shared types for configuring server VAD and representing turn results (audio, transcripts, interruption flag).
  • PromptNormalizer.convert_audio_async — applies audio converter configurations to raw PCM bytes for streaming attacks that hold audio mid-turn rather than a Message.
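
To make the server VAD configuration concrete, here is a minimal sketch of how a VAD config object might be turned into a `session.update` payload. `VadSettings` is a hypothetical stand-in for ServerVadConfig, whose real fields are not shown here. The `turn_detection` keys follow the Realtime API's beta `session.update` shape as far as I know, and the defaults (including `create_response=False`, on the assumption that the attack requests responses itself) are illustrative only.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class VadSettings:
    """Hypothetical stand-in for ServerVadConfig; field names are assumptions."""

    threshold: float = 0.5
    prefix_padding_ms: int = 300
    silence_duration_ms: int = 500
    create_response: bool = False    # assumption: the attack requests responses explicitly
    interrupt_response: bool = True  # let new user speech interrupt an in-flight response


def build_session_update(vad: VadSettings) -> dict:
    """Build a session.update payload that enables server-side VAD."""
    return {
        "type": "session.update",
        "session": {"turn_detection": {"type": "server_vad", **asdict(vad)}},
    }
```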

The target exposes only transport primitives; all attack logic (buffering, convert-on-commit dance, interruption signaling) lives in BargeInAttack.
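
The split between transport and attack logic can be sketched roughly as follows. This is a simplified illustration under assumptions, not the PR's implementation: `FakeTransport`, `TurnResult`, and `run_turn` are hypothetical names, the method names only loosely mirror the primitives listed above, and the server VAD commit is simulated as happening once all chunks are pushed.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Iterable


@dataclass
class TurnResult:
    """Hypothetical stand-in for RealtimeTargetResult."""

    audio: bytes
    transcript: str
    interrupted: bool = False


class FakeTransport:
    """Stand-in for the target: it only moves bytes and knows nothing about the attack."""

    def __init__(self) -> None:
        self.pushed: list[bytes] = []
        self.inserted: list[bytes] = []

    async def push_audio_chunk(self, chunk: bytes) -> None:
        self.pushed.append(chunk)

    async def insert_user_audio(self, audio: bytes) -> None:
        self.inserted.append(audio)

    async def request_response(self) -> TurnResult:
        return TurnResult(audio=b"\x00\x00", transcript="ok")


async def run_turn(
    transport: FakeTransport,
    chunks: Iterable[bytes],
    convert: Callable[[bytes], Awaitable[bytes]],
) -> TurnResult:
    """Attack-side logic: buffer raw audio, convert on commit, then request a response."""
    buffered = bytearray()
    for chunk in chunks:
        await transport.push_audio_chunk(chunk)
        buffered.extend(chunk)
    # Assumption: server VAD has committed the turn at this point.
    converted = await convert(bytes(buffered))
    await transport.insert_user_audio(converted)
    return await transport.request_response()
```

Keeping the buffering and conversion in `run_turn` means the transport can be swapped for a fake in unit tests without touching attack behavior.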

Tests and Documentation

  • 82 unit tests across 3 test files covering: event dispatch and routing, turn lifecycle, interruption detection, converter application, error paths, multi-turn connection reuse, and the full attack lifecycle.
  • Coverage: 98% on realtime_audio.py, 72% on openai_realtime_target.py (uncovered lines are pre-existing code paths, not new additions).
  • Notebook: doc/code/executor/attack/barge_in_attack.py (jupytext py:percent format) demonstrates the attack against a live OpenAI Realtime API endpoint with server VAD. It ran successfully against gpt-4o-realtime-preview; the outputs are cleared for CI because running it requires live credentials.

adrian-gavrila and others added 16 commits May 14, 2026 13:07
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…t API

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ardown

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…nc rename, Optional→union

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…tion)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…me-server-vad

# Conflicts:
#	pyrit/prompt_target/openai/openai_realtime_target.py
hannahwestra25 self-assigned this May 21, 2026
Comment thread pyrit/executor/attack/streaming/barge_in.py
Comment thread pyrit/executor/attack/streaming/barge_in.py Outdated
piece.converted_value = converted_text
piece.converted_value_data_type = converted_text_data_type

async def convert_audio_async(
Contributor

Could we make this a message_normalizer, so that the target can use it as the default normalizer when the user doesn't specify a special one?

Contributor Author

I agree that this probably shouldn't live on PromptNormalizer, but I'm not sure it fits the shape of MessageNormalizer: it converts bytes to bytes rather than operating on messages, which don't exist mid-stream, where this would run. I made a standalone AudioStreamNormalizer instead of forcing the shape. What do you think?
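
To illustrate the bytes-to-bytes shape being discussed, here is a minimal sketch of a standalone normalizer that chains async audio converters over raw PCM. `StreamAudioNormalizer`, the `AudioConverter` alias, and `append_silence` are hypothetical names for illustration; the actual AudioStreamNormalizer in this PR may look quite different.

```python
import asyncio
from typing import Awaitable, Callable, Sequence

# Hypothetical converter type: raw PCM bytes in, raw PCM bytes out.
AudioConverter = Callable[[bytes], Awaitable[bytes]]


class StreamAudioNormalizer:
    """Illustrative bytes-to-bytes normalizer; not the PR's AudioStreamNormalizer."""

    def __init__(self, converters: Sequence[AudioConverter]) -> None:
        self._converters = list(converters)

    async def convert_audio_async(self, pcm: bytes) -> bytes:
        # Apply each converter in order; no Message object is needed mid-stream.
        for convert in self._converters:
            pcm = await convert(pcm)
        return pcm


async def append_silence(pcm: bytes) -> bytes:
    """Example converter: append 10 ms of PCM16 silence at 24 kHz (240 samples, 480 bytes)."""
    return pcm + b"\x00" * 480
```

Because the interface never touches a Message, it can run on audio that is still buffered mid-turn, which is the mismatch with MessageNormalizer described above.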

Comment thread pyrit/executor/attack/streaming/barge_in.py
Comment thread doc/code/executor/attack/barge_in_attack.py Outdated
Comment thread pyrit/executor/attack/streaming/barge_in.py Outdated
Comment thread doc/code/executor/attack/barge_in_attack.py
adrian-gavrila and others added 6 commits May 22, 2026 12:36
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…imitive

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… inline drive_response

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2 participants