This project demonstrates an advanced implementation of a Speech-to-Speech (S2S) application using the Amazon Nova Sonic model on Amazon Bedrock. It has evolved from a simple S2S demo to showcase a modular architecture with support for both synchronous and asynchronous tool use, real-time bidirectional streaming, and an interactive chat interface. The frontend is built with vanilla JavaScript and Vite, and the backend is written in Python. The project can serve as a starting point for more complex conversational AI applications.
- Real-time Speech-to-Speech: Leverages Amazon Nova Sonic for low-latency voice conversations.
- Bidirectional Streaming: Implements Amazon Bedrock's bidirectional streaming API for concurrent audio input/output and event handling.
- Modular Tool Framework:
  - Easily extendable backend framework for adding new tools.
  - Tools are defined with specifications (name, description, input schema) for Amazon Nova Sonic.
- Synchronous Tool Support:
  - Nova Sonic waits for tool execution to finish. Best for tools that respond in less than a minute.
  - Examples:
    - `getWeather`: Dynamically fetches weather (U.S. cities only) using Strands Agents, an open-source agents framework.
    - `numberRace`: A simple tool that takes a number and sleeps for that many seconds.
- Asynchronous Tool Support (demonstrated with `agentSearch`):
  - Immediate Placeholder Response: For long-running tools, an immediate placeholder message is sent to Nova Sonic to avoid timeouts and inform the user that the task has started.
  - Background Task Execution: The actual tool logic (e.g., the A2A client call for `agentSearch`) runs as a background task in the Python backend.
  - Out-of-Band UI Notification: Once the background tool completes, a custom WebSocket message is sent to the frontend to notify the user in the chat interface (e.g., "Agent Search for 'Diwali' is complete.").
  - Result Caching & Retrieval: Completed asynchronous tool results are cached in the backend. When the user subsequently asks for these results (and Nova Sonic re-invokes the tool, potentially guided by an updated system prompt), the cached data is retrieved and provided.
- Frontend Interface:
  - Built with HTML, CSS, and vanilla JavaScript (using Vite for development).
  - Real-time audio capture and streaming to the backend.
  - Real-time playback of Nova Sonic's audio responses.
  - Chat history display, including user speech, Nova Sonic responses, and tool completion notifications.
  - Voice selection for Nova Sonic's output.
- Agent2Agent (A2A) Protocol Integration:
  - The `agentSearch` tool demonstrates integration with an external agent using the A2A protocol via a Python client. The necessary client-side `common` A2A libraries are included in the project.
- Chrome Extension Capabilities:
  - Side panel UI for interacting with Nova Sonic alongside web browsing.
  - The `imageAnalyzer` tool captures screenshots of the active tab for analysis.
The application consists of:
- A Frontend User Interface, which can be run as a standalone web application (developed with Vite) or as a Chrome Extension (side panel). It handles user interaction, audio capture/playback, and WebSocket communication.
- A Python Backend Server, which manages WebSocket connections from the frontend, maintains the bidirectional stream with Amazon Nova Sonic on Bedrock, and orchestrates tool execution (both synchronous and asynchronous).
- An A2A Server for the `agentSearch` tool, runnable from `backend/agents/bedrock/`. It must be running whenever you use `agentSearch`.
├── arch.jpg
├── arch.png
├── backend
│   ├── __init__.py
│   ├── agents
│   │   └── bedrock
│   │       ├── __main__.py
│   │       ├── agent.py
│   │       ├── pyproject.toml
│   │       ├── README.md
│   │       ├── task_manager.py
│   │       └── utility.py
│   ├── common
│   │   ├── __init__.py
│   │   ├── client
│   │   │   ├── __init__.py
│   │   │   ├── card_resolver.py
│   │   │   └── client.py
│   │   ├── server
│   │   │   ├── __init__.py
│   │   │   ├── server.py
│   │   │   ├── task_manager.py
│   │   │   └── utils.py
│   │   ├── types.py
│   │   └── utils
│   │       ├── in_memory_cache.py
│   │       └── push_notification_auth.py
│   ├── lib
│   │   ├── __init__.py
│   │   ├── agent_search
│   │   │   ├── __init__.py
│   │   │   ├── agent_search_tool.py
│   │   │   └── agentclient.py
│   │   ├── image_analyzer
│   │   │   ├── __init__.py
│   │   │   ├── image_analyzer_llm_client.py
│   │   │   └── image_analyzer_tool.py
│   │   ├── number_race_tool.py
│   │   └── weather_tool.py
│   ├── nova_s2s_backend.py
│   ├── pyproject.toml
│   ├── README.md
│   └── uv.lock
├── chromeextension
│   ├── background.js
│   ├── icons
│   │   ├── icon128.png
│   │   ├── icon16.png
│   │   └── icon48.png
│   ├── index.html
│   ├── js
│   │   ├── lib
│   │   │   ├── play
│   │   │   │   ├── AudioPlayer_extension.js
│   │   │   │   └── AudioPlayerProcessor_extension.worklet.js
│   │   │   └── util
│   │   │       └── ChatHistoryManager_extension.js
│   │   ├── main_extension.js
│   │   ├── toolConfig_extension.js
│   │   └── websocketEvents_extension.js
│   ├── manifest.json
│   └── style.css
├── frontend
│   ├── index.html
│   ├── package-lock.json
│   ├── package.json
│   ├── public
│   │   └── vite.svg
│   └── src
│       ├── javascript.svg
│       ├── lib
│       │   ├── play
│       │   │   ├── AudioPlayer.js
│       │   │   └── AudioPlayerProcessor.worklet.js
│       │   └── util
│       │       ├── ChatHistoryManager.js
│       │       └── ObjectsExt.js
│       ├── main.js
│       ├── style.css
│       ├── toolConfig.js
│       └── websocketEvents.js
├── LICENSE.txt
└── README.md
- uv (or `pip`) and Python >= 3.12
- Node.js and npm
- An AWS account with access to the following models on Amazon Bedrock: Amazon Nova Sonic, Amazon Nova Micro, and Amazon Nova Lite
- AWS credentials configured in your environment (e.g., via `~/.aws/credentials` or the environment variables `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_SESSION_TOKEN` (if applicable), and `AWS_REGION`)
- (For the `agentSearch` tool) A running A2A-compatible server. An example server can be run from `backend/agents/bedrock/`.
- (For the example A2A Server in `backend/agents/bedrock/`) A Tavily API key (`TAVILY_API_KEY` environment variable).
- (For the `getWeather` tool) Ensure any dependencies and API key requirements for the Strands Agents framework are met.
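As a rough pre-flight check for the environment variables listed above, a small script along these lines may help. It is only a sketch: the `missing_vars` helper is hypothetical and not part of this repository, and because AWS credentials can also come from `~/.aws/credentials`, an unset AWS variable is a hint rather than an error.

```python
import os

# Variables named in the prerequisites. AWS keys may instead live in
# ~/.aws/credentials, so a "missing" AWS variable is only a hint.
AWS_VARS = ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION"]
A2A_VARS = ["TAVILY_API_KEY"]  # only needed for the example A2A server


def missing_vars(names, env=None):
    """Return the names that are unset or empty in the given mapping.

    Hypothetical helper for illustration; defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    for label, names in (("AWS", AWS_VARS), ("A2A server", A2A_VARS)):
        missing = missing_vars(names)
        status = "all set" if not missing else "not set: " + ", ".join(missing)
        print(label + " variables " + status)
```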
- Starting the A2A Server (for the `agentSearch` tool):
  - Open a new terminal.
  - Navigate to the `backend/agents/bedrock/` folder.
  - Export your Tavily API key: `export TAVILY_API_KEY="YOUR_TAVILY_API_KEY"`
  - Start the A2A server from that directory: `uv run .`
  - This server usually runs on `http://localhost:10000`. Keep this terminal running.
- Starting the Backend Application:
  - Open another new terminal.
  - Navigate to the `backend` folder.
  - Ensure your AWS credentials and region are configured (e.g., via environment variables or a shared credentials file). Example using environment variables:
    - `export AWS_ACCESS_KEY_ID="YOUR_AWS_KEY_ID"`
    - `export AWS_SECRET_ACCESS_KEY="YOUR_AWS_SECRET_KEY"`
    - `export AWS_REGION="your-aws-region"` (e.g., `us-east-1`)
    - `export AWS_SESSION_TOKEN="YOUR_SESSION_TOKEN"` (only if using temporary credentials)
  - Install dependencies (if you haven't already or if `pyproject.toml` changed): `uv sync`
  - Start the backend: `uv run nova_s2s_backend.py`
  - The backend application must remain running. It typically serves on `ws://localhost:8081`.
- Starting the Frontend Application:
  - Open a third new terminal.
  - Navigate to the `frontend` folder.
  - Install dependencies (if you haven't already): `npm install`
  - Start the application: `npm run dev`
  - Access the frontend by opening the URL shown (usually `http://localhost:5173/`) in your browser.
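Once the three terminals are running, a quick way to confirm that something is listening on each default port is a plain TCP connection attempt. The sketch below assumes the default hosts and ports mentioned in the steps above; `is_listening` is a hypothetical helper, and a successful connection only shows that the port is open, not that the service behind it is healthy.

```python
import socket

# Default endpoints mentioned in the steps above; adjust if yours differ.
COMPONENTS = {
    "A2A server": ("localhost", 10000),
    "Backend (WebSocket)": ("localhost", 8081),
    "Frontend (Vite dev server)": ("localhost", 5173),
}


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    for name, (host, port) in COMPONENTS.items():
        state = "up" if is_listening(host, port) else "not reachable"
        print(f"{name:28s} {host}:{port}  {state}")
```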
Getting Started:
- Ensure all three components are running: the A2A Server (only if using `agentSearch`), the Backend Application, and the Frontend Application.
- Open the frontend URL in your browser.
- Click "Start Streaming" to begin the session with Nova Sonic.
- Speak into your microphone to interact.
- You can select an alternative voice for Nova Sonic from the 'Voice' dropdown.
- Try invoking tools:
  - "What's the weather in New York?"
  - "Start a number race for 3 seconds."
  - "Agent search for Diwali." (Wait for the placeholder, then the UI notification "Search complete...")
  - Then ask: "What were the agent search results for Diwali?" (to test cached result retrieval).
Using the Chrome Extension:
- Ensure the Backend Application (and the A2A Server, if testing `agentSearch`) is running.
- Click the "Nova Sonic S2S Extension" icon in your Chrome toolbar. This opens the application UI in the browser's side panel.
- In the side panel, click "Start" to begin the session.
- Grant microphone permission when prompted by the browser (for the extension).
- Speak into your microphone to interact.
- Select different voices for Nova Sonic using the dropdown.
- Try invoking tools:
  - "What's the weather in New York?"
  - "Agent search for Diwali." (Wait for the placeholder, then the UI notification "Search complete...")
  - Then ask: "What were the agent search results for Diwali?"
  - Navigate to a webpage with an interesting image, then say: "Analyze this page image" or "Image analyzer, describe what you see." (Wait for the placeholder, then the UI notification...)
  - Then ask: "What did the image analysis find?"
This application is designed for easy extension with new tools.
To add a synchronous tool:
- Backend:
  - Create `backend/lib/my_new_sync_tool.py`.
  - Implement `async def handle_my_new_sync_tool(manager_instance, tool_use_content: dict) -> dict:` (returns `{"result": "...", "status": "success"}`). The `manager_instance` argument is passed to all handlers for consistency but may not be used by simple synchronous tools.
  - Implement `def get_my_new_sync_tool_spec() -> dict:` (returns the Nova Sonic tool spec).
  - In `backend/nova_s2s_backend.py`: import the handler and spec from `lib.my_new_sync_tool` and register them in `BedrockStreamManager.tool_handlers` and `tool_specs_definitions`.
- Frontend:
  - In `frontend/src/toolConfig.js`, add the tool's specification to `getToolSpecifications()`.
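The backend steps for a synchronous tool can be sketched as follows. This is an illustration under stated assumptions, not the repository's code: the `reverseText` tool is made up, the spec layout (a `toolSpec` wrapper with the JSON schema passed as a string) and the `content` key holding the tool input as a JSON string are assumptions, and the `"error"` status value is hypothetical. Check `weather_tool.py` for the shapes this project actually uses.

```python
import asyncio
import json


def get_my_new_sync_tool_spec() -> dict:
    """Tool spec for a hypothetical 'reverseText' tool.

    Assumption: a toolSpec wrapper whose input schema is a JSON string.
    """
    schema = {
        "type": "object",
        "properties": {"text": {"type": "string", "description": "Text to reverse"}},
        "required": ["text"],
    }
    return {
        "toolSpec": {
            "name": "reverseText",
            "description": "Reverses the text provided by the user.",
            "inputSchema": {"json": json.dumps(schema)},
        }
    }


async def handle_my_new_sync_tool(manager_instance, tool_use_content: dict) -> dict:
    """Do the work inline and return the result directly to the caller."""
    # Assumption: the tool input arrives as a JSON string under "content".
    raw = tool_use_content.get("content", "{}")
    try:
        args = json.loads(raw) if isinstance(raw, str) else dict(raw)
    except json.JSONDecodeError:
        # "error" is a hypothetical status; the README only documents "success".
        return {"result": "Could not parse tool input.", "status": "error"}
    return {"result": args.get("text", "")[::-1], "status": "success"}


if __name__ == "__main__":
    payload = {"toolName": "reverseText", "content": json.dumps({"text": "Nova"})}
    # manager_instance is unused by this simple tool, so None is enough here.
    print(asyncio.run(handle_my_new_sync_tool(None, payload)))
```

Because the handler never touches `manager_instance`, it can be exercised in isolation like this before it is registered in `BedrockStreamManager`.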
To add an asynchronous tool:
- Backend:
  - Create a sub-package in `backend/lib/` (e.g., `my_async_tool/` with `__init__.py`).
  - Place any specific client logic (e.g., `my_async_client.py`) and its dependencies inside this sub-package.
  - Create `backend/lib/my_async_tool/my_async_tool_handler.py` (or similar).
    - Implement any client initialization/retrieval logic within this module (e.g., `async def get_initialized_my_client()`).
    - Implement `async def _execute_actual_async_work(params...) -> dict:` (returns data for the cache).
    - Implement the main `async def handle_my_async_tool(manager_instance, tool_use_content: dict) -> dict:`. This handler will:
      - Check `manager_instance.completed_async_tool_results` for cached data (keyed by `tool_name.lower()`).
      - If there is no cached result, check `manager_instance.active_background_tasks` (keyed by `tool_use_id`) for an existing task for this specific invocation.
      - If new work is needed, define an `actual_op_coro_factory = lambda: _execute_actual_async_work(...)`.
      - Call `await manager_instance.launch_background_tool_task(tool_use_id, tool_name, actual_op_coro_factory)`.
      - Return an immediate placeholder `{"result": "Working on it...", "status": "success"}`.
  - In `BedrockStreamManager` (`backend/nova_s2s_backend.py`):
    - The existing `launch_background_tool_task` method is generic and will handle running the coroutine, caching results (in `completed_async_tool_results`), and queueing the `CustomToolNotification`.
    - Import and register your new async tool handler and spec (e.g., from `lib.my_async_tool.my_async_tool_handler`).
- Frontend:
  - Add the tool's spec to `frontend/src/toolConfig.js`.
  - The existing `handleToolCompletionNotification` in `websocketEvents.js` will display the UI notification for events with `customEvent: "toolCompletionNotification"`.
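The asynchronous handler flow (check the cache, check for a running task, launch the work, return a placeholder) can be sketched with a stand-in manager. Everything below is illustrative: `FakeManager` is a hypothetical substitute for `BedrockStreamManager`, the notification is modelled as a plain dict appended to a list, and the `toolName`, `toolUseId`, and `query` keys are assumed payload fields.

```python
import asyncio


class FakeManager:
    """Hypothetical stand-in for BedrockStreamManager, reduced to what the
    handler touches. The real class queues a CustomToolNotification instead
    of appending a dict to a list."""

    def __init__(self):
        self.completed_async_tool_results = {}  # keyed by tool_name.lower()
        self.active_background_tasks = {}       # keyed by tool_use_id
        self.notifications = []

    async def launch_background_tool_task(self, tool_use_id, tool_name, coro_factory):
        async def runner():
            try:
                result = await coro_factory()
                self.completed_async_tool_results[tool_name.lower()] = result
                self.notifications.append(
                    {"customEvent": "toolCompletionNotification", "toolName": tool_name}
                )
            finally:
                self.active_background_tasks.pop(tool_use_id, None)

        self.active_background_tasks[tool_use_id] = asyncio.create_task(runner())


async def _execute_actual_async_work(query: str) -> dict:
    await asyncio.sleep(0.05)  # placeholder for a slow call, e.g. an A2A request
    return {"result": f"Results for {query!r}", "status": "success"}


async def handle_my_async_tool(manager_instance, tool_use_content: dict) -> dict:
    tool_name = tool_use_content["toolName"]     # assumed payload key
    tool_use_id = tool_use_content["toolUseId"]  # assumed payload key

    cached = manager_instance.completed_async_tool_results.get(tool_name.lower())
    if cached is not None:
        return cached  # a previous run finished: serve its result
    if tool_use_id in manager_instance.active_background_tasks:
        return {"result": "Still working on it...", "status": "success"}

    query = tool_use_content.get("query", "")  # hypothetical input field
    actual_op_coro_factory = lambda: _execute_actual_async_work(query)
    await manager_instance.launch_background_tool_task(
        tool_use_id, tool_name, actual_op_coro_factory
    )
    return {"result": "Working on it...", "status": "success"}


async def demo():
    manager = FakeManager()
    call = {"toolName": "myAsyncTool", "toolUseId": "id-1", "query": "Diwali"}
    first = await handle_my_async_tool(manager, call)   # placeholder
    await asyncio.gather(*list(manager.active_background_tasks.values()))
    second = await handle_my_async_tool(manager, {**call, "toolUseId": "id-2"})
    return first, second, manager.notifications


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Note that with a cache keyed only by the lowercased tool name, as described above, a later invocation receives the most recent completed result regardless of its own arguments. A cache key that includes the query would be one way to refine this.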
For detailed examples, refer to `weather_tool.py` (synchronous) and the `agent_search/` package (asynchronous) in `backend/lib/`.
The `agentSearch` tool uses an A2A client defined in `backend/lib/agent_search/agentclient.py`, which in turn uses the shared A2A common libraries located in `backend/common/`.
May 2025
- This code has been tested with the `amazon.nova-sonic-v1:0` version of the Amazon Nova Sonic model (the primary version available during testing). For the best experience, use a wired headset and microphone combo.
- The `backend` has all the tools implemented: `numberRace`, `agentSearch`, `getWeather`, and `imageAnalyzer`.
- The Chrome Extension exposes `agentSearch`, `getWeather`, and `imageAnalyzer`.
- The Web App exposes `numberRace`, `agentSearch`, and `getWeather`.
This project is licensed under the Apache License 2.0 and is open to contributions from the community.
