BaranziniLab
diff --git a/‎documentation/architecture.md‎
Lines changed: 115 additions & 0 deletions b/‎documentation/architecture.md‎
Lines changed: 115 additions & 0 deletions
diff --git a/‎documentation/data-privacy.md‎
Lines changed: 99 additions & 0 deletions b/‎documentation/data-privacy.md‎
Lines changed: 99 additions & 0 deletions
@@ -0,0 +1,115 @@
+# UCSF BioRouter — Architecture
+
+UCSF BioRouter is an AI-powered integrated research environment that unifies commercial, institution-hosted, and local large language models (LLMs), AI agents, Information Commons databases, and customizable workflows into one extensible platform for explorative analysis, prototyping, automation, and federated cross-institution collaboration.
+
+**Developed by:** Wanjun Gu (wanjun.gu@ucsf.edu), Baranzini Lab (https://baranzinilab.ucsf.edu/), UCSF
+**Supported by:** UCSF IT and Information Commons
+**GitHub:** https://github.com/BaranziniLab/BioRouter
+**Releases:** https://github.com/BaranziniLab/BioRouter/releases
+
+---
+
+## High-Level Overview
+
+BioRouter is built as a modular, plugin-based system. It consists of three main layers:
+
+1. **Interface** — The desktop GUI or CLI that accepts user input and displays responses.
+2. **Agent** — The core reasoning loop that manages LLM interaction, tool execution, and session state.
+3. **Extensions** — Pluggable MCP servers that give the agent access to tools (file operations, database queries, web access, code execution, etc.).
+
+In a typical session, the interface starts an agent instance, which connects to one or more extensions simultaneously and routes requests through the selected LLM provider.
+
+---
+
+## Tech Stack
+
+### Backend — Rust
+
+The backend is a Rust workspace (`crates/`) organized into several crates:
+
+| Crate | Purpose |
+|---|---|
+| `biorouter` | Core agent library — agent loop, provider integrations, session management, recipes, scheduling |
+| `biorouter-server` | REST API server (`biorouterd`) that the desktop UI communicates with |
+| `biorouter-cli` | Command-line interface (`biorouter` binary) |
+| `biorouter-mcp` | Built-in MCP servers (Developer, Computer Controller, Memory, Tutorial, Auto Visualiser) |
+| `biorouter-acp` | Agent Communication Protocol support |
+| `biorouter-bench` | Benchmarking tools |
+| `biorouter-test` | Integration tests |
+
+Key Rust dependencies:
+
+- **tokio** — Async runtime
+- **axum** — HTTP web framework for the API server
+- **rmcp** — Model Context Protocol implementation
+- **reqwest** — HTTP client for provider API calls
+- **serde / serde_json** — Serialization
+- **tiktoken-rs** — Token counting for context management
+- **minijinja** — Jinja-style template engine for recipes
+- **tokio-cron-scheduler** — Cron-based job scheduling
+- **sqlx (SQLite)** — Persistent session and schedule storage
+- **etcetera** — Cross-platform config path resolution (`~/.config/biorouter/` on macOS/Linux)
+
+### Frontend — Electron + React
+
+The desktop application is an Electron app built with React and TypeScript.
+
+| Component | Details |
+|---|---|
+| Framework | Electron 39 + React 19 |
+| Build tool | Vite + Electron Forge |
+| Language | TypeScript (strict mode) |
+| Styling | TailwindCSS v4 with custom design tokens |
+| UI components | Radix UI primitives |
+| Routing | React Router DOM v7 |
+| Testing | Vitest (unit), Playwright (E2E) |
+
+The frontend communicates with the `biorouterd` REST server (started in the background by the Electron main process) via a local HTTP API. The OpenAPI spec is generated from the Rust server and used to type-safe frontend API calls.
+
+---
+
+## Agent Interaction Loop
+
+The agent operates in a continuous loop:
+
+1. **Human request** — The user sends a message or task through the interface.
+2. **Provider chat** — The agent forwards the request plus a list of available tools to the configured LLM provider.
+3. **Tool call** — If the LLM decides to invoke a tool, the agent extracts the tool call (JSON) and executes it via the appropriate extension.
+4. **Result feedback** — The tool result is returned to the LLM as context.
+5. **Context revision** — Old or irrelevant messages are summarized or pruned to manage token usage efficiently.
+6. **Final response** — Once all tool calls are complete, the LLM sends a final response to the user.
+
+If a tool call produces an error (invalid JSON, missing tool, etc.), BioRouter captures and returns the error to the model as a tool response, allowing the LLM to self-correct without breaking the session.
+
+---
+
+## Configuration and Data Paths
+
+| Location | Purpose |
+|---|---|
+| `~/.config/biorouter/config.yaml` | Primary config — providers, API keys, extensions, settings |
+| `~/.config/biorouter/sessions/` | Session history (SQLite) |
+| `~/.config/biorouter/recipes/` | Saved recipes |
+| `~/.config/biorouter/skills/` | BioRouter-specific global skills |
+| `~/Library/Application Support/BioRouter/` | Electron app state (macOS) |
+
+The config file is shared between the Desktop UI and the CLI — changes in either interface are reflected in both.
+
+---
+
+## Multi-Model and Multi-Agent Support
+
+BioRouter supports running multiple agents in parallel:
+
+- **Sub-agents** — A recipe can spawn sub-agents to handle parallel tasks, each with its own LLM provider and extension set.
+- **Lead/Worker orchestration** — A lead model delegates sub-tasks to worker models, enabling multi-model pipelines.
+- **Subrecipes** — Recipes can call other recipes as sub-tasks, running them sequentially or in parallel.
+
+---
+
+## Security
+
+- Extensions are scanned for known malware before activation.
+- BioRouter enforces permission modes that control whether tool calls require user approval.
+- `.biorouterignore` files can restrict which files and directories the agent is allowed to access.
+- Allowlists can restrict which shell commands the agent may execute.
@@ -0,0 +1,99 @@
+# UCSF BioRouter — Data Privacy and Patient Data Guidelines
+
+This document outlines the data privacy considerations for using UCSF BioRouter, with specific guidance on handling patient data, clinical information, and other sensitive research data.
+
+---
+
+## Overview
+
+BioRouter routes your inputs and conversation context to an LLM provider for processing. The data privacy properties of any given session depend entirely on **which provider you are using**. Different providers have fundamentally different data handling policies:
+
+- **Commercial cloud APIs** (Anthropic, OpenAI, Google, etc.) — data is processed on the provider's cloud infrastructure. Review the provider's privacy policy and data processing terms before use.
+- **Institution-managed cloud services** (UCSF Azure OpenAI, UCSF Amazon Bedrock) — data is processed within infrastructure governed by UCSF's institutional agreements. These may offer stronger privacy protections than personal API accounts.
+- **Local models** (Ollama) — data is processed entirely on your own device. Nothing is transmitted to any external service.
+
+---
+
+## Patient Data and Sensitive Research Data
+
+**IMPORTANT NOTICE:**
+
+If you need to work with patient data, protected health information (PHI), clinical records, genomic data linked to individuals, or any data subject to HIPAA, institutional data governance policies, or other regulatory requirements:
+
+- **Use only institution-managed services or fully local models.**
+- Do NOT use personal commercial API accounts (e.g., your personal Anthropic API key, personal OpenAI account) with patient or sensitive data.
+- The safest option for data that must remain completely private is a **local model via Ollama** — data never leaves your device.
+
+### Recommended Providers for Sensitive Data
+
+| Provider | Data Stays Within | Recommended For |
+|---|---|---|
+| **Ollama (local)** | Your device only — no external transmission | Highest sensitivity data, air-gapped requirements |
+| **UCSF Azure OpenAI** | UCSF's institutional Azure tenant | Institution-approved use cases — verify with your institution |
+| **UCSF Amazon Bedrock** | UCSF's institutional AWS environment | Institution-approved use cases — verify with your institution |
+
+### Providers NOT Recommended for Patient Data
+
+The following providers use personal/commercial API accounts and are generally **not appropriate** for patient data without explicit institutional authorization:
+
+- Anthropic (direct API)
+- OpenAI (direct API)
+- Google Gemini (direct API)
+- OpenRouter
+- Venice AI
+- X.AI (Grok)
+- Any other third-party commercial API
+
+---
+
+## Verification Requirement
+
+**Always verify with your institution before working with sensitive data.**
+
+Even institution-managed services (UCSF Azure OpenAI, UCSF Amazon Bedrock) may have specific terms of use, approved use cases, and restrictions that change over time. Before using BioRouter with any sensitive data:
+
+1. Confirm that your intended use case is covered by the institutional data use agreement for that provider.
+2. Check with UCSF IT or your IRB/compliance office if you are unsure.
+3. Ensure that the data classification level of your data is compatible with the service tier you are using.
+
+UCSF policies around data handling, HIPAA compliance, and acceptable use of cloud services evolve. The BioRouter development team cannot advise on the current status of institutional agreements. Always check directly with UCSF compliance and IT.
+
+---
+
+## Best Practices for Data Handling
+
+**De-identify before using BioRouter:**
+- Remove names, dates of birth, medical record numbers, addresses, and other direct identifiers before inputting clinical data into any BioRouter session, unless you have explicit authorization and a compliant data pathway to do so with identifiers present.
+
+**Minimize data exposure:**
+- Provide only the data necessary for the task. Avoid pasting entire datasets into the chat when a representative sample or summary would suffice.
+
+**Use local models when possible:**
+- For exploratory work, algorithm development, or testing with real data, Ollama with a capable local model is the safest option.
+
+**Review session logs:**
+- BioRouter logs sessions locally. Be aware that session history stored in `~/.config/biorouter/` on your device may contain data you entered. Protect access to your device accordingly.
+
+**Do not share sessions containing sensitive data:**
+- BioRouter supports sharing sessions and recipes. Do not share sessions that contain patient data or other sensitive information.
+
+---
+
+## Summary
+
+| Data Type | Recommended Approach |
+|---|---|
+| De-identified research data | Institution-managed providers or local Ollama |
+| Patient data / PHI | Local Ollama only, or institution-managed with explicit compliance approval |
+| Public / non-sensitive data | Any provider |
+| Proprietary unpublished research data | Local Ollama or institution-managed — verify confidentiality requirements |
+
+**When in doubt: use Ollama (local) or check with your institution.**
+
+---
+
+## Contact
+
+UCSF BioRouter is developed by Wanjun Gu (wanjun.gu@ucsf.edu) at the Baranzini Lab (https://baranzinilab.ucsf.edu/) at UCSF, with support from UCSF IT and Information Commons.
+
+For questions about data governance, HIPAA compliance, and approved data use pathways, contact UCSF IT Security or your departmental compliance officer.