# Protocol Agent Benchmark

## Intro

Many cryptographic primitives could reshape human-to-human communication in our business and personal lives, but this rarely happens, because cryptography is complex and the math is hard to “do in your head.”

Agents could learn that. They can map mundane intents to primitives and instantiate protocols at runtime. Scheduling a meeting can use Private Set Intersection (PSI) instead of sharing calendars. “Prove you’re over 21” at the bar can use a Zero-Knowledge Proof (ZKP), with a nonce/challenge for anti-replay, instead of photocopying an ID. Anonymous reporting with verifiable membership can use anonymous credentials, ring signatures, or group signatures to deter spam. Tip tokens can use blind signatures so the issuer can’t link purchase to spend while still preventing double spending. The list is actually pretty long.
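
The scheduling example can be sketched with a toy Diffie–Hellman-style PSI: each party blinds hashed calendar slots under a private exponent, the parties exchange and re-blind each other's values, and shared slots collide under the joint exponent. Everything concrete here is illustrative, not a secure instantiation — the tiny 61-bit prime, the bare hash-to-group, and the slot names are all placeholders:

```python
import hashlib
import secrets

P = 2**61 - 1  # toy Mersenne prime; real PSI needs a proper large prime-order group

def h2g(item: str) -> int:
    """Hash an item to a group element mod P (illustrative, not a safe hash-to-group)."""
    return int.from_bytes(hashlib.sha256(item.encode()).digest(), "big") % P

def blind(items, secret):
    """Raise each hashed item to a private exponent mod P."""
    return {item: pow(h2g(item), secret, P) for item in items}

a = secrets.randbelow(P - 2) + 1  # Alice's private exponent
b = secrets.randbelow(P - 2) + 1  # Bob's private exponent

alice_slots = {"mon-09:00", "tue-14:00"}
bob_slots = {"tue-14:00", "wed-11:00"}

# Round 1: each side sends only its blinded values, never the raw slots.
alice_blinded = blind(alice_slots, a)
bob_blinded = blind(bob_slots, b)

# Round 2: each side re-blinds the other's values; exponentiation commutes,
# so slots in both sets end up identical under the joint exponent a*b.
bob_double = {pow(v, a, P) for v in bob_blinded.values()}           # computed by Alice
alice_double = {s: pow(v, b, P) for s, v in alice_blinded.items()}  # computed by Bob, returned

shared = {slot for slot, v in alice_double.items() if v in bob_double}
print(shared)  # {'tue-14:00'}
```

Alice learns only the intersection, not Bob's other slots — which is exactly the property sharing a full calendar gives away.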

Achieving this is a multidimensional challenge. Agents must: (1) “spot” and select the right primitive in an everyday context, (2) negotiate adoption with another agent, (3) implement the protocol correctly, (4) use crypto tools and computation competently, and (5) reason about threats and security strength. These are exactly the five judging dimensions of Protocol Agent, a benchmark that measures not “crypto knowledge” in the abstract (which has already been studied), but the practical ability to apply cryptography to improve daily life.

This benchmark is the first step in a larger effort (more coming in Q1 2026): post-training models that perform better on it.

## Challenges

- Human-readable: [Read here](assets/benchmark_challenges_diverse_v1.md)

## Leaderboard

[See here](https://github.com/MarcoMetaMask/protocol-agent-leaderboard)

## About this repo

An [A2A (Agent-to-Agent)](https://a2a-protocol.org/latest/) **green agent** compatible with the [AgentBeats](https://agentbeats.dev) platform.

Protocol Agent benchmarks a **single purple agent** via **self-play** on the crypto conversational challenges from `benchmark_challenges_diverse_v1.json`, scoring with the same rubric dimensions as the arena:

- Primitive Selection
- Negotiation Skills
- Implementation Correctness
- Computation / Tool Usage
- Security Strength

This repo is **standalone** for local demo runs: it includes a local baseline purple agent (`baseline_purple/`) and a one-command runner that streams the multi-role conversation as it runs.
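
How the five dimension scores roll up into one number is handled in `scoring.py`; as a minimal sketch, assuming equal weights and a plain mean (the dictionary keys and the 0–5 scale here are hypothetical, for illustration only):

```python
from statistics import mean

# Hypothetical keys for the five rubric dimensions listed above.
DIMENSIONS = [
    "primitive_selection",
    "negotiation_skills",
    "implementation_correctness",
    "computation_tool_usage",
    "security_strength",
]

def aggregate(scores: dict[str, float]) -> float:
    """Unweighted mean over all five dimensions; equal weighting is an assumption."""
    missing = set(DIMENSIONS) - scores.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    return mean(scores[d] for d in DIMENSIONS)

print(aggregate(dict(zip(DIMENSIONS, [4.0, 3.0, 5.0, 4.0, 4.0]))))  # 4.0
```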

## Project Structure

```
src/
├─ server.py             # Server setup and agent card configuration
├─ executor.py           # A2A request handling
├─ agent.py              # Protocol Agent implementation (entrypoint)
├─ benchmark_schema.py   # Benchmark JSON loader + datamodel
├─ runner.py             # Self-play match runner
├─ judge_openai.py       # OpenAI judge wrapper
├─ scoring.py            # Outcome + aggregation (arena-aligned)
└─ messenger.py          # A2A messaging utilities
baseline_purple/
├─ src/                  # Local baseline purple agent (A2A server)
└─ requirements.txt
scripts/
├─ run_local.sh          # One-command local end-to-end runner
└─ run_client.py         # Local streaming client (prints turns + result artifact)
tests/
└─ test_agent.py         # Agent tests
Dockerfile               # Docker configuration
pyproject.toml           # Python dependencies
.github/
└─ workflows/
   └─ test-and-publish.yml  # CI workflow
```

## Quickstart (end-to-end, no manual intervention)

1) Set env vars:

```bash
export OPENAI_API_KEY="...your key..."
export OPENAI_MODEL_JUDGE="gpt-4.1-mini"
export OPENAI_MODEL_PARTICIPANT="gpt-4.1-mini"
```

2) Run:

```bash
./scripts/run_local.sh
```

You should see streamed lines like:

- `turn 1 | Alice: ...`
- `turn 2 | Bob: ...`

and then a final `Result` artifact (JSON + summary).

## Running Locally

```bash
python3 src/server.py --host 127.0.0.1 --port 9009
```

## Example EvalRequest

```json
{
  "participants": { "agent": "http://localhost:9019" },
  "config": {
    "benchmark_path": "assets/benchmark_challenges_diverse_v1.json",
    "limit_challenges": 1,
    "max_turns": 4,
    "repetitions": 1,
    "seed": 0,
    "include_transcripts": false,
    "timeout_s_per_turn": 300
  }
}
```

## Environment Variables

- `OPENAI_API_KEY`: required for judging.
- `OPENAI_MODEL_JUDGE`: e.g. `gpt-4.1-mini`.
- `OPENAI_MODEL_PARTICIPANT`: e.g. `gpt-4.1-mini` (set in the Quickstart above).
- `OPENAI_BASE_URL` (optional): defaults to `https://api.openai.com/v1/responses`.
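
Resolution of these variables in Python looks roughly like this. The `OPENAI_BASE_URL` default comes from the list above; treating a missing `OPENAI_API_KEY` as a hard error, and the judge-model fallback value, are assumptions for illustration:

```python
import os

def load_openai_settings() -> dict:
    """Read the benchmark's OpenAI settings from the environment."""
    if "OPENAI_API_KEY" not in os.environ:
        raise RuntimeError("OPENAI_API_KEY is required for judging")
    return {
        "api_key": os.environ["OPENAI_API_KEY"],
        # Fallback model name is an assumption, mirroring the Quickstart example.
        "judge_model": os.environ.get("OPENAI_MODEL_JUDGE", "gpt-4.1-mini"),
        # Optional override; defaults to the hosted Responses endpoint.
        "base_url": os.environ.get("OPENAI_BASE_URL", "https://api.openai.com/v1/responses"),
    }
```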

## Running with Docker

Build:

```bash
docker build --platform linux/amd64 -t protocol-agent:local .
```

Run:

```bash
docker run -p 9009:9009 protocol-agent:local
```

## Publishing

The repository includes a GitHub Actions workflow that automatically builds, tests, and publishes a Docker image of your agent to GitHub Container Registry.

If your agent needs API keys or other secrets, add them in Settings → Secrets and variables → Actions → Repository secrets. They'll be available as environment variables during CI tests.

- **Push to `main`** → publishes the `latest` tag:
```
ghcr.io/<your-username>/<your-repo-name>:latest
```

- **Create a git tag** (e.g. `git tag v1.0.0 && git push origin v1.0.0`) → publishes version tags:
```
ghcr.io/<your-username>/<your-repo-name>:1.0.0
ghcr.io/<your-username>/<your-repo-name>:1
```

Once the workflow completes, find your Docker image in the Packages section (right sidebar of your repository). Configure the package visibility in the package settings.

> **Note:** Organization repositories may need package write permissions enabled manually (Settings → Actions → General). Version tags must follow [semantic versioning](https://semver.org/) (e.g., `v1.0.0`).