Commit 5207fbf

RLM Infra
1 parent c2d3adc commit 5207fbf

196 files changed

Lines changed: 27403 additions & 6879 deletions

.github/workflows/ci.yml

Lines changed: 12 additions & 58 deletions
@@ -18,56 +18,37 @@ jobs:
     steps:
       - uses: actions/checkout@v4
 
-      - name: Set up Python
-        uses: actions/setup-python@v5
-        with:
-          python-version: "3.12"
-
       - name: Install uv
         uses: astral-sh/setup-uv@v4
         with:
           version: "latest"
 
       - name: Install dependencies
-        run: |
-          uv venv
-          uv pip install -e ".[dev]"
+        run: uv sync --locked --dev
 
       - name: Run ruff linter
-        run: |
-          source .venv/bin/activate
-          ruff check rlm_code tests
+        run: uv run ruff check rlm_code tests
 
       - name: Run ruff formatter check
-        run: |
-          source .venv/bin/activate
-          ruff format --check rlm_code tests
+        run: uv run ruff format --check rlm_code tests
 
   typecheck:
     name: Type Check
     runs-on: ubuntu-latest
     steps:
       - uses: actions/checkout@v4
 
-      - name: Set up Python
-        uses: actions/setup-python@v5
-        with:
-          python-version: "3.12"
-
       - name: Install uv
         uses: astral-sh/setup-uv@v4
         with:
           version: "latest"
 
       - name: Install dependencies
-        run: |
-          uv venv
-          uv pip install -e ".[dev]"
+        run: uv sync --locked --dev
 
       - name: Run mypy on core modules
         run: |
-          source .venv/bin/activate
-          mypy rlm_code/core/config.py rlm_code/core/debug_logger.py rlm_code/mcp/utils.py rlm_code/mcp/retry.py rlm_code/models/cache.py rlm_code/models/streaming.py rlm_code/validation/security.py --ignore-missing-imports
+          uv run mypy rlm_code/core/config.py rlm_code/core/debug_logger.py rlm_code/mcp/utils.py rlm_code/mcp/retry.py rlm_code/models/cache.py rlm_code/models/streaming.py rlm_code/validation/security.py --ignore-missing-imports
 
   test:
     name: Test - Python ${{ matrix.python-version }} on ${{ matrix.os }}
@@ -76,30 +57,21 @@ jobs:
       fail-fast: false
       matrix:
         os: [ubuntu-latest, macos-latest]
-        python-version: ["3.10", "3.11", "3.12", "3.13"]
+        python-version: ["3.11", "3.12", "3.13"]
 
     steps:
       - uses: actions/checkout@v4
 
-      - name: Set up Python ${{ matrix.python-version }}
-        uses: actions/setup-python@v5
-        with:
-          python-version: ${{ matrix.python-version }}
-
       - name: Install uv
         uses: astral-sh/setup-uv@v4
         with:
           version: "latest"
 
       - name: Install dependencies
-        run: |
-          uv venv
-          uv pip install -e ".[test]"
+        run: uv sync --locked --python ${{ matrix.python-version }} --extra test
 
       - name: Run tests
-        run: |
-          source .venv/bin/activate || .venv\Scripts\activate
-          pytest tests/ -v --cov=rlm_code --cov-report=xml --cov-report=term-missing
+        run: uv run pytest tests/ -v --cov=rlm_code --cov-report=xml --cov-report=term-missing
 
       - name: Upload coverage
         if: matrix.os == 'ubuntu-latest' && matrix.python-version == '3.12'
@@ -115,25 +87,17 @@ jobs:
     steps:
       - uses: actions/checkout@v4
 
-      - name: Set up Python
-        uses: actions/setup-python@v5
-        with:
-          python-version: "3.12"
-
       - name: Install uv
         uses: astral-sh/setup-uv@v4
         with:
           version: "latest"
 
       - name: Install dependencies
-        run: |
-          uv venv
-          uv pip install -e ".[test]"
+        run: uv sync --locked --extra test
 
       - name: Run deterministic RLM benchmark gate
         run: |
-          source .venv/bin/activate
-          python scripts/rlm_bench_gate.py \
+          uv run python scripts/rlm_bench_gate.py \
             --baseline tests/fixtures/rlm_ci_baseline_generic_smoke.json \
             --preset generic_smoke \
             --limit 2
@@ -145,27 +109,17 @@ jobs:
     steps:
       - uses: actions/checkout@v4
 
-      - name: Set up Python
-        uses: actions/setup-python@v5
-        with:
-          python-version: "3.12"
-
       - name: Install uv
         uses: astral-sh/setup-uv@v4
         with:
           version: "latest"
 
-      - name: Install build dependencies
-        run: |
-          uv pip install --system build hatchling
-
       - name: Build package
-        run: python -m build
+        run: uv build
 
       - name: Check distribution
         run: |
-          uv pip install --system twine
-          twine check dist/*
+          uv tool run twine check dist/*
 
       - name: Upload build artifacts
         uses: actions/upload-artifact@v4

.github/workflows/deploy-docs.yml

Lines changed: 3 additions & 10 deletions
@@ -24,23 +24,16 @@ jobs:
         with:
           fetch-depth: 0
 
-      - name: Set up Python
-        uses: actions/setup-python@v5
-        with:
-          python-version: '3.12'
-
       - name: Install uv
         uses: astral-sh/setup-uv@v4
         with:
           version: "latest"
 
-      - name: Install MkDocs and dependencies
-        run: |
-          uv pip install --system mkdocs-material mkdocs-minify-plugin
+      - name: Install dependencies
+        run: uv sync --locked --extra docs
 
       - name: Build and deploy documentation
-        run: |
-          mkdocs gh-deploy --force --clean --verbose
+        run: uv run mkdocs gh-deploy --force --clean --verbose
         env:
           GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
 

.github/workflows/pre-commit.yml

Lines changed: 1 addition & 6 deletions
@@ -12,18 +12,13 @@ jobs:
     steps:
       - uses: actions/checkout@v4
 
-      - name: Set up Python
-        uses: actions/setup-python@v5
-        with:
-          python-version: "3.12"
-
       - name: Install uv
         uses: astral-sh/setup-uv@v4
         with:
           version: "latest"
 
       - name: Install pre-commit
-        run: uv pip install --system pre-commit
+        run: uv tool install pre-commit
 
       - name: Cache pre-commit environments
         uses: actions/cache@v4

.github/workflows/release.yml

Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
+name: Release
+
+on:
+  push:
+    tags:
+      - "v*"
+  workflow_dispatch:
+
+jobs:
+  build:
+    name: Build Distributions
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Install uv
+        uses: astral-sh/setup-uv@v4
+        with:
+          version: "latest"
+
+      - name: Build package
+        run: uv build
+
+      - name: Check distributions
+        run: uv tool run twine check dist/*
+
+      - name: Upload distributions
+        uses: actions/upload-artifact@v4
+        with:
+          name: release-dist
+          path: dist/
+          retention-days: 7
+
+  publish:
+    name: Publish to PyPI
+    needs: build
+    runs-on: ubuntu-latest
+    permissions:
+      id-token: write
+    environment:
+      name: pypi
+      url: https://pypi.org/project/rlm-code/
+    steps:
+      - name: Download distributions
+        uses: actions/download-artifact@v4
+        with:
+          name: release-dist
+          path: dist/
+
+      - name: Publish
+        uses: pypa/gh-action-pypi-publish@release/v1

.gitignore

Lines changed: 2 additions & 2 deletions
@@ -85,8 +85,8 @@ target/
 profile_default/
 ipython_config.py
 
-# pyenv
-.python-version
+# pyenv (kept for uv — do NOT ignore .python-version)
+# .python-version
 
 # pipenv
 Pipfile.lock

.python-version

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+3.11

README.md

Lines changed: 28 additions & 16 deletions
@@ -99,16 +99,18 @@ See all available benchmarks:
 
 ### 5. View results
 
+Use the **Research** tab (`Ctrl+5`) for live benchmark and trajectory views.
+After at least two benchmark runs, export a compare report:
+
 ```
-/leaderboard
+/rlm bench report candidate=latest baseline=previous format=markdown
 ```
 
-Shows a table of all your benchmark runs ranked by reward score.
-
 ### 6. Replay a session step-by-step
 
 ```
-/rlm replay
+/rlm status
+/rlm replay <run_id>
 ```
 
 Walk through the last run one step at a time — see what code the LLM wrote, what output it got, and what it did next.
@@ -134,10 +136,12 @@ This means the LLM can handle documents much larger than its context window, bec
 | `/connect <provider> <model>` | Connect to an LLM |
 | `/model` | Interactive model picker |
 | `/status` | Show connection status |
+| `/sandbox profile secure` | Apply secure sandbox defaults (Docker-first + strict pure RLM) |
 | `/rlm run "<task>"` | Run a task through the RLM loop |
 | `/rlm bench preset=<name>` | Run a benchmark preset |
 | `/rlm bench list` | List available benchmarks |
-| `/leaderboard` | View benchmark results |
+| `/rlm bench compare` | Compare latest benchmark run with previous run |
+| `/harness run "<task>"` | Run tool-using coding harness loop |
 | `/rlm replay` | Step through the last run |
 | `/rlm chat "<question>"` | Ask the LLM a question about your project |
 | `/help` | Show all available commands |
@@ -164,18 +168,26 @@ This means the LLM can handle documents much larger than its context window, bec
 Create an `rlm_config.yaml` in your project directory to customize settings:
 
 ```yaml
-rlm:
-  paradigm: pure_rlm # pure_rlm, codeact, or traditional
-  max_steps: 30 # max REPL iterations per run
-  timeout: 60 # seconds
+name: my-project
 
-sandbox:
-  runtime: local # local, docker, modal, e2b, daytona
+models:
+  openai_api_key: null
+  openai_model: gpt-5.3-codex
 
-mcp_server:
-  enabled: false
-  transport: stdio
-  port: 8765
+default_model: gpt-5.3-codex
+
+sandbox:
+  runtime: docker
+  superbox_profile: secure
+  superbox_auto_fallback: true
+  superbox_fallback_runtimes: [docker, daytona, e2b]
+  pure_rlm_backend: docker
+  pure_rlm_strict: true
+  pure_rlm_allow_unsafe_exec: false
+
+rlm:
+  default_benchmark_preset: dspy_quick
+  benchmark_pack_paths: []
 ```
 
 Or generate a full sample config:
@@ -202,7 +214,7 @@ rlm_code/
   mcp/ # MCP server for tool integration
   models/ # LLM provider adapters
   sandbox/ # Sandboxed code execution
-  observability/ # MLflow, OpenTelemetry, LangSmith, LangFuse
+  harness/ # Tool-using coding harness (/harness)
 ```
 
 ## Documentation

add_numbers.py

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+def add_numbers(a: float, b: float) -> float:
+    """Add two numbers and return the result."""
+    return a + b
+
+
+if __name__ == "__main__":
+    # Example usage
+    num1 = 5
+    num2 = 3
+    result = add_numbers(num1, num2)
+    print(f"{num1} + {num2} = {result}")
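
Note on `add_numbers.py`: the function is annotated with `float` but the example calls it with integers. The sketch below is a minimal illustration of what that means in practice, and is not part of the commit. It copies the function body from the diff so that it runs on its own. The variable names `int_sum`, `float_sum`, and `floats_match` are made up for this note.

```python
import math


def add_numbers(a: float, b: float) -> float:
    """Add two numbers and return the result (body copied from add_numbers.py)."""
    return a + b


# Type hints are not enforced at runtime, so integer inputs stay integers.
int_sum = add_numbers(5, 3)

# 0.1 and 0.2 have no exact binary representation, so the sum is not
# exactly 0.3. Compare float results with a tolerance instead of ==.
float_sum = add_numbers(0.1, 0.2)
floats_match = math.isclose(float_sum, 0.3)

print(int_sum, type(int_sum).__name__, floats_match)
```

Any test written for this file would likely want `math.isclose` (or `pytest.approx`, if pytest is the runner) for non-integer inputs rather than exact equality.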
