AGENTS.md

This file gives coding agents the real commands, architecture boundaries, and conventions for this repository. It is based on the checked-in files and the current implementation, not old assumptions.

1) Repo summary

Project type: Python desktop automation app with Tk / ttkbootstrap UI, screenshot-driven visual prompting, local verification, and multiple LLM providers.
Main entrypoint: app/app.py
Build script: build.py
Primary source root: app/
Test directory: tests/
Local Python version: .python-version -> 3.12.8
CI lint workflow: .github/workflows/pylint.yml
Current architecture style: single-step visual agent loop with Prompt System v1

2) What the app does today

Accepts a natural-language user goal in the desktop UI.
Captures a screenshot with visible rulers / grid.
Builds a structured prompt for the selected model provider.
Receives a single next-step JSON action from the model.
Executes the step locally through the interpreter.
Optionally performs local post-action verification using before/after screenshots.
Repeats until the model returns done, the request is interrupted, or the runtime stops on failure.

The current system is not a multi-step batch planner. It is a strict single-step closed loop.
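The closed loop described above can be modeled roughly as follows. This is a toy sketch, not the code in app/core.py. `capture`, `ask_model`, `execute`, `verify`, and `is_interrupted` are hypothetical callables that stand in for the screenshot, provider, interpreter, verifier, and interruption stages. `max_iterations` is an assumed safety cap that the original text does not mention.

```python
from typing import Any, Callable, Optional

Step = dict[str, Any]


def run_single_step_loop(
    capture: Callable[[], bytes],
    ask_model: Callable[[bytes, list[Step]], dict[str, Any]],
    execute: Callable[[Step], None],
    verify: Optional[Callable[[bytes, bytes, Step], bool]] = None,
    is_interrupted: Optional[Callable[[], bool]] = None,
    max_iterations: int = 20,
) -> str:
    # Hypothetical loop: screenshot -> one model step -> execute -> optional verify.
    history: list[Step] = []
    for _ in range(max_iterations):
        if is_interrupted is not None and is_interrupted():
            return 'stopped: interrupted'
        before = capture()
        response = ask_model(before, history)
        if response.get('done') is not None:
            return str(response['done'])
        steps = response.get('steps') or []
        if not steps:
            return 'stopped: no step and no done message'
        step = steps[0]  # Single-step runtime: never batch-execute a plan.
        execute(step)
        # Local verification compares before/after screenshots when enabled.
        if verify is not None and not verify(before, capture(), step):
            return 'stopped: verification failed'
        history.append(step)
    return 'stopped: iteration limit reached'
```

Each iteration sends a fresh screenshot, so the model always reasons about the current screen rather than a plan made several actions ago.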

3) Extra agent rule files found during scan

AGENTS.md at repo root: present (this file)
.cursorrules: not found
.cursor/rules/: not found
.github/copilot-instructions.md: not found
Do not invent extra Cursor or Copilot rules for this repository.

4) Environment setup

From the repository root (macOS / Linux):

python -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install -r requirements.txt

Optional lint dependency:

python -m pip install pylint

Windows activation:

.venv\Scripts\activate

5) Run / build / lint / test commands

Run the app:

python app/app.py

Build package:

python build.py

Lint all tracked Python files:

pylint $(git ls-files '*.py')

Lint a single file:

pylint app/core.py

Safe local regression checks commonly used in this repo:

python tests/prompt_system_regression_check.py
python tests/coordinate_mapping_test.py
python tests/request_runtime_control_check.py
python tests/session_context_red_check.py
python tests/verify_macos_doubleclick.py
python tests/verify_visual_agent_mvp.py

Legacy smoke test that boots the GUI:

python tests/simple_test.py

6) Command caveats

python app/app.py launches a real GUI.
The app may need screen recording, accessibility, keyboard, and mouse permissions.
Do not assume GUI flows are headless-safe.
build.py is an interactive PyInstaller script.
build.py prompts for version confirmation and may prompt about macOS signing / notarization.
build.py runs pip install -r requirements.txt inside setup().
Build output goes to dist/.
Only run packaging verification when the task really requires it.
There is no checked-in Ruff, Black, isort, mypy, pytest, or unittest config.
The only checked-in lint workflow uses Pylint.

7) Test caveats

tests/simple_test.py is a smoke-style GUI test, not a pure unit test.
It boots the app and can trigger real UI automation behavior.
Do not describe tests/simple_test.py as CI-safe headless unit coverage.
The repository now also contains multiple safe local regression scripts for prompting, runtime control, settings, coordinate mapping, and verifier behavior.
Prefer narrow targeted checks over broad GUI smoke runs.

8) Preferred verification strategy

Docs-only changes: verify the edited markdown file content and structure.
Prompting changes: run tests/prompt_system_regression_check.py and related narrow prompt/runtime checks.
Coordinate / action execution changes: run tests/coordinate_mapping_test.py and related targeted checks.
Runtime control / request boundary changes: run tests/request_runtime_control_check.py and tests/session_context_red_check.py.
GUI/runtime changes: use careful smoke testing only when the task truly requires it, because the app can control the machine.
Packaging changes: run python build.py only if the user explicitly wants packaging validation.
New tests should avoid real mouse and keyboard control whenever possible.

9) Source layout

app/app.py -> top-level app wiring and queue orchestration
app/core.py -> request lifecycle, recursive loop, interruption, local verification integration
app/interpreter.py -> turns model steps into local desktop actions
app/verifier.py -> before/after screenshot-based local step verification
app/ui.py -> Tk / ttkbootstrap UI and settings windows
app/llm.py -> LLM runtime settings sync and stable system context assembly
app/models/ -> provider-specific model adapters and model factory
app/prompting/ -> Prompt System v1 builders and schema text generation
app/agent_memory.py -> compact action / failure memory used for next-step guidance
app/session_store.py -> SQLite-backed message and execution log persistence
app/utils/settings.py -> settings validation, defaults, persistence helpers
app/utils/screen.py -> screenshot capture, grid prompt image creation, prompt image archival
app/resources/context.txt -> stable non-dynamic system rules
tests/ -> targeted regression scripts plus legacy GUI smoke test

10) Prompt System v1

The project now has a unified prompt architecture under app/prompting/.

The key rule is:

Model providers must share the same prompt semantics.
Provider adapters should only differ in message formatting and transport.
Do not reintroduce provider-specific prompt meaning unless the task explicitly requires it.

Core prompt layers:

PromptSystemContext
PromptToolSchema
PromptTaskContext
PromptExecutionTimeline
PromptRecentDetails
PromptVisualContext
PromptOutputContract

Prompt composition entrypoint:

app/prompting/builder.py

Stable prompt rules source:

app/resources/context.txt

Important prompt boundary:

Dynamic runtime state must not be pushed back into context.txt.
Do not revert to context.txt + raw request_data JSON style prompting.
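The "same semantics, different envelope" rule can be illustrated with a small sketch. It assumes that the layers are composed in the order listed above and that each layer renders to plain text. `compose_prompt`, `format_as_chat_messages`, and `format_as_single_text` are hypothetical helpers, and the two envelope shapes are illustrative, not real provider payloads. The actual builder is app/prompting/builder.py.

```python
# Assumed composition order; check app/prompting/builder.py for the real one.
LAYER_ORDER = [
    'PromptSystemContext',
    'PromptToolSchema',
    'PromptTaskContext',
    'PromptExecutionTimeline',
    'PromptRecentDetails',
    'PromptVisualContext',
    'PromptOutputContract',
]


def compose_prompt(layer_texts: dict[str, str]) -> str:
    # Provider-neutral: one fixed order and one text block per layer.
    sections = []
    for layer_name in LAYER_ORDER:
        text = layer_texts.get(layer_name, '').strip()
        if text:
            sections.append(f'[{layer_name}]\n{text}')
    return '\n\n'.join(sections)


def format_as_chat_messages(prompt_text: str) -> list[dict[str, str]]:
    # Hypothetical adapter A: wraps the same text in a chat-style message list.
    return [{'role': 'user', 'content': prompt_text}]


def format_as_single_text(prompt_text: str) -> dict[str, str]:
    # Hypothetical adapter B: same text in a different envelope.
    return {'prompt': prompt_text}
```

Because the adapters only wrap `prompt_text`, a prompt-meaning change made in one place reaches every provider.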

11) Tool schema contract

The model-facing tool contract is now registry-driven.

Source:

app/prompting/tool_schema.py

Key classes:

ToolParameterDefinition
ToolDefinition
ToolRegistry

Rules:

Register each model-visible tool in the registry.
Generate the model-visible tool schema from the registry.
Do not hand-maintain a second free-text list of allowed tools elsewhere.
If you add a new tool, update both:
- the tool registry entry
- the runtime execution path that actually supports the tool

Current default tools include:

click
moveTo
dragTo
write
press
scroll
sleep
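The registry-driven pattern can be sketched like this. The class names mirror those in app/prompting/tool_schema.py, but these are simplified stand-ins: their fields, `render_schema_text`, and the `unsupported_tools` helper are assumptions made for illustration, not the real API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ToolParameterDefinition:
    # Assumed fields for illustration only.
    name: str
    type_name: str
    description: str
    required: bool = True


@dataclass
class ToolDefinition:
    name: str
    description: str
    parameters: list[ToolParameterDefinition] = field(default_factory=list)


class ToolRegistry:
    def __init__(self) -> None:
        self._tools: dict[str, ToolDefinition] = {}

    def register(self, tool: ToolDefinition) -> None:
        if tool.name in self._tools:
            raise ValueError(f'Tool already registered: {tool.name}')
        self._tools[tool.name] = tool

    def names(self) -> list[str]:
        return list(self._tools)

    def render_schema_text(self) -> str:
        # The model-visible tool list is generated, never hand-written.
        lines = []
        for tool in self._tools.values():
            params = ', '.join(
                f'{param.name}: {param.type_name}' + ('' if param.required else ' (optional)')
                for param in tool.parameters
            )
            lines.append(f'- {tool.name}({params}): {tool.description}')
        return '\n'.join(lines)


def unsupported_tools(registry: ToolRegistry, handlers: dict[str, Callable[..., Any]]) -> list[str]:
    # Registered tools with no runtime execution path are contract bugs.
    return [name for name in registry.names() if name not in handlers]
```

A check like `unsupported_tools` is one way to catch a tool that was added to the registry but not to the runtime.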

12) Coordinate contract

This repository now uses a strict ruler-aligned coordinate contract.

Model-facing rule:

x_percent and y_percent use the same 0-100 ruler scale shown on the prompt image.
The model should return ruler values directly, such as 31.5 and 44.2.
The model should not divide by 100 itself.

Runtime rule:

The interpreter converts model 0-100 ruler values into internal 0.0-1.0 normalized values and then maps them to logical screen pixels.

Related files:

app/resources/context.txt
app/prompting/tool_schema.py
app/prompting/visual_context.py
app/prompting/output_contract.py
app/interpreter.py

Important warning:

Do not reintroduce a mixed 0-1 vs 0-100 prompt contract.
If you change coordinate behavior, update prompt text, interpreter logic, and coordinate tests together.
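The two-stage conversion described above can be sketched as follows. `ruler_to_logical_pixels` is a hypothetical helper, not the interpreter's real function. Rounding to the nearest pixel and clamping 100.0 onto the last pixel are assumptions. The screen size is passed in (in practice it might come from a call such as `pyautogui.size()`).

```python
def ruler_to_logical_pixels(
    x_percent: float,
    y_percent: float,
    screen_width: int,
    screen_height: int,
) -> tuple[int, int]:
    # The model sends 0-100 ruler values; reject anything outside that scale.
    for value in (x_percent, y_percent):
        if not 0.0 <= value <= 100.0:
            raise ValueError(f'ruler value out of range 0-100: {value}')
    # Stage 1: convert to internal 0.0-1.0 normalized values.
    x_norm = x_percent / 100.0
    y_norm = y_percent / 100.0
    # Stage 2: map to logical pixels, clamping so 100.0 lands on the last pixel.
    x_px = min(round(x_norm * screen_width), screen_width - 1)
    y_px = min(round(y_norm * screen_height), screen_height - 1)
    return x_px, y_px
```

For example, `ruler_to_logical_pixels(31.5, 44.2, 1440, 900)` returns `(454, 398)`. A mixed contract is dangerous because the range check cannot tell the scales apart: a model that sends `0.315` instead of `31.5` passes validation and clicks near the left edge.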

13) Model / interpreter contract

The app depends on a JSON contract between model adapters and the interpreter.

Do not casually change these keys:

steps
done
function
parameters
human_readable_justification
expected_outcome

Current output rules:

The runtime is single-step: each model response should contain at most one executable step.
done remains null until the request is complete or should stop safely.
If a task is complete, blocked, or unsafe, return steps: [] and a short done message.

If you edit interpreter.py, model.py, openai_computer_use.py, or prompt output contract code, review the whole chain.
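A strict check of the contract keys might look like the sketch below. `parse_single_step_response` and `EXAMPLE_RESPONSE` are hypothetical. Several rules are assumptions, not confirmed behavior: treating all four step keys as required, rejecting more than one step (the real runtime may simply use the first), and rejecting steps when `done` is set. The `click` parameters are illustrative.

```python
import json
from typing import Any

REQUIRED_STEP_KEYS = ('function', 'parameters', 'human_readable_justification', 'expected_outcome')

EXAMPLE_RESPONSE = json.dumps({
    'steps': [{
        'function': 'click',
        'parameters': {'x_percent': 31.5, 'y_percent': 44.2},
        'human_readable_justification': 'Open the search field.',
        'expected_outcome': 'The search field is focused.',
    }],
    'done': None,
})


def parse_single_step_response(raw: str) -> dict[str, Any]:
    # json.JSONDecodeError is a ValueError subclass, so callers catch one type.
    payload = json.loads(raw)
    if not isinstance(payload, dict):
        raise ValueError('response must be a JSON object')
    steps = payload.get('steps')
    if not isinstance(steps, list):
        raise ValueError("'steps' must be a list")
    if 'done' not in payload:
        raise ValueError("missing 'done' key")
    if len(steps) > 1:
        raise ValueError(f'single-step runtime received {len(steps)} steps')
    if payload['done'] is not None and steps:
        raise ValueError("when 'done' is set, 'steps' must be []")
    for step in steps:
        if not isinstance(step, dict):
            raise ValueError('each step must be a JSON object')
        missing = [key for key in REQUIRED_STEP_KEYS if key not in step]
        if missing:
            raise ValueError(f'step is missing keys: {missing}')
    return payload
```

A completed task then looks like `{"steps": [], "done": "Task complete"}`.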

14) Request state and history boundaries

Current request-state architecture lives primarily in app/core.py.

Important structures:

request_context -> per-request runtime state
session_history_snapshot -> structured session narrative summary at request start
step_history -> authoritative per-request step history used for timeline and recent details
agent_memory -> compact recent action / failure memory

Rules:

Do not collapse session_history_snapshot and step_history back into one raw prompt blob.
Preserve request boundary markers like request_origin.
Keep interruption / restart behavior intact when editing request flow.
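The separation rule can be pictured with the following sketch. `RequestContext`, its fields, `record_step`, and `prompt_sections` are hypothetical. The real per-request state in app/core.py has its own shape, and the `request_origin` value used with it here is illustrative. The point is that the session narrative and the per-request step history stay in distinct sections instead of being merged into one blob.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class RequestContext:
    # Hypothetical shape for illustration; not the structure in app/core.py.
    request_id: int
    request_origin: str
    session_history_snapshot: str
    step_history: list[dict[str, Any]] = field(default_factory=list)

    def record_step(self, step: dict[str, Any], succeeded: bool) -> None:
        # step_history is the authoritative per-request record.
        self.step_history.append({'step': step, 'succeeded': succeeded})

    def prompt_sections(self, recent: int = 3) -> dict[str, Any]:
        # Each source feeds its own prompt section; nothing is concatenated.
        return {
            'request_origin': self.request_origin,
            'session_history_snapshot': self.session_history_snapshot,
            'execution_timeline': [entry['step'].get('function') for entry in self.step_history],
            'recent_details': self.step_history[-recent:],
        }
```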

15) Prompt dump debugging

Prompt text dump support exists and is intentionally optional.

Settings key:

advanced.save_prompt_text_dumps

Behavior:

Default is False.
When enabled, the final prompt text is written to promptdump/ under the project root.
Dumped text contains prompt content, not API credentials.

Related files:

app/prompting/debug.py
app/utils/settings.py
app/ui.py
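The opt-in behavior can be sketched as below. This version assumes settings arrive as a nested dict in which `advanced.save_prompt_text_dumps` maps to `settings['advanced']['save_prompt_text_dumps']`. `maybe_dump_prompt` and its timestamped filename format are hypothetical; the real logic lives in app/prompting/debug.py.

```python
from datetime import datetime
from pathlib import Path
from typing import Any, Optional


def maybe_dump_prompt(settings: dict[str, Any], prompt_text: str, project_root: Path) -> Optional[Path]:
    # Off by default: only an explicit True enables dumping.
    advanced = settings.get('advanced', {})
    if advanced.get('save_prompt_text_dumps', False) is not True:
        return None
    dump_dir = project_root / 'promptdump'
    dump_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime('%Y%m%d-%H%M%S-%f')
    dump_path = dump_dir / f'prompt-{stamp}.txt'
    # Write only the final prompt text, never the settings object or API keys.
    dump_path.write_text(prompt_text, encoding='utf-8')
    return dump_path
```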

16) Import guidelines

Match the existing import style.
Prefer absolute imports rooted at the current app layout.
Existing examples:
- from core import Core
- from ui import UI
- from utils.settings import Settings
- from models.factory import ModelFactory
Do not introduce wildcard imports.
Keep import groups ordered as: standard library, third-party, local project imports.
Keep one import per line unless the surrounding file already uses a compact local pattern.

17) Formatting guidelines

Use 4 spaces for indentation.
Preserve the repository's current style; no formatter config is checked in.
Keep code explicit and readable.
Avoid clever one-liners and unnecessary abstraction.
Prefer direct conditionals over dense expressions.
Match surrounding quote style; default to single quotes when no nearby style dominates.
Preserve helpful blank lines between logical sections.

18) Type guidelines

New and modified code should continue the existing use of Python type hints.
Add parameter and return annotations for new functions.
Prefer simple concrete types such as str, dict[str, Any], and list[str].
Use Optional[...] only when None is actually part of the contract.
Avoid overly complex typing machinery.
Mirror nearby annotation style when editing an existing module.

19) Naming guidelines

Classes: PascalCase
Functions and methods: snake_case
Variables: snake_case
Constants: UPPER_SNAKE_CASE
Use descriptive names; avoid vague names like tmp, data2, or stuff.
Examples already in use: App, Core, Interpreter, ModelFactory, PromptPackage, ToolRegistry, execute_user_request.

20) Error handling guidelines

Prefer specific exceptions when practical.
Broad except Exception as e is acceptable only when useful context is logged or surfaced.
Do not silently swallow failures.
Include enough context to identify the failing step, file, JSON payload, or model response.
For background/UI flows, favor reporting through status messages or logs.
Existing patterns:
- core.py sends startup/runtime failures to status_queue
- interpreter.py prints failing command JSON and extracted parameters
- ui.py guards cross-thread UI updates via queue-based messaging
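These guidelines combine in the sketch below: catch a specific exception, keep the failing payload in the message, and report through a status queue instead of swallowing the error. `parse_model_reply` and its message wording are hypothetical. The real status_queue usage in core.py may differ.

```python
import json
import queue
from typing import Any, Optional


def parse_model_reply(raw_reply: str, status_queue: queue.Queue) -> Optional[dict[str, Any]]:
    # Catch the specific decode error, not a bare Exception.
    try:
        payload = json.loads(raw_reply)
    except json.JSONDecodeError as error:
        preview = raw_reply[:200]
        # Surface position and payload preview so the failure is traceable.
        status_queue.put(f'Model reply is not valid JSON ({error.msg} at char {error.pos}): {preview!r}')
        return None
    if not isinstance(payload, dict):
        status_queue.put(f'Model reply must be a JSON object, got {type(payload).__name__}')
        return None
    return payload
```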

21) Threading and UI safety

Tk widget updates should stay on the main thread.
Reuse the queue-based communication pattern already present in UI.MainWindow.
Do not directly mutate Tk widgets from worker threads when the queue path already exists.
Preserve daemon-thread behavior unless there is a strong reason to change shutdown semantics.
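The queue pattern looks roughly like this. The function names and the `(kind, value)` message shape are hypothetical, not the ones in UI.MainWindow. Workers only enqueue, and the main thread drains the queue on a timer scheduled with Tk's `after` method. `root` is typed loosely so the sketch does not need a display to load.

```python
import queue
import threading
from typing import Any, Callable


def worker(ui_queue: queue.Queue, text: str) -> None:
    # Background thread: never touch Tk widgets here, only enqueue.
    ui_queue.put(('status', text))


def start_worker(ui_queue: queue.Queue, text: str) -> threading.Thread:
    # Daemon thread, matching the existing shutdown semantics.
    thread = threading.Thread(target=worker, args=(ui_queue, text), daemon=True)
    thread.start()
    return thread


def drain_ui_queue(ui_queue: queue.Queue, apply_update: Callable[[str, Any], None]) -> int:
    # Main thread: apply every pending update without blocking.
    handled = 0
    while True:
        try:
            kind, value = ui_queue.get_nowait()
        except queue.Empty:
            return handled
        apply_update(kind, value)
        handled += 1


def start_polling(root: Any, ui_queue: queue.Queue, apply_update: Callable[[str, Any], None], interval_ms: int = 100) -> None:
    # root.after() runs poll on the Tk main loop, then re-schedules it.
    def poll() -> None:
        drain_ui_queue(ui_queue, apply_update)
        root.after(interval_ms, poll)

    root.after(interval_ms, poll)
```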

22) Settings, paths, and secrets

Persistent settings live under ~/.open-interface/.
Use Settings from app/utils/settings.py instead of adding ad hoc config files.
Prefer Path(__file__).resolve().parent for repo-local resources.
Reuse the existing resources/ lookup patterns.
Never commit API keys or secrets.
Respect .gitignore entries such as secrets/, */secret.py, dist/, and build artifacts.
Base64-encoded API keys are still secrets.
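The path conventions above can be sketched as follows. `settings_dir` and `resource_path` are hypothetical helpers. The sketch assumes the calling module sits in app/ next to resources/; a module deeper in the tree, such as app/utils/settings.py, would need to walk up further. Callers would pass their own `__file__`, for example `resource_path('context.txt', __file__)`.

```python
from pathlib import Path
from typing import Optional


def settings_dir(home: Optional[Path] = None) -> Path:
    # Persistent settings live under ~/.open-interface/.
    base = home if home is not None else Path.home()
    return base / '.open-interface'


def resource_path(name: str, module_file: str) -> Path:
    # Resolve relative to the calling module, never the current working directory.
    resources = Path(module_file).resolve().parent / 'resources'
    path = resources / name
    if not path.is_file():
        raise FileNotFoundError(f'Missing resource: {path}')
    return path
```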

23) Dependency and change policy

Prefer the current dependency set in requirements.txt.
Do not add packages without a clear need.
If a new dependency is necessary, document why and keep the change minimal.
Prefer small, localized edits over broad refactors.
Preserve backward-compatible behavior in desktop automation code unless the user explicitly approves a contract change.
When unsure, optimize for safety, explicitness, reversibility, and verifiability.
