This file gives coding agents the real commands, architecture boundaries, and conventions for this repository. It is based on the checked-in files and the current implementation, not old assumptions.
- Project type: Python desktop automation app with Tk / ttkbootstrap UI, screenshot-driven visual prompting, local verification, and multiple LLM providers.
- Main entrypoint: `app/app.py`
- Build script: `build.py`
- Primary source root: `app/`
- Test directory: `tests/`
- Local Python version: `.python-version` -> `3.12.8`
- CI lint workflow: `.github/workflows/pylint.yml`
- Current architecture style: single-step visual agent loop with Prompt System v1
- Accepts a natural-language user goal in the desktop UI.
- Captures a screenshot with visible rulers / grid.
- Builds a structured prompt for the selected model provider.
- Receives a single next-step JSON action from the model.
- Executes the step locally through the interpreter.
- Optionally performs local post-action verification using before/after screenshots.
- Repeats until the model returns `done`, the request is interrupted, or the runtime stops on failure.
The current system is not a multi-step batch planner. It is a strict single-step closed loop.
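The closed loop described above can be sketched as follows. This is a minimal illustration, not the code in `app/core.py`: `run_single_step_loop`, `ask_model`, `execute_step`, and `max_iterations` are hypothetical names, and screenshot capture and local verification are left out to keep the control flow visible.

```python
from typing import Any, Callable, Optional


def run_single_step_loop(
    goal: str,
    ask_model: Callable[[str, list[dict[str, Any]]], dict[str, Any]],
    execute_step: Callable[[dict[str, Any]], bool],
    max_iterations: int = 20,
) -> Optional[str]:
    """Request one step at a time until the model reports `done` (hypothetical helper)."""
    step_history: list[dict[str, Any]] = []
    for _ in range(max_iterations):
        response = ask_model(goal, step_history)
        if response.get('done') is not None:
            return response['done']
        steps = response.get('steps') or []
        if not steps:
            # No executable step and no done message: stop instead of guessing.
            return None
        # Single-step loop: only the first returned step is executed.
        step = steps[0]
        succeeded = execute_step(step)
        step_history.append({'step': step, 'succeeded': succeeded})
        if not succeeded:
            # The runtime stops on failure.
            return None
    return None


def fake_model(goal: str, history: list[dict[str, Any]]) -> dict[str, Any]:
    """Stand-in model: returns two click steps, then a done message."""
    if len(history) >= 2:
        return {'steps': [], 'done': 'Finished: ' + goal}
    step = {'function': 'click', 'parameters': {'x_percent': 50, 'y_percent': 50}}
    return {'steps': [step], 'done': None}


executed: list[dict[str, Any]] = []


def record_step(step: dict[str, Any]) -> bool:
    """Stand-in executor: records the step instead of touching the desktop."""
    executed.append(step)
    return True


result = run_single_step_loop('open the menu', fake_model, record_step)
```

Because the model is asked again after every executed step, each decision can take the previous step's outcome into account, which is the property a batch planner would lose.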
- `AGENTS.md` at repo root: present (this file)
- `.cursorrules`: not found
- `.cursor/rules/`: not found
- `.github/copilot-instructions.md`: not found
- Do not invent extra Cursor or Copilot rules for this repository.
From repository root:
python -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install -r requirements.txt
Optional lint dependency:
python -m pip install pylint
Windows activation:
.venv\Scripts\activate
Run the app:
python app/app.py
Build package:
python build.py
Lint all tracked Python files:
pylint $(git ls-files '*.py')
Lint a single file:
pylint app/core.py
Safe local regression checks commonly used in this repo:
python tests/prompt_system_regression_check.py
python tests/coordinate_mapping_test.py
python tests/request_runtime_control_check.py
python tests/session_context_red_check.py
python tests/verify_macos_doubleclick.py
python tests/verify_visual_agent_mvp.py
Legacy smoke test that boots the GUI:
python tests/simple_test.py
- `python app/app.py` launches a real GUI.
- The app may need screen recording, accessibility, keyboard, and mouse permissions.
- Do not assume GUI flows are headless-safe.
- `build.py` is an interactive PyInstaller script.
- `build.py` prompts for version confirmation and may prompt about macOS signing / notarization.
- `build.py` runs `pip install -r requirements.txt` inside `setup()`.
- Build output goes to `dist/`.
- Only run packaging verification when the task really requires it.
- There is no checked-in Ruff, Black, isort, mypy, pytest, or unittest config.
- The only checked-in lint workflow uses Pylint.
- `tests/simple_test.py` is a smoke-style GUI test, not a pure unit test.
- It boots the app and can trigger real UI automation behavior.
- Do not describe `tests/simple_test.py` as CI-safe headless unit coverage.
- The repository now also contains multiple safe local regression scripts for prompting, runtime control, settings, coordinate mapping, and verifier behavior.
- Prefer narrow targeted checks over broad GUI smoke runs.
- Docs-only changes: verify the edited markdown file content and structure.
- Prompting changes: run `tests/prompt_system_regression_check.py` and related narrow prompt/runtime checks.
- Coordinate / action execution changes: run `tests/coordinate_mapping_test.py` and related targeted checks.
- Runtime control / request boundary changes: run `tests/request_runtime_control_check.py` and `tests/session_context_red_check.py`.
- GUI/runtime changes: use careful smoke testing only when the task truly requires it, because the app can control the machine.
- Packaging changes: run `python build.py` only if the user explicitly wants packaging validation.
- New tests should avoid real mouse and keyboard control whenever possible.
- `app/app.py` -> top-level app wiring and queue orchestration
- `app/core.py` -> request lifecycle, recursive loop, interruption, local verification integration
- `app/interpreter.py` -> turns model steps into local desktop actions
- `app/verifier.py` -> before/after screenshot-based local step verification
- `app/ui.py` -> Tk / ttkbootstrap UI and settings windows
- `app/llm.py` -> LLM runtime settings sync and stable system context assembly
- `app/models/` -> provider-specific model adapters and model factory
- `app/prompting/` -> Prompt System v1 builders and schema text generation
- `app/agent_memory.py` -> compact action / failure memory used for next-step guidance
- `app/session_store.py` -> SQLite-backed message and execution log persistence
- `app/utils/settings.py` -> settings validation, defaults, persistence helpers
- `app/utils/screen.py` -> screenshot capture, grid prompt image creation, prompt image archival
- `app/resources/context.txt` -> stable non-dynamic system rules
- `tests/` -> targeted regression scripts plus legacy GUI smoke test
The project now has a unified prompt architecture under app/prompting/.
The key rule is:
- Model providers must share the same prompt semantics.
- Provider adapters should only differ in message formatting and transport.
- Do not reintroduce provider-specific prompt meaning unless the task explicitly requires it.
Core prompt layers:
- `PromptSystemContext`
- `PromptToolSchema`
- `PromptTaskContext`
- `PromptExecutionTimeline`
- `PromptRecentDetails`
- `PromptVisualContext`
- `PromptOutputContract`
Prompt composition entrypoint:
app/prompting/builder.py
Stable prompt rules source:
app/resources/context.txt
Important prompt boundary:
- Dynamic runtime state must not be pushed back into `context.txt`.
- Do not revert to `context.txt + raw request_data JSON` style prompting.
The model-facing tool contract is now registry-driven.
Source:
app/prompting/tool_schema.py
Key classes:
- `ToolParameterDefinition`
- `ToolDefinition`
- `ToolRegistry`
Rules:
- Register each model-visible tool in the registry.
- Generate the model-visible tool schema from the registry.
- Do not hand-maintain a second free-text list of allowed tools elsewhere.
- If you add a new tool, update both:
  - the tool registry entry
  - the runtime execution path that actually supports the tool
Current default tools include:
- `click`
- `moveTo`
- `dragTo`
- `write`
- `press`
- `scroll`
- `sleep`
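The registry-driven rule can be illustrated with a small sketch. The class names mirror the ones listed above, but every field, method (`register`, `names`, `render_schema_text`), description string, and the `seconds` parameter is an assumption made for illustration; check `app/prompting/tool_schema.py` for the real definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolParameterDefinition:
    # Field layout is assumed, not copied from the repository.
    name: str
    type_name: str
    description: str


@dataclass(frozen=True)
class ToolDefinition:
    name: str
    description: str
    parameters: tuple[ToolParameterDefinition, ...] = ()


class ToolRegistry:
    """Single source of truth for model-visible tools (illustrative only)."""

    def __init__(self) -> None:
        self._tools: dict[str, ToolDefinition] = {}

    def register(self, tool: ToolDefinition) -> None:
        if tool.name in self._tools:
            raise ValueError(f'Tool already registered: {tool.name}')
        self._tools[tool.name] = tool

    def names(self) -> list[str]:
        return list(self._tools)

    def render_schema_text(self) -> str:
        # The prompt-facing tool list is generated here, never typed by hand.
        lines: list[str] = []
        for tool in self._tools.values():
            params = ', '.join(f'{p.name}: {p.type_name}' for p in tool.parameters)
            lines.append(f'- {tool.name}({params}): {tool.description}')
        return '\n'.join(lines)


registry = ToolRegistry()
registry.register(ToolDefinition('click', 'Click at a ruler position.', (
    ToolParameterDefinition('x_percent', 'float', 'Horizontal ruler value, 0-100.'),
    ToolParameterDefinition('y_percent', 'float', 'Vertical ruler value, 0-100.'),
)))
registry.register(ToolDefinition('sleep', 'Pause before the next step.', (
    ToolParameterDefinition('seconds', 'float', 'Hypothetical parameter name.'),
)))
schema_text = registry.render_schema_text()
```

Generating the schema text from the registry keeps the prompt and the list of registered tools from drifting apart; the runtime execution path still has to be updated separately.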
This repository now uses a strict ruler-aligned coordinate contract.
Model-facing rule:
- `x_percent` and `y_percent` use the same `0-100` ruler scale shown on the prompt image.
- The model should return ruler values directly, such as `31.5` and `44.2`.
- The model should not divide by `100` itself.
Runtime rule:
- The interpreter converts model `0-100` ruler values into internal `0.0-1.0` normalized values and then maps them to logical screen pixels.
Related files:
- `app/resources/context.txt`
- `app/prompting/tool_schema.py`
- `app/prompting/visual_context.py`
- `app/prompting/output_contract.py`
- `app/interpreter.py`
Important warning:
- Do not reintroduce a mixed `0-1` vs `0-100` prompt contract.
- If you change coordinate behavior, update prompt text, interpreter logic, and coordinate tests together.
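The two-stage conversion in the runtime rule can be sketched as below. `ruler_to_pixels` is a hypothetical helper, and the rounding and edge clamping are assumptions; the actual logic in `app/interpreter.py` may differ.

```python
def ruler_to_pixels(
    x_percent: float,
    y_percent: float,
    screen_width: int,
    screen_height: int,
) -> tuple[int, int]:
    """Map 0-100 ruler values to logical screen pixels (illustrative sketch)."""
    for label, value in (('x_percent', x_percent), ('y_percent', y_percent)):
        if not 0.0 <= value <= 100.0:
            raise ValueError(f'{label} must be within 0-100, got {value}')

    # Stage 1: ruler scale -> normalized 0.0-1.0 values.
    x_norm = x_percent / 100.0
    y_norm = y_percent / 100.0

    # Stage 2: normalized values -> logical pixels, clamped to the last valid index.
    x_pixel = min(int(round(x_norm * screen_width)), screen_width - 1)
    y_pixel = min(int(round(y_norm * screen_height)), screen_height - 1)
    return x_pixel, y_pixel


point = ruler_to_pixels(31.5, 44.2, 1920, 1080)  # (605, 477)
corner = ruler_to_pixels(100, 100, 1920, 1080)   # (1919, 1079)
```

Rejecting values outside `0-100` makes a model that already divided by `100` harder to miss silently only in one direction: a value such as `0.315` is still in range, which is why the prompt text and the tests need to enforce the contract as well.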
The app depends on a JSON contract between model adapters and the interpreter.
Do not casually change these keys:
- `steps`
- `done`
- `function`
- `parameters`
- `human_readable_justification`
- `expected_outcome`
Current output rules:
- The runtime is single-step; at most one executable step should be used.
- `done` remains `null` until the request is complete or should stop safely.
- If a task is complete, blocked, or unsafe, return `steps: []` and a short `done` message.
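The output rules above can be checked mechanically. The following sketch uses a hypothetical `check_model_response` helper; it covers only the rules stated here and assumes `function` is a string and `parameters` is an object, which the repository's own contract code may define differently.

```python
import json
from typing import Any


def check_model_response(raw_text: str) -> list[str]:
    """Return a list of contract problems; an empty list means none were found."""
    try:
        payload: Any = json.loads(raw_text)
    except json.JSONDecodeError as error:
        return [f'Response is not valid JSON: {error}']
    if not isinstance(payload, dict):
        return ['Top-level JSON value must be an object.']

    problems: list[str] = []
    steps = payload.get('steps')
    if not isinstance(steps, list):
        problems.append('"steps" must be a list.')
        steps = []
    if 'done' not in payload:
        problems.append('"done" key is missing.')
    if len(steps) > 1:
        problems.append('Runtime is single-step; at most one step is allowed.')
    if payload.get('done') is not None and steps:
        problems.append('"steps" must be empty when "done" is set.')

    for index, step in enumerate(steps):
        if not isinstance(step, dict):
            problems.append(f'Step {index} must be an object.')
            continue
        if not isinstance(step.get('function'), str):
            problems.append(f'Step {index} needs a string "function".')
        if not isinstance(step.get('parameters'), dict):
            problems.append(f'Step {index} needs an object "parameters".')
    return problems


next_step = json.dumps({
    'steps': [{
        'function': 'click',
        'parameters': {'x_percent': 31.5, 'y_percent': 44.2},
        'human_readable_justification': 'Open the File menu.',
        'expected_outcome': 'The File menu is visible.',
    }],
    'done': None,
})
finished = json.dumps({'steps': [], 'done': 'The document was saved.'})
```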
If you edit interpreter.py, model.py, openai_computer_use.py, or prompt output contract code, review the whole chain.
Current request-state architecture lives primarily in app/core.py.
Important structures:
- `request_context` -> per-request runtime state
- `session_history_snapshot` -> structured session narrative summary at request start
- `step_history` -> authoritative per-request step history used for timeline and recent details
- `agent_memory` -> compact recent action / failure memory
Rules:
- Do not collapse `session_history_snapshot` and `step_history` back into one raw prompt blob.
- Preserve request boundary markers like `request_origin`.
- Keep interruption / restart behavior intact when editing request flow.
Prompt text dump support exists and is intentionally optional.
Settings key:
advanced.save_prompt_text_dumps
Behavior:
- Default is `False`.
- When enabled, final prompt text is written under project-root `promptdump/`.
- Dumped text contains prompt content, not API credentials.
Related files:
- `app/prompting/debug.py`
- `app/utils/settings.py`
- `app/ui.py`
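The opt-in behavior can be pictured with the sketch below. `maybe_dump_prompt_text` is a hypothetical helper, and it assumes the settings are available as a nested dict in which `advanced.save_prompt_text_dumps` maps to `settings['advanced']['save_prompt_text_dumps']`; the real code in `app/prompting/debug.py` may read settings and name files differently.

```python
import tempfile
from pathlib import Path
from typing import Any, Optional


def maybe_dump_prompt_text(
    settings: dict[str, Any],
    prompt_text: str,
    dump_dir: Path,
    file_name: str,
) -> Optional[Path]:
    """Write the prompt text only when the setting is explicitly enabled."""
    enabled = settings.get('advanced', {}).get('save_prompt_text_dumps', False)
    if not enabled:
        return None
    dump_dir.mkdir(parents=True, exist_ok=True)
    target = dump_dir / file_name
    target.write_text(prompt_text, encoding='utf-8')
    return target


# A temporary directory stands in for the project-root promptdump/ folder.
with tempfile.TemporaryDirectory() as temp_root:
    dump_dir = Path(temp_root) / 'promptdump'
    skipped = maybe_dump_prompt_text({}, 'hello', dump_dir, 'prompt.txt')
    enabled_settings = {'advanced': {'save_prompt_text_dumps': True}}
    written = maybe_dump_prompt_text(enabled_settings, 'hello', dump_dir, 'prompt.txt')
    written_text = written.read_text(encoding='utf-8') if written else None
```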
- Match the existing import style.
- Prefer absolute imports rooted at the current app layout.
- Existing examples:
  - `from core import Core`
  - `from ui import UI`
  - `from utils.settings import Settings`
  - `from models.factory import ModelFactory`
- Do not introduce wildcard imports.
- Keep import groups ordered as: standard library, third-party, local project imports.
- Keep one import per line unless the surrounding file already uses a compact local pattern.
- Use 4 spaces for indentation.
- Preserve the repository's current style; no formatter config is checked in.
- Keep code explicit and readable.
- Avoid clever one-liners and unnecessary abstraction.
- Prefer direct conditionals over dense expressions.
- Match surrounding quote style; default to single quotes when no nearby style dominates.
- Preserve helpful blank lines between logical sections.
- New and modified code should continue the existing use of Python type hints.
- Add parameter and return annotations for new functions.
- Prefer simple concrete types such as `str`, `dict[str, Any]`, and `list[str]`.
- Use `Optional[...]` only when `None` is actually part of the contract.
- Avoid overly complex typing machinery.
- Mirror nearby annotation style when editing an existing module.
- Classes: `PascalCase`
- Functions and methods: `snake_case`
- Variables: `snake_case`
- Constants: `UPPER_SNAKE_CASE`
- Use descriptive names; avoid vague names like `tmp`, `data2`, or `stuff`.
- Examples already in use: `App`, `Core`, `Interpreter`, `ModelFactory`, `PromptPackage`, `ToolRegistry`, `execute_user_request`.
- Prefer specific exceptions when practical.
- Broad `except Exception as e` is acceptable only when useful context is logged or surfaced.
- Do not silently swallow failures.
- Include enough context to identify the failing step, file, JSON payload, or model response.
- For background/UI flows, favor reporting through status messages or logs.
- Existing patterns:
  - `core.py` sends startup/runtime failures to `status_queue`
  - `interpreter.py` prints failing command JSON and extracted parameters
  - `ui.py` guards cross-thread UI updates via queue-based messaging
- Tk widget updates should stay on the main thread.
- Reuse the queue-based communication pattern already present in `UI.MainWindow`.
- Do not directly mutate Tk widgets from worker threads when the queue path already exists.
- Preserve daemon-thread behavior unless there is a strong reason to change shutdown semantics.
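The queue-based pattern can be sketched without a live GUI. `worker` and `drain_status_queue` are hypothetical names, not the code in `UI.MainWindow`; in a real Tk app the main thread would call the drain function periodically (for example by rescheduling it with the widget `after()` method) and `apply_update` would be the code that touches widgets.

```python
import queue
import threading
from typing import Callable


def worker(status_queue: 'queue.Queue[str]') -> None:
    """Background work: report progress through the queue, never through widgets."""
    status_queue.put('Capturing screenshot')
    status_queue.put('Executing step')


def drain_status_queue(
    status_queue: 'queue.Queue[str]',
    apply_update: Callable[[str], None],
) -> int:
    """Run on the main thread: apply every pending message without blocking."""
    handled = 0
    while True:
        try:
            message = status_queue.get_nowait()
        except queue.Empty:
            return handled
        apply_update(message)
        handled += 1


status_queue: 'queue.Queue[str]' = queue.Queue()
thread = threading.Thread(target=worker, args=(status_queue,), daemon=True)
thread.start()
thread.join()  # Joined here only so the example is deterministic.

shown: list[str] = []  # Stand-in for a label or status bar update.
handled_count = drain_status_queue(status_queue, shown.append)
```

Using `get_nowait()` keeps the main thread responsive: if no messages are pending, the drain returns immediately instead of stalling the event loop.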
- Persistent settings live under `~/.open-interface/`.
- Use `Settings` from `app/utils/settings.py` instead of adding ad hoc config files.
- Prefer `Path(__file__).resolve().parent` for repo-local resources.
- Reuse the existing `resources/` lookup patterns.
- Never commit API keys or secrets.
- Respect `.gitignore` entries such as `secrets/`, `*/secret.py`, `dist/`, and build artifacts.
- Base64-encoded API keys are still secrets.
- Prefer the current dependency set in `requirements.txt`.
- Do not add packages without a clear need.
- If a new dependency is necessary, document why and keep the change minimal.
- Prefer small, localized edits over broad refactors.
- Preserve backward-compatible behavior in desktop automation code unless the user explicitly approves a contract change.
- When unsure, optimize for safety, explicitness, reversibility, and verifiability.