This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
This is the Inference Gateway CLI - a Go-based command-line interface for managing and interacting with AI inference services. It provides interactive chat, autonomous agent capabilities, and extensive tool execution for AI models.
Key Technology Stack:
- Language: Go 1.26+
- UI Framework: Bubble Tea (TUI framework)
- Gateway Integration: Via `inference-gateway/sdk` and `inference-gateway/adk`
- Storage Backends: JSONL (default), SQLite, PostgreSQL, Redis, In-memory
- Build Tool: Task (Taskfile)
- Environment: Flox (development environment manager)
```bash
# Build the binary
task build

# Run all tests
task test

# Run tests with verbose output
task test:verbose

# Run tests with coverage
task test:coverage

# Format code
task fmt

# Run linter
task lint

# Run locally without building
task run CLI_ARGS="chat"
task run CLI_ARGS="status"
task run CLI_ARGS="version"

# Or after building
./infer chat
./infer agent "task description"
./infer status

# Download Go modules
task mod:download

# Install pre-commit hooks
task precommit:install

# Run pre-commit on all files
task precommit:run

# Regenerate all mocks (uses counterfeiter)
task mocks:generate

# Clean generated mocks
task mocks:clean

# Build for current platform
task release:build

# Build macOS binary
task release:build:darwin

# Build portable Linux binary (via Docker)
task release:build:linux

# Build and push container images
task container:build
task container:push
```

cmd/ # CLI commands (cobra-based)
├── agent.go # Autonomous agent command
├── channels.go # Channel listener daemon command
├── chat.go # Interactive chat command
├── config.go # Configuration management commands
├── agents.go # A2A agent management
└── root.go # Root command and global flags
internal/
├── app/ # Application initialization
├── container/ # Dependency injection container
├── domain/ # Domain interfaces and models
│ ├── interfaces.go # Core service interfaces
│ └── filewriter/ # File writing domain logic
├── handlers/ # Message/event handlers
│ ├── chat_handler.go # Main chat orchestrator
│ ├── chat_message_processor.go # Message processing logic
│ └── chat_shortcut_handler.go # Shortcut command handling
├── services/ # Business logic implementations
│ ├── agent.go # Agent service
│ ├── conversation.go # Conversation management
│ ├── conversation_optimizer.go # Conversation compaction
│ ├── approval_policy.go # Tool approval logic
│ ├── tools/ # Tool implementations
│ │ ├── registry.go # Tool registry
│ │ ├── bash.go # Bash execution
│ │ ├── read.go, write.go # File I/O
│ │ ├── edit.go, multiedit.go # File editing
│ │ ├── web_search.go # Web search
│ │ └── mcp_tool.go # MCP integration
│ ├── channels/ # Pluggable messaging channels
│ │ └── telegram.go # Telegram Bot API channel
│ └── filewriter/ # File writing services
├── infra/ # Infrastructure layer
│ ├── storage/ # Conversation storage backends
│ │ ├── factory.go # Storage factory
│ │ ├── sqlite.go # SQLite implementation
│ │ ├── postgres.go # PostgreSQL implementation
│ │ ├── redis.go # Redis implementation
│ │ └── memory.go # In-memory implementation
│ └── adapters/ # External service adapters
├── ui/ # Terminal UI components
│ ├── components/ # Reusable UI components
│ ├── styles/ # Theme and styling
│ └── keybinding/ # Keyboard handling
├── shortcuts/ # Shortcut system
│ └── registry.go # Shortcut management
├── web/ # Web terminal interface
└── utils/ # Shared utilities
config/ # Configuration structs
└── config.go # Main config definition
The application uses a service container pattern (`internal/container/container.go`) for dependency management.
All services are initialized once and injected where needed:
- Configuration service
- Model service
- Agent service
- Tool service
- Conversation repository
- Storage backends
- MCP manager
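A minimal sketch of the pattern (hypothetical names; the real container lives in `internal/container/container.go` and wires many more services):

```go
package main

import "fmt"

// Hypothetical service interface -- the real ones live in
// internal/domain/interfaces.go.
type ModelService interface{ Name() string }

type modelServiceImpl struct{}

func (m *modelServiceImpl) Name() string { return "model-service" }

// Container initializes each service exactly once and hands out
// the same instance everywhere it is needed.
type Container struct {
	model ModelService
}

func NewContainer() *Container {
	// Construct services in dependency order, once, at startup.
	return &Container{model: &modelServiceImpl{}}
}

func (c *Container) Model() ModelService { return c.model }

func main() {
	c := NewContainer()
	fmt.Println(c.Model().Name())
}
```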
Tools are self-contained modules that implement the `domain.Tool` interface:
- Tool Interface (`internal/domain/interfaces.go`): Defines `Execute()`, `Definition()`, `Validate()`, `IsEnabled()`
- Tool Registry (`internal/services/tools/registry.go`): Manages tool registration and lookup
- Tool Implementations (`internal/services/tools/*.go`): Individual tool logic
- Approval System (`internal/services/approval_policy.go`): Handles user approval for sensitive operations
- User input → `ChatHandler.Handle()` → routes to appropriate handler
- `ChatMessageProcessor` processes user message
- Tool calls → `ToolService.Execute()` → tool registry → individual tool
- Tool approval (if required) → approval UI → execute or reject
- LLM response → streamed to UI via Bubble Tea messages
- Conversation saved to storage backend
- Chat Mode: Interactive TUI with real-time user input and approval
- Agent Mode: Autonomous background execution with minimal user interaction
- Both use the same `AgentService` but different handlers and UI flows
The conversation storage uses a factory pattern with pluggable backends:
- JSONL: Default, file-based, human-readable, zero-config
- SQLite: SQL-based, file-based, structured queries
- PostgreSQL: Production-grade, concurrent access
- Redis: Fast, in-memory, distributed setups
- Memory: Testing and ephemeral sessions
Backend selection is config-driven via config.yaml or environment variables.
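The factory idea can be sketched like this (illustrative interface and constructor names, not the repository's actual API in `internal/infra/storage/factory.go`):

```go
package main

import "fmt"

// ConversationStore is a stand-in for the real storage interface.
type ConversationStore interface{ Backend() string }

type memoryStore struct{}

func (m *memoryStore) Backend() string { return "memory" }

type jsonlStore struct{ path string }

func (j *jsonlStore) Backend() string { return "jsonl" }

// NewStore picks a backend from config and returns the shared interface.
func NewStore(backend string) (ConversationStore, error) {
	switch backend {
	case "", "jsonl": // JSONL is the zero-config default
		return &jsonlStore{path: "conversations.jsonl"}, nil
	case "memory":
		return &memoryStore{}, nil
	default:
		return nil, fmt.Errorf("unknown storage backend %q", backend)
	}
}

func main() {
	s, _ := NewStore("")
	fmt.Println(s.Backend()) // the default falls back to jsonl
}
```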
ChatHandler Responsibilities:
- Orchestrates message flow between user, LLM, and tools
- Manages conversation state
- Routes shortcuts to `ChatShortcutHandler`
- Handles tool approval workflow
- Manages background bash shells
- Integrates with message queue for async operations
Key Handler Methods:
- `Handle()`: Main entry point, routes messages
- `handleUserMessage()`: Processes user input
- `handleToolCalls()`: Executes tool requests from LLM
- `handleShortcut()`: Delegates to shortcut handler
When adding a new tool:
1. Create tool file: `internal/services/tools/your_tool.go`
2. Implement the `domain.Tool` interface:
   - `Definition()`: Returns SDK tool definition with JSON schema
   - `Execute(ctx, args)`: Tool execution logic
   - `Validate(args)`: Parameter validation
   - `IsEnabled()`: Check if tool is enabled
3. Register tool: Add to `registerTools()` in `registry.go`
4. Add config: Update `config/config.go` if the tool needs configuration
5. Write tests: Create `your_tool_test.go`
6. Update approval policy: If the tool needs approval, configure it in `approval_policy.go`
Tool Parameter Extraction:
Use `ParameterExtractor` for type-safe parameter extraction:

```go
extractor := tools.NewParameterExtractor(args)
filePath, err := extractor.GetString("file_path")
lineNum, err := extractor.GetInt("line_number")
```

Important Tool Conventions:
- Always respect `ctx` for cancellation
- Return `*domain.ToolExecutionResult` with meaningful output
- Use `config` to check if the tool is enabled
- File operations should use absolute paths
- Validate all user inputs before execution
The CLI uses a two-layer config-file system with environment-variable and flag overrides:
- Project config: `.infer/config.yaml` (project-specific)
- Userspace config: `~/.infer/config.yaml` (user defaults)
- Environment variables: `INFER_*` prefix (highest priority)
- Command flags: Override config values
Key Config Sections:
- `gateway.*`: Gateway connection settings
- `agent.*`: Agent behavior (model, max_turns, system_prompt, custom_instructions)
- `tools.*`: Tool-specific configuration
- `chat.*`: Chat UI settings (theme, keybindings, status bar)
- `web.*`: Web terminal settings
- `pricing.*`: Cost tracking configuration
- `computer_use.*`: Computer use tool settings
Environment variable format: `INFER_<PATH>` (dots become underscores).
Example: `agent.model` → `INFER_AGENT_MODEL`
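The mapping is mechanical, as this small sketch shows (`EnvVarFor` is an illustrative helper, not a repository function):

```go
package main

import (
	"fmt"
	"strings"
)

// EnvVarFor applies the documented mapping: prefix with INFER_,
// upper-case the path, and turn dots into underscores.
func EnvVarFor(configPath string) string {
	return "INFER_" + strings.ToUpper(strings.ReplaceAll(configPath, ".", "_"))
}

func main() {
	fmt.Println(EnvVarFor("agent.model"))            // INFER_AGENT_MODEL
	fmt.Println(EnvVarFor("tools.schedule.enabled")) // INFER_TOOLS_SCHEDULE_ENABLED
}
```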
Customisable LLM prompts live in `.infer/prompts.yaml` (loaded separately from `config.yaml`). Top-level keys: `agent`, `git`, `conversation`, `init`, `tools`. Tool descriptions surfaced to the LLM are configurable under `tools.<ToolName>.description` — e.g. `tools.Bash.description`, `tools.Read.description`. MCP tool descriptions are not configurable here (they come from the MCP server at runtime). Any field left empty falls back to the in-code default in `config.DefaultPromptsConfig`. Env-var overrides use the form `INFER_PROMPTS_TOOLS_<UPPER_SNAKE_NAME>_DESCRIPTION` (e.g. `INFER_PROMPTS_TOOLS_BASH_DESCRIPTION`, `INFER_PROMPTS_TOOLS_A2A_SUBMIT_TASK_DESCRIPTION`).
The CLI automatically enhances the model's context with project awareness to reduce confusion and improve accuracy.
When operating in a git repository, the model receives:
- Repository name (extracted from remote URL, e.g., "inference-gateway/cli")
- Current branch (e.g., "main", "feature/xyz")
- Main branch name (detected as "main" or "master")
- Recent commits (last 5 commits with hashes and messages)
This context is automatically injected into the system prompt on every request. The git context is cached and refreshed every N turns (configurable) to balance performance with up-to-date information.
The model receives the current working directory path, helping it understand:
- Where files should be read from or written to
- Which directory commands will execute in
- Project location context
- First prompt: +50-100ms (git command execution)
- Subsequent prompts: <1ms (cached)
- Token overhead: ~100-300 tokens (depends on git history)
- Git refresh: Every 10 turns by default (configurable)
Control via `.infer/config.yaml`:

```yaml
agent:
  context:
    git_context_enabled: true # Enable git repository context
    working_dir_enabled: true # Enable working directory context
    git_context_refresh_turns: 10 # Refresh git context every N turns
```

Or via environment variables:

```bash
INFER_AGENT_CONTEXT_GIT_CONTEXT_ENABLED=true
INFER_AGENT_CONTEXT_WORKING_DIR_ENABLED=true
INFER_AGENT_CONTEXT_GIT_CONTEXT_REFRESH_TURNS=10
```

Before:
- Model confused about repository name ("inference-gateway" vs "inference-gateway/cli" vs "inference-gateway/infer")
- No awareness of current branch or git state
- Unclear about working directory
After:
- Model knows exact repository: `inference-gateway/cli`
- Aware of current branch and recent commits
- Understands working directory context
- Reduced need for clarifying questions
- Location: `internal/services/agent_utils.go`
- Context builders: `buildGitContextInfo()`, `buildWorkingDirectoryInfo()`
- Git helpers: `isGitRepository()`, `getGitRepositoryName()`, `getGitBranch()`, `getGitMainBranch()`, `getRecentCommits()`
- Caching: Thread-safe caching via `sync.RWMutex` in `AgentServiceImpl`
- Error handling: All git operations fail gracefully (log at debug level, return empty string)
Shortcuts are YAML-defined commands stored in `.infer/shortcuts/`:
- Built-in shortcuts: `/clear`, `/exit`, `/help`, `/switch`, `/theme`, `/cost`
- Git shortcuts: `/git status`, `/git commit`, `/git push`
- SCM shortcuts: `/scm issues`, `/scm pr-create`
- Custom shortcuts: User-defined in project
Shortcuts support:
- Subcommands (e.g., `/git commit`)
- AI-powered snippets (LLM-generated content)
- Command chaining
- Dynamic context injection
Test Organization:
- Unit tests: `*_test.go` files alongside implementation
- Mocks: `tests/mocks/` (generated via counterfeiter)
Running Specific Tests:
```bash
# Test specific package
go test ./internal/services/tools

# Test specific function
go test ./internal/services/tools -run TestBashTool

# With race detector
go test -race ./...
```

The CLI supports MCP servers for extended tool capabilities:
- MCP manager: `internal/services/mcp_manager.go`
- MCP tools: `internal/services/tools/mcp_tool.go`
- Configuration: `config.Tools.MCPServers`
MCP servers are configured in `.infer/config.yaml` and their tools are dynamically registered at runtime.
A2A enables agents to delegate tasks to specialized agents:
- Agent registry: `~/.infer/agents.yaml`
- A2A tools: `A2A_SubmitTask`, `A2A_QueryAgent`, `A2A_QueryTask`
- Agent polling: Background monitor for task status
- Configuration: Via `infer agents` commands
Channels provide pluggable messaging transports (Telegram, WhatsApp, etc.) for remote-controlling the agent from external platforms. The `infer channels-manager` command runs as a standalone daemon, completely decoupled from the agent. Each incoming message triggers `infer agent --session-id <id>` as a subprocess.
- Channels command: `cmd/channels.go`
- Channel Manager: `internal/services/channel_manager.go`
- Telegram channel: `internal/services/channels/telegram.go`
- Domain types: `Channel`, `InboundMessage`, `OutboundMessage` in `internal/domain/interfaces.go`
- Configuration: `config.Channels` in `config/config.go`
Channels are configured in `.infer/config.yaml` under the `channels` key. Each channel has its own allowlist for security. See `docs/channels.md` for full documentation.
The channels-manager daemon also hosts the scheduler service when `tools.schedule.enabled: true` — see Scheduling (Cron-driven Tasks) below for the full design.
When `channels.require_approval` is true (default), the channel manager enables interactive tool approval via stdin/stdout IPC with the agent subprocess:
- Channel manager passes `--require-approval` to `infer agent`
- Agent emits `ApprovalRequest` JSON on stdout, blocks reading stdin
- Channel manager detects the request, sends an approval prompt to the user
- User replies "yes"/"no"; the reply is intercepted in `routeInbound()` before `handleMessage()` to avoid a sender-mutex deadlock
- Channel manager writes `ApprovalResponse` JSON to agent stdin
- A 5-minute timeout auto-rejects if there is no reply
- IPC types: `internal/domain/ipc.go` (`ApprovalRequest`, `ApprovalResponse`)
- Agent side: `cmd/agent.go` (`executeToolCallsWithApproval`, `readApprovalResponses`, `outputApprovalRequest`)
- Channel manager side: `internal/services/channel_manager.go` (`handleApprovalRequest`, `parseApprovalRequest`, `isApprovalReply`)
- Reuses existing `tools.*.require_approval` and `tools.safety.require_approval` config
- Read-only tools (Tree, Read, Grep) default to `require_approval: false`
- Implement the `domain.Channel` interface in `internal/services/channels/`
- Add a config type to `config/config.go`
- Register in `registerChannels()` in `cmd/channels.go`
- Add an allowlist case in `channel_manager.go`'s `isAllowedUser()`
The Schedule tool lets the LLM create recurring or one-off jobs that fire on a cron schedule and deliver their output back through the messaging channel that triggered the current session (e.g. Telegram). The scheduler runs inside the channels-manager daemon — there is no separate process — and is cross-platform (it uses `robfig/cron/v3`, not the system crontab).
- Schedule tool: `internal/agent/tools/schedule.go`
- Scheduler service: `internal/services/scheduler/scheduler.go`
- YAML store: `internal/services/scheduler/store.go`
- Domain types: `ScheduledJob`, `SchedulerService` in `internal/domain/scheduler.go`
- Session-ID parser: `domain.ParseChannelSessionID` in `internal/domain/session.go`
- Wiring: `cmd/channels.go` `startScheduler()` constructs and lifecycles the service
- Configuration: `config.Tools.Schedule` (`ScheduleToolConfig`) in `config/config.go`

See `docs/scheduling.md` for the user-facing guide.
```
┌─ infer channels-manager (daemon) ─────────────────────────┐
│ ChannelManagerService │
│ ├─ inbound msgs → spawn `infer agent` │
│ └─ SchedulerService (when tools.schedule.enabled) │
│ ├─ robfig/cron/v3 scheduler │
│ ├─ fsnotify watcher on ~/.infer/schedules/ │
│ └─ on fire: spawn `infer agent --session-id <uuid>` │
│ capture stdout → channel.Send(...) │
└────────────────────────────────────────────────────────────┘
▲ ▲
│ writes YAML │ reads YAML on startup
│ │ + fsnotify reload
┌───────────┴─────────┐ ┌─────────┴──────────────┐
│ Schedule tool │ create/del │ ~/.infer/schedules/ │
│ (runs in any agent) │ ──────────► │ <job-id>.yaml │
└─────────────────────┘ └────────────────────────┘
```
Key properties:
- Tool-only file I/O. The `Schedule` tool never talks directly to the daemon — it just writes YAML. The daemon's fsnotify watcher (`scheduler.startWatcher`) picks up changes within ~150ms (debounced) and registers/unregisters cron entries.
- Fresh session per fire. Each scheduled run gets a new UUID session ID; no context carries between fires (acceptance criterion of issue #418).
- Daemon-bound execution. Jobs only fire while `infer channels-manager` is running. If the daemon is down, the YAML stays on disk and resumes on next startup.
- One-off jobs. When `RunOnce: true` on the job YAML, the scheduler deletes the file after the first fire (regardless of delivery success). Used for "remind me at 6pm today"-style requests.
The Schedule tool does not accept `channel` or `recipient_id` parameters from the LLM. They are derived deterministically from the agent's session ID via `domain.ParseChannelSessionID`. Channels-manager session IDs are formatted `channel-<name>-<sender_id>` (see `channel_manager.go:177`), so parsing is unambiguous.
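A sketch of that parse, assuming hyphen-free channel names (the real logic is `domain.ParseChannelSessionID` in `internal/domain/session.go`):

```go
package main

import (
	"fmt"
	"strings"
)

// parseChannelSessionID handles the documented channel-<name>-<sender_id>
// format; anything else (chat mode, heartbeat UUIDs) is not routable.
func parseChannelSessionID(sessionID string) (channel, senderID string, ok bool) {
	rest, found := strings.CutPrefix(sessionID, "channel-")
	if !found {
		return "", "", false
	}
	channel, senderID, found = strings.Cut(rest, "-")
	if !found || channel == "" || senderID == "" {
		return "", "", false
	}
	return channel, senderID, true
}

func main() {
	ch, id, ok := parseChannelSessionID("channel-telegram-12345")
	fmt.Println(ch, id, ok)
	_, _, ok = parseChannelSessionID("8f14e45fceea4f3a")
	fmt.Println(ok) // plain UUID-style sessions are rejected
}
```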
Wiring chain:
1. `cmd/agent.go` `executeToolCall()` injects the agent's `sessionID` into the tool-call context via `domain.WithSessionID`.
2. `Schedule.execCreate` calls `resolveRouting(ctx)`, which reads the session ID with `domain.GetSessionID(ctx)` and parses it.
3. If the session is not channel-formatted (e.g. chat-mode or a generic agent run), the tool returns a clear error — it cannot guess where to deliver.
This means the LLM literally cannot route to the wrong recipient.
`~/.infer/schedules/<uuid>.yaml`:

```yaml
id: 01HG7K2N3M4P5Q6R7S8T9V0W1X
name: Daily morning quote
cron_expression: "0 8 * * *" # standard 5-field crontab or @every <duration>
prompt: |
  Find an inspiring quote and respond with quote + author.
channel: telegram
recipient_id: "12345"
model: "" # empty = use cfg.Agent.Model
run_once: false # true → deleted after first fire
created_at: 2026-04-25T10:30:00Z
updated_at: 2026-04-25T10:30:00Z
last_run: 2026-04-26T08:00:01Z
last_error: "" # set when delivery fails
```

Save is atomic (write to `<id>.yaml.tmp`, then `os.Rename`) so the fsnotify watcher never sees half-written files.
Cron expressions are interpreted in `time.Local`, which honours the `TZ` environment variable. The binary imports `_ "time/tzdata"` in `main.go`, embedding the IANA zone DB so `TZ=Europe/Berlin` works on minimal container images that don't ship `/usr/share/zoneinfo`.
1. Add the op to the `scheduleOp*` constants in `internal/agent/tools/schedule.go`.
2. Add an `enum` entry in the `operation` parameter and a `case` in `Execute()`'s switch.
3. Implement `execMyOp(ctx, args, store, start)` returning a `*ScheduleToolResult`. Use `requireString`/`optionalString`/`optionalBool` for arg extraction.
4. Add validation in `Validate()`'s switch.
5. Update the tool description + table in `docs/scheduling.md` and `docs/tools-reference.md`.
- Disabled by default (`ScheduleToolConfig.Enabled = false`); enable explicitly via `tools.schedule.enabled: true` or `INFER_TOOLS_SCHEDULE_ENABLED=true`.
- Requires approval by default (`ScheduleToolConfig.RequireApproval = ptr(true)`); the `IsApprovalRequired("Schedule")` switch case in `config/config.go` honours this.
- Defaults are registered with viper via four `v.SetDefault("tools.schedule.*", ...)` calls in `cmd/root.go` — without those, viper unmarshals an empty config and the defaults function's values are ignored.
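A minimal enabling fragment for `.infer/config.yaml` might look like this (only `enabled` is documented above as required; `require_approval` is shown on the assumption it follows the `tools.*.require_approval` pattern):

```yaml
tools:
  schedule:
    enabled: true          # off by default
    require_approval: true # already the default; shown for clarity
```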
The Heartbeat wakes the agent on a fixed interval to check for pending work. It is a peer of the scheduler — both run inside the `infer channels-manager` daemon and both spawn `infer agent` subprocesses, but heartbeat is a single global tick (vs. many user-defined cron jobs) and logs output (vs. routing to a channel). Disabled by default.
- Config struct: `config.HeartbeatConfig` in `config/heartbeat.go`
- Config file: `~/.infer/heartbeat.yaml` (separate file, mirrors `channels.yaml`; `yaml:"-"` on `Config.Heartbeat`)
- System prompt: `cfg.Prompts.Agent.SystemPromptHeartbeat` in `prompts.yaml` — separate from `system_prompt`/`system_prompt_plan`
- Service: `internal/services/heartbeat/heartbeat.go` (`Service` with `Start(ctx)`/`Stop(ctx)`, ticker-driven, no cron)
- Daemon wiring: `cmd/channels.go` `startHeartbeat()` next to `startScheduler()`
- Init wiring: `cmd/init.go` `createHeartbeatConfigFile()`
- Env vars: `INFER_HEARTBEAT_*` applied via `applyHeartbeatEnvOverrides` in `cmd/config.go`
```
┌─ infer channels-manager (daemon) ─────────────────────────┐
│ ChannelManagerService (channels — optional) │
│ SchedulerService (cron jobs — optional) │
│ HeartbeatService │
│ ├─ time.Ticker(interval) │
│ └─ on tick: spawn `infer agent --heartbeat │
│ --session-id <uuid> <prompt>` │
│ log stdout │
└────────────────────────────────────────────────────────────┘
```
Key properties:
- Off by default. `Heartbeat.Enabled = false` in `DefaultHeartbeatConfig()`.
- Daemon gate is relaxed. `infer channels-manager` boots if any of channels / scheduler / heartbeat is enabled. Heartbeat alone is a valid run mode.
- Fresh session per fire. UUID-format session ID (not channel-prefixed); the Schedule tool's `resolveRouting` will refuse to operate from a heartbeat run, which is intentional — heartbeat should not directly create scheduled jobs without explicit channel context.
- Overlap guard. An `atomic.Int32` flag suppresses concurrent ticks when the agent run takes longer than `interval`. Logs a warning when skipped.
- System prompt selection. `infer agent --heartbeat` (cmd flag added in `cmd/agent.go`) swaps `cfg.Prompts.Agent.SystemPrompt` for `cfg.Prompts.Agent.SystemPromptHeartbeat` before the service container is built. The agent service stays oblivious to the new mode.
- Output. Agent stdout is logged via the standard logger. No channel routing — if the user wants a channel notification, the agent itself uses its tools to send one.
See `docs/heartbeat.md` for the user-facing guide.
Plan mode (`AgentModePlan` in `internal/domain/state.go`) is a read-only operating mode the user enters via Shift+Tab in the chat TUI. The model gets Read/Grep/Tree/TodoWrite plus the `RequestPlanApproval` tool and is otherwise blocked from any mutating tools (enforced in `internal/services/tools.go::FilterToolsForMode`).

When the model calls `RequestPlanApproval`, the tool persists the plan as a Markdown file under `<configDir>/plans/<YYYY-MM-DD-HHMMSS>-<slug>.md` (atomic write: `.tmp` → `os.Rename`). The plan body must follow a fixed 8-section H2 template (Context, Files to Modify, Current Code, Changes, Performance Impact, Critical Files, Edge Cases, Verification) — see `config/prompts.go::DefaultPromptsConfig` for the prompt that pins this contract.
- Tool: `internal/agent/tools/request_plan_approval.go`
- System prompt: `config/prompts.go` (`agent.system_prompt_plan`)
- Approval event flow: `internal/agent/agent.go` → `PlanApprovalRequestedEvent` → `internal/handlers/chat_handler.go` `HandlePlanApprovalRequestedEvent`/`HandlePlanApprovalResponseEvent`
- UI state: `domain.PlanApprovalUIState`, `ViewStatePlanApproval`
Rejected plans stay on disk as an audit trail — by design.
See `docs/plan-mode.md` for the full user-facing guide.
When models use extended thinking (reasoning), their internal thought process is displayed as collapsible blocks above responses.
- Data Storage: Thinking content is stored in the `ConversationEntry.ThinkingContent` field
- Event Flow: Reasoning content flows through `StreamingContentEvent.ReasoningContent` during streaming
- Rendering: Thinking blocks are rendered before assistant message content in `renderStandardEntry()` and `renderAssistantWithToolCalls()`
- Display State: Collapsed by default, showing the first sentence with an ellipsis
- Styling: Rendered using a dim, theme-aware color with a 💭 icon
- Expansion: Toggled via keybinding (configurable as `display_toggle_thinking`, defaults to `ctrl+k`)
- `internal/domain/interfaces.go`: `ConversationEntry.ThinkingContent` field
- `internal/domain/ui_events.go`: `StreamingContentEvent.ReasoningContent` field
- `internal/ui/components/conversation_view.go`: Rendering logic and expansion state
- `config/keybindings.go`: Keybinding definition
- `internal/ui/keybinding/actions.go`: Action handler registration
- Toggle thinking block expansion/collapse using the configured keybinding (default: `ctrl+k`)
- Default state: collapsed (first sentence visible)
- Expanded state: full thinking content with word wrapping
- Keybinding can be customized via `chat.keybindings.bindings.display_toggle_thinking` in config
This project uses Conventional Commits:
```
<type>[optional scope]: <description>

[optional body]

[optional footer]
```

Types: `feat`, `fix`, `docs`, `style`, `refactor`, `perf`, `test`, `build`, `ci`, `chore`, `revert`
Breaking changes: Add `!` after the type (e.g., `feat!:`) or a `BREAKING CHANGE:` footer
Pre-commit hooks automatically validate commit messages.
1. Make changes following Go best practices
2. Run quality checks: `task precommit:run` (runs formatting, linting, validation)
3. Test thoroughly: `task test`
4. Commit with a conventional commit message
5. Pre-commit hooks run automatically on commit
6. Push and create a PR
Release Process:
Automated via semantic-release on main branch:
- Commit types determine version bumps
- Binaries built for macOS (Intel/ARM64) and Linux (AMD64/ARM64)
- GitHub releases created automatically with changelogs
- No CGO: Project uses pure Go dependencies for portability
- Flox environment: Use `flox activate` for a consistent dev environment
- Binary name: Built as `infer` (not `cli`)
- Gateway dependency: CLI requires Inference Gateway (auto-managed in Docker/binary mode)
- Storage migrations: SQLite and PostgreSQL use automatic schema migrations
- Tool safety: File modification tools require user approval by default
- Context limits: Conversation optimizer handles token limits automatically