Python: Shell tool with support for local and Docker #5664
Draft
alliscode wants to merge 9 commits into microsoft:main
… package

Introduces a safe, cross-OS local shell tool as the first citizen of a new agent-framework-tools workspace package. Supports persistent (default) and stateless modes across pwsh/powershell.exe/bash/sh, with policy denylist, allowlist, approval gating, process-tree kill on timeout, output truncation, and audit hooks. Integrates with existing provider get_shell_tool(func=...) factories via FunctionTool kind='shell'.

See docs/decisions/0026-builtin-tools-local-shell.md for the full design.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Codifies what LocalShellTool does and does not defend against, and delegates the security-relevant lifecycle primitive to a battle-tested library instead of hand-rolled per-OS code.

Changes:
- Adopt psutil for cross-OS process-tree termination (executor + session). Replaces hand-rolled taskkill/killpg with one canonical implementation.
- Resolve taskkill.exe to absolute %SystemRoot%\System32 path so PATH poisoning cannot redirect us to an attacker-supplied binary.
- Reframe ShellPolicy docstring + ADR + README: denylist is a guardrail, not a security boundary.
- Require acknowledge_unsafe=True to set approval_mode='never_require', making the unsafe path explicitly opt-in with a self-documenting name.
- Add tests/test_security.py codifying named CVE-style cases. Defenses we DO claim are asserted; non-defenses (denylist bypasses via backslash insertion, variable expansion, interpreter escape, base64, alternative tools, PowerShell-native verbs) are documented as expected-to-pass tests so residual risk stays visible.
- Add Threat Model + Confidence Strategy sections to ADR 0026.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
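To make the "guardrail, not a security boundary" point concrete, here is a minimal sketch of a regex denylist and two of the bypass classes named above. The pattern and the `is_denied` helper are illustrative assumptions, not the package's actual default denylist or API.

```python
import re

# Hypothetical denylist entry; the real default patterns in _policy.py may differ.
DENY_PATTERNS = [re.compile(r"\brm\s+-rf\s+/")]


def is_denied(command: str) -> bool:
    """Return True when any denylist pattern matches the raw command text."""
    return any(pattern.search(command) for pattern in DENY_PATTERNS)


# The literal form is caught by the pattern.
print(is_denied("rm -rf /"))

# Rewritten forms that a POSIX shell resolves to the same command slip past,
# because the denylist only sees the text, not what the shell will execute.
print(is_denied("r\\m -rf /"))         # backslash insertion
print(is_denied("$(echo rm) -rf /"))   # command substitution
```

Because text matching cannot see what the shell will ultimately run, tests for these bypasses are written as expected-to-pass cases that keep the residual risk visible rather than as defenses.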
Adds a container-backed shell executor as the recommended pattern for untrusted-input shell workflows. The container provides the security boundary (--network none, non-root user, --read-only, --cap-drop ALL, no-new-privileges, memory/pids limits, tmpfs /tmp), so approval gating is optional unlike LocalShellTool.

Also introduces a ShellExecutor Protocol so callers can plug in custom backends (Firecracker, SSH, WASI) without forking the framework.

Removes the planned HyperlightShellExecutor follow-up from ADR 0026: Hyperlight is a WASM code sandbox with no kernel/userland/shell binary, so a Hyperlight-backed shell is not viable. Docker is the realistic sandbox tier for shell.

Tests: 11 unit tests for argv builders + lifecycle (no Docker daemon required); 3 integration tests gated on is_docker_available().

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
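The hardening flags listed in this commit map onto standard `docker run` options. The sketch below shows one way an argv builder could assemble them. The function name `build_run_argv`, its parameters, the default user, the resource limits, and the keep-alive command are all assumptions made for illustration; only the flags themselves come from the commit message.

```python
def build_run_argv(
    image: str,
    container_name: str,
    *,
    binary: str = "docker",
    user: str = "65534:65534",  # assumption: an unprivileged uid:gid
    memory: str = "512m",       # assumption: example memory limit
    pids_limit: int = 256,      # assumption: example process limit
) -> list[str]:
    """Build a `docker run` argv that applies the sandbox flags from the commit."""
    return [
        binary, "run", "--detach", "--rm",
        "--name", container_name,
        "--network", "none",                     # no network access
        "--user", user,                          # non-root user
        "--read-only",                           # read-only root filesystem
        "--cap-drop", "ALL",                     # drop all Linux capabilities
        "--security-opt", "no-new-privileges",   # block privilege escalation
        "--memory", memory,
        "--pids-limit", str(pids_limit),
        "--tmpfs", "/tmp",                       # writable scratch space only
        image,
        "sleep", "infinity",  # assumption: keep-alive command supported by the image
    ]


argv = build_run_argv("alpine:3", "af-shell-demo")
print(" ".join(argv))
```

Building the argv as a list, rather than as a shell string, means no value is ever re-parsed by a host shell, which is also why such builders can be unit tested without a Docker daemon.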
Applies the applicable subset of bug fixes accumulated during the .NET shell-tool PR review (microsoft#5604) to the Python shell tool.

A1 - Quote workdir safely in _maybe_reanchor

Previously _tool.py used double-quote interpolation when emitting the cd/Set-Location prefix, which expanded $VAR, $(), and backticks in the workdir path. A workdir containing shell metacharacters could trigger arbitrary command execution before the user command ran. Replaced with single-quote escaping helpers _quote_posix and _quote_powershell that emit literal-string forms safe for both hosts.

A5/A6 - Consolidate truncation to a single byte-aware helper

Extracted a shared truncate_head_tail / truncate_text_head_tail helper in _truncate.py. The new implementation distributes odd caps so head receives floor(cap/2) and tail receives ceil(cap/2) bytes, matching the .NET round-9 fix and ensuring no input bytes are silently dropped on the boundary. _session.py previously truncated by Python str length while the caller passed _max_output_bytes - the unit mismatch is now gone: raw byte buffers go through truncate_head_tail and decoded text goes through truncate_text_head_tail.

Unit tests added for the truncate and quote helpers.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
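Both fixes can be sketched in a few lines. The helpers below are hypothetical stand-ins for `_quote_posix`, `_quote_powershell`, and `truncate_head_tail`, written from the description above rather than copied from the package. In particular, this truncation sketch inserts no marker between head and tail and does not guard against splitting a multi-byte UTF-8 sequence, both of which the real helpers may handle.

```python
def quote_posix(value: str) -> str:
    """Single-quote for POSIX shells: close the quote, emit an escaped ', reopen."""
    return "'" + value.replace("'", "'\\''") + "'"


def quote_powershell(value: str) -> str:
    """Single-quote for PowerShell: an embedded ' is written as ''."""
    return "'" + value.replace("'", "''") + "'"


def truncate_head_tail(data: bytes, cap: int) -> tuple[bytes, bool]:
    """Keep floor(cap/2) leading and ceil(cap/2) trailing bytes when data exceeds cap.

    Returns the (possibly shortened) bytes and a flag saying whether truncation happened.
    """
    if cap < 0:
        raise ValueError("cap must be non-negative")
    if len(data) <= cap:
        return data, False
    head = cap // 2       # floor(cap / 2)
    tail = cap - head     # ceil(cap / 2), so head + tail == cap for odd caps too
    # Slice from an explicit start index: data[-tail:] would return everything when tail == 0.
    return data[:head] + data[len(data) - tail:], True


# Inside single quotes, $(...) is literal text rather than a command substitution.
print("cd -- " + quote_posix("/tmp/$(id)"))
print("Set-Location -LiteralPath " + quote_powershell("C:\\it's here"))
print(truncate_head_tail(b"abcdefg", 5))
```

One caveat for the PowerShell helper: this sketch only escapes the ASCII apostrophe, and PowerShell may also treat typographic quote characters as quote delimiters, so a production helper would need to account for those.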
…tool
The shell tool's docstrings and comments contained two patterns that
the .NET review pushed back on:
- Narrative framing about implementation history ("hard-won",
"we sidestep", "design inspiration: ...", competitor framework
name-drops in module docstrings).
- Overstated security guarantees ("battle-tested",
"reasonable for untrusted input", "recommended executor for any
agent that runs commands from untrusted input",
"destructive commands are blocked", "safe local shell tool",
"blocks shell injection").
Rewrites the affected docstrings and comments to describe what the
code does in neutral terms. Behaviour is unchanged.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Ports the .NET ShellEnvironmentProvider as a Python ContextProvider so agents using LocalShellTool or DockerShellTool can be primed with an accurate description of the shell they're talking to (family, version, OS, working directory, and which CLIs are available).

The provider runs probes through any ShellExecutor, caches the resulting snapshot, and on every before_run extends the session instructions with a markdown block describing the shell idiom to use. A failed first probe leaves the cache empty so the next call retries (no permanent poisoning).

Probe failures from a narrow set of expected error types (ShellCommandError, ShellExecutionError, ShellTimeoutError, and asyncio.TimeoutError from the per-probe timeout) are recorded as None fields in the snapshot. Other exceptions propagate. Tool names are validated against ^[A-Za-z0-9._-]+$ before being interpolated into a probe command.

Includes 12 unit tests covering happy path, stderr fallback, timeout handling, expected/unexpected exception paths, malicious tool name rejection, case-insensitive deduplication, retry after failure, concurrent first-callers sharing one probe, and the default and custom formatter paths.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
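The tool-name validation and case-insensitive deduplication described above might look like the following sketch. `normalize_tool_names` is a hypothetical helper name, and whether the provider uses `fullmatch` is an assumption; the regular expression itself is the one quoted in the commit message.

```python
import re

_TOOL_NAME_RE = re.compile(r"^[A-Za-z0-9._-]+$")


def normalize_tool_names(names: list[str]) -> list[str]:
    """Reject unsafe tool names and drop case-insensitive duplicates, keeping first-seen order."""
    seen: set[str] = set()
    result: list[str] = []
    for name in names:
        # fullmatch is used on purpose: with match(), `$` also matches just
        # before a trailing newline, so "git\n" would be accepted.
        if not _TOOL_NAME_RE.fullmatch(name):
            raise ValueError(f"invalid tool name: {name!r}")
        key = name.lower()
        if key not in seen:
            seen.add(key)
            result.append(name)
    return result


print(normalize_tool_names(["git", "Git", "python3", "docker-compose"]))

try:
    normalize_tool_names(["git; rm -rf /"])
except ValueError as exc:
    print(exc)
```

Validating before interpolation means a name can only ever contain characters that no supported shell treats as syntax, so the probe command needs no further quoting for that argument.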
…anup

Add a README section introducing ShellEnvironmentProvider, soften two remaining overconfident security-boundary comments in _executor_base.py and the DockerShellTool class docstring, and add a sample (shell_with_environment_provider.py) that demonstrates the provider in stateless and persistent modes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The repository convention is to host samples under python/samples/ rather than inside the package directory. Move the two net-new shell samples (allow-list and environment-provider) to python/samples/02-agents/tools/ and drop the in-package samples/ directory; the existing top-level providers/openai/client_with_local_shell.py already covers the basic LocalShellTool walkthrough.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
This PR adds a new first-party Python workspace package, agent-framework-tools, introducing a cross-platform shell execution surface (LocalShellTool) plus a container-sandboxed variant (DockerShellTool) and a context provider (ShellEnvironmentProvider) to help models emit correct shell idioms and discover available CLIs.
Changes:
- Add agent-framework-tools package (shell tools, policy/denylist, truncation, process-tree kill, persistent session protocol, environment context provider).
- Add unit + integration-gated tests and runnable samples for local and Docker-backed shell execution.
- Register the new package in the Python workspace (pyproject + uv lock) and update an existing OpenAI sample to use LocalShellTool.
Show a summary per file
| File | Description |
|---|---|
| python/uv.lock | Adds agent-framework-tools as a workspace member and locked editable package entry. |
| python/pyproject.toml | Registers agent-framework-tools as a workspace source dependency. |
| python/samples/02-agents/providers/openai/client_with_local_shell.py | Updates sample to use LocalShellTool instead of a hand-rolled subprocess tool. |
| python/packages/tools/README.md | Documents installation, modes, safety model, and tool/provider usage. |
| python/packages/tools/LICENSE | Adds MIT license for the new tools package. |
| python/packages/tools/pyproject.toml | Defines packaging metadata, deps (incl. psutil), and test/lint/tooling config. |
| python/packages/tools/agent_framework_tools/__init__.py | Adds package root and version discovery. |
| python/packages/tools/agent_framework_tools/py.typed | Marks the package as typed for type checkers. |
| python/packages/tools/agent_framework_tools/shell/__init__.py | Exposes the public shell-tool API surface. |
| python/packages/tools/agent_framework_tools/shell/_types.py | Introduces shared types and core exceptions for shell execution. |
| python/packages/tools/agent_framework_tools/shell/_truncate.py | Implements head/tail UTF-8 byte-budget truncation helpers. |
| python/packages/tools/agent_framework_tools/shell/_policy.py | Adds allow/deny policy model and default denylist patterns. |
| python/packages/tools/agent_framework_tools/shell/_resolve.py | Implements cross-platform shell argv resolution and PowerShell detection. |
| python/packages/tools/agent_framework_tools/shell/_killtree.py | Adds cross-OS process-tree termination (psutil + fallback). |
| python/packages/tools/agent_framework_tools/shell/_executor.py | Implements stateless execution via subprocess with timeout + truncation. |
| python/packages/tools/agent_framework_tools/shell/_executor_base.py | Defines a minimal ShellExecutor protocol for pluggable backends. |
| python/packages/tools/agent_framework_tools/shell/_session.py | Implements persistent shell session using sentinel framing and reader tasks. |
| python/packages/tools/agent_framework_tools/shell/_tool.py | Adds LocalShellTool facade + agent-framework FunctionTool wiring. |
| python/packages/tools/agent_framework_tools/shell/_environment.py | Adds ShellEnvironmentProvider to probe and inject shell environment guidance. |
| python/packages/tools/agent_framework_tools/shell/_docker.py | Adds DockerShellTool and argv builders for container-sandboxed execution. |
| python/packages/tools/samples/__init__.py | Adds samples package marker. |
| python/packages/tools/samples/shell_openai_persistent.py | Demonstrates OpenAI usage with an approval loop and persistent local shell. |
| python/packages/tools/samples/shell_allowlist_stateless.py | Demonstrates a strict allowlist + stateless mode configuration. |
| python/packages/tools/samples/shell_with_environment_provider.py | Demonstrates using ShellEnvironmentProvider with stateless vs persistent shells. |
| python/packages/tools/tests/__init__.py | Adds tests package marker. |
| python/packages/tools/tests/test_shell_truncate_and_quote.py | Tests truncation helpers and quoting helpers. |
| python/packages/tools/tests/test_shell_environment_provider.py | Tests probing, formatting, caching, and concurrency behavior of environment provider. |
| python/packages/tools/tests/test_security.py | Adds security regression tests documenting denylist behavior and residual risk. |
| python/packages/tools/tests/test_policy.py | Tests default policy behavior, allowlist behavior, and custom overrides. |
| python/packages/tools/tests/test_local_shell_tool.py | Tests local shell tool modes, timeouts, policy, persistence, and concurrency. |
| python/packages/tools/tests/test_docker_shell_tool.py | Tests Docker argv builders, basic tool behavior, and docker-availability-gated integration tests. |
Copilot's findings
- Files reviewed: 27/29 changed files
- Comments generated: 6
Comment on lines +275 to +277

    if self._interactive_argv and "pwsh" in os.path.basename(self._interactive_argv[0]).lower():
        return f"Set-Location -LiteralPath {_quote_powershell(self._workdir)}\n{command}"
    return f"cd -- {_quote_posix(self._workdir)}\n{command}"
Comment on lines +25 to +26

    from agent_framework import ContextProvider, SupportsAgentRun
    from agent_framework._sessions import AgentSession, SessionContext
Comment on lines +81 to +85

    # Persistent reader state. The reader tasks append into these
    # buffers; _run_locked scans forward from a per-call offset.
    self._stdout_buf = bytearray()
    self._stderr_buf = bytearray()
    self._stdout_event = asyncio.Event()
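The "scan forward from a per-call offset" idea in the comment above can be illustrated with a small sketch. It assumes each command's output is terminated by a unique sentinel line written to the same stream. The sentinel format and the helper name `scan_for_sentinel` are invented for illustration and are not taken from `_session.py`.

```python
from typing import Optional, Tuple


def scan_for_sentinel(
    buf: bytearray, offset: int, sentinel: bytes
) -> Optional[Tuple[bytes, int]]:
    """Search buf for sentinel starting at offset.

    Returns (output_before_sentinel, next_offset) when the sentinel has
    arrived, or None when the caller should wait for more data.
    """
    idx = buf.find(sentinel, offset)
    if idx == -1:
        return None
    return bytes(buf[offset:idx]), idx + len(sentinel)


# A shared buffer that a background reader would keep appending to.
buf = bytearray(b"hello\n__END_1__\n")
first = scan_for_sentinel(buf, 0, b"__END_1__\n")
print(first)

# The second call starts at the offset returned by the first, so earlier
# output is never re-read even though it is still in the buffer.
buf += b"world\n__END_2__\n"
second = scan_for_sentinel(buf, first[1], b"__END_2__\n")
print(second)
```

In an asyncio setting, a `None` result is the point where the caller would wait on an event set by the reader task and then scan again.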
Comment on lines +278 to +300

            if self._container_started:
                if self._mode == "persistent" and self._session is not None:
                    await self._session.start()
                return
            await self._start_container()
            self._container_started = True
            if self._mode == "persistent":
                argv = build_exec_argv(
                    binary=self._binary,
                    container_name=self._container_name,
                    interactive=True,
                )
                self._session = ShellSession(
                    argv,
                    workdir=None,  # workdir is set on the container itself
                    env=None,
                    max_output_bytes=self._max_output_bytes,
                )
                await self._session.start()

    async def close(self) -> None:
        """Stop the inner shell session and tear down the container."""
        async with self._get_lifecycle_lock():
Comment on lines +169 to +171

    assert getattr(fn, "additional_properties", {}).get("kind") == SHELL_TOOL_KIND_VALUE or \
        getattr(fn, "kind", None) == SHELL_TOOL_KIND_VALUE or \
        SHELL_TOOL_KIND_VALUE in str(getattr(fn, "_kind", ""))
Comment on lines +177 to +184

    @pytest.mark.skipif(not is_docker_available(), reason="docker daemon unavailable")
    async def test_docker_persistent_session_preserves_state():
        async with DockerShellTool(image="alpine:3", network="none") as shell:
            r1 = await shell.run("export AF_X=hello")
            assert r1.exit_code == 0
            r2 = await shell.run("echo $AF_X")
            assert r2.exit_code == 0
            assert "hello" in r2.stdout
…_model

Two new tests in test_local_shell_tool.py exercise the default confine_workdir=True behaviour on POSIX and PowerShell, asserting that 'cd' inside one persistent-mode call does not leak into the next. A new test_shell_result.py module provides direct unit coverage for every conditional branch of ShellResult.format_for_model (stdout, truncated, stderr, timed_out, exit_code) so regressions in the LLM-facing format are caught immediately.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
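As a rough picture of what per-branch coverage means here, the sketch below defines a stand-in result type with the five fields named in the commit and one conditional per field. The class name `ShellResultSketch` and the exact output layout are invented for illustration; the real `ShellResult.format_for_model` may order or label its sections differently.

```python
from dataclasses import dataclass


@dataclass
class ShellResultSketch:
    """Hypothetical stand-in for ShellResult; field names follow the commit message."""

    stdout: str = ""
    stderr: str = ""
    exit_code: int = 0
    timed_out: bool = False
    truncated: bool = False

    def format_for_model(self) -> str:
        """Render a compact text block, adding one section per populated field."""
        parts: list[str] = []
        if self.stdout:
            parts.append(self.stdout.rstrip("\n"))
        if self.truncated:
            parts.append("[output truncated]")
        if self.stderr:
            parts.append("stderr:\n" + self.stderr.rstrip("\n"))
        if self.timed_out:
            parts.append("[command timed out]")
        parts.append(f"exit_code: {self.exit_code}")
        return "\n".join(parts)


print(ShellResultSketch(stdout="hi\n").format_for_model())
print(ShellResultSketch(stderr="boom\n", exit_code=1, timed_out=True).format_for_model())
```

Testing each branch in isolation, as the commit describes, means a change to any one section of the model-facing text fails exactly one focused assertion.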
This pull request introduces a new built-in tools package for the Microsoft Agent Framework, focusing on a cross-platform local shell tool (LocalShellTool) and its supporting infrastructure. It adds comprehensive documentation, licensing, and a Python package structure to support safe and extensible shell command execution, with future growth in mind.