Skip to content

Conversation

@dmytrostruk
Copy link
Member

Motivation and Context

This PR adds shell command execution capabilities to the Agent Framework through a new ShellTool abstraction and LocalShellExecutor implementation.

  • Added ShellTool abstraction for shell command execution with configurable security policies (Microsoft.Agents.AI.Abstractions)
  • Added LocalShellExecutor (Microsoft.Agents.AI.Shell.Local)
  • Security controls: privilege escalation blocking, dangerous pattern detection, path validation
  • Support for allowlist/denylist patterns
  • Configurable timeouts, output truncation, and working directory
  • AsAIFunction conversion for use with existing AI agents

Contribution Checklist

  • The code builds clean without any errors or warnings
  • The PR follows the Contribution Guidelines
  • All unit tests pass, and I have added new tests where possible
  • Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR.

@dmytrostruk dmytrostruk self-assigned this Jan 22, 2026
Copilot AI review requested due to automatic review settings January 22, 2026 08:17
@markwallace-microsoft markwallace-microsoft added the documentation Improvements or additions to documentation label Jan 22, 2026
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds shell command execution capabilities to the Agent Framework through a new ShellTool abstraction and LocalShellExecutor implementation. The implementation includes comprehensive security controls for privilege escalation blocking, dangerous pattern detection, path validation, and configurable allow/deny lists.

Changes:

  • Added ShellTool abstraction with security policies (Microsoft.Agents.AI.Abstractions)
  • Added LocalShellExecutor implementation for local shell execution (Microsoft.Agents.AI.Shell.Local)
  • Added comprehensive unit tests for ShellTool security validation and integration tests for LocalShellExecutor
  • Added sample demonstrating ShellTool usage with human-in-the-loop approval (Agent_Step21_ShellTool)

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
Microsoft.Agents.AI.Shell.Local/LocalShellExecutor.cs Implements local shell command execution with timeout, output truncation, and cross-platform support
Microsoft.Agents.AI.Shell.Local/Microsoft.Agents.AI.Shell.Local.csproj Project configuration for the LocalShellExecutor implementation
Microsoft.Agents.AI.Abstractions/Shell/*.cs Core abstractions including ShellTool, ShellExecutor, ShellToolOptions, and content types
Microsoft.Agents.AI.Shell.Local.IntegrationTests/*.cs Integration tests for LocalShellExecutor covering various execution scenarios
Microsoft.Agents.AI.Abstractions.UnitTests/Shell/*.cs Unit tests for ShellTool security validation and options configuration
Agent_Step21_ShellTool/* Sample demonstrating ShellTool usage with security configuration and approval workflow
agent-framework-dotnet.slnx Solution file updated to include new projects
README.md Updated to reference the new ShellTool sample
Comments suppressed due to low confidence (1)

dotnet/tests/Microsoft.Agents.AI.Shell.Local.IntegrationTests/LocalShellExecutorTests.cs:89

  • This condition is redundant because both branches assign the same array of commands. This appears to be a copy-paste error where the Windows and Unix commands were supposed to be different but ended up being identical.
        string[] commands = RuntimeInformation.IsOSPlatform(OSPlatform.Windows)
            ? ["echo first", "echo second", "echo third"]
            : ["echo first", "echo second", "echo third"];

/// the output for each command.
/// </para>
/// </remarks>
public static AIFunction AsAIFunction(this ShellTool shellTool)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is ShellTool not inheriting from AIFunction if we want to have it as an AIFunction anyway?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question, see comment: #3369 (comment)

// BlockedPaths = ["/etc", "/var"],

// Only allow specific commands (regex patterns supported)
AllowedCommands = ["^ls", "^dir", "^echo", "^cat", "^type", "^mkdir", "^pwd", "^cd"],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we vary these by operating system as well? Some of these don't work in the windows command prompt shell.
Also, commands vary by shell type, so on windows powershell supports ls and pwd, but command prompt does not.

/// <summary>
/// Raw output from shell executor.
/// </summary>
public sealed class ShellExecutorOutput
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also consider ShellExecutorResponse to match AgentRunResponse, and ChatResponse.

Comment on lines +187 to +190
return ("cmd.exe", $"/c {command}");
}

return ("/bin/sh", $"-c \"{command.Replace("\"", "\\\"")}\"");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we only want to support cmd and sh, or should we make this configurable?

/// Represents the result of a shell command execution.
/// </summary>
[DebuggerDisplay("{DebuggerDisplay,nq}")]
public sealed class ShellResultContent : AIContent
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this be provided to the LLM as FunctionResultContent? What's the idea with inheriting from AIContent?
Similar question with ShellCallContent?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, in current example, we convert ShellTool to AIFunction, just so it can be used with any provider that supports function calls. But function calling is not the only case.

There are providers that support shell tools out of the box, like OpenAI Responses Client:
https://platform.openai.com/docs/guides/tools-shell

They provide executor as example only, but they do have "type": "shell_call" and "type": "shell_call_output" message types, so the user can decide how exactly they want to process this.

That's the primary reason for separate ShellCallContent and ShellResultContent - for the ability to work independently from function calls.

Copy link
Member

@stephentoub stephentoub Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the intent then that ShellCallContent/ShellResultContent will be used to represent the server-side shell calls? That translation would best be done in the IChatClient for responses. We should really look at pushing down these parts (at least the call and result content) of the abstractions, along with a HostedShellTool, into MEAI.

Also, it feels like for such a case the local execution would be handled by a piece of middleware akin to FunctionInvokingChatClient, but a ShellCommandInvokingChatClient, which would look for ShellCallContent and produce ShellResultContent. Or FunctionInvokingChatClient could be taught to do that.

cc: @jozkee

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the intent then that ShellCallContent/ShellResultContent will be used to represent the server-side shell calls? That translation would best be done in the IChatClient for responses. We should really look at pushing down these parts (at least the call and result content) of the abstractions, along with a HostedShellTool, into MEAI.

Correct, and the idea of adding this to MEAI was my initial plan :) I just added it here first, because I want to test it end-to-end, just to understand if there is anything missing.

private readonly IReadOnlyList<Regex>? _compiledAllowedPatterns;
private readonly IReadOnlyList<Regex>? _compiledDeniedPatterns;

private static readonly string[] s_privilegeEscalationCommands =
Copy link
Contributor

@westey-m westey-m Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these be configurable in addition to having defaults? Same for shellWrapperCOmmands and dangerous patterns

"pkexec"
];

private static readonly string[] s_shellWrapperCommands =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also have powershell and pswh?

/// Gets or sets the working directory for command execution.
/// When null, uses the current working directory.
/// </summary>
public string? WorkingDirectory { get; set; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be on the safe side, I wonder if this property should be mandatory so that users must explicitly provide a working directory, rather than defaulting to Environment.CurrentDirectory. This would ensure users consider where the code is executed and reduce potential security risks.

/// Represents a shell command execution request.
/// </summary>
[DebuggerDisplay("{DebuggerDisplay,nq}")]
public sealed class ShellCallContent : AIContent
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we unsealed FCC/FRC and made this extend that, I think FunctionInvokingChatClient as-is could be able to handle these shell calls.

Alternatively, I think these shell contents could be completely written in terms of FCC/FRC but it may be too loose i.e. some properties would need to travel in AdditionalProperties.

/// Represents a shell command execution request.
/// </summary>
[DebuggerDisplay("{DebuggerDisplay,nq}")]
public sealed class ShellCallContent : AIContent
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Custom AIContent needs to be added into the AgentsAbstractionJsonUtilities' options so that it correctly participates in polymorphic serialization.

/// Represents the result of a shell command execution.
/// </summary>
[DebuggerDisplay("{DebuggerDisplay,nq}")]
public sealed class ShellResultContent : AIContent
Copy link
Member

@stephentoub stephentoub Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the intent then that ShellCallContent/ShellResultContent will be used to represent the server-side shell calls? That translation would best be done in the IChatClient for responses. We should really look at pushing down these parts (at least the call and result content) of the abstractions, along with a HostedShellTool, into MEAI.

Also, it feels like for such a case the local execution would be handled by a piece of middleware akin to FunctionInvokingChatClient, but a ShellCommandInvokingChatClient, which would look for ShellCallContent and produce ShellResultContent. Or FunctionInvokingChatClient could be taught to do that.

cc: @jozkee

/// this tool to an <see cref="AIFunction"/> for use with AI agents.
/// </para>
/// </remarks>
public class ShellTool : AITool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is invokable, why doesn't it inherit AIFunction?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to keep it separately, because I'm not sure how well it will work with AIFunction-related classes like AIFunctionArguments and FunctionInvokingChatClient.

My thinking was that if OpenAI Responses API defines function tools and shell tools as separate entities, then it's probably a good idea to keep it in the same way on abstraction level.

But I'm looking for the best guidance here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's value in making this extend AIFunction since it would be needed for handling shell tool calls in FunctionInvokingChatClient.
https://github.com/dotnet/extensions/blob/ab5bdf603e04a6bbef6366cc93e6e4c5a89d29a3/src/Libraries/Microsoft.Extensions.AI/ChatCompletion/FunctionInvokingChatClient.cs#L854-L855
Unless there's a reason to not want shell tool calls handled there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still unclear on the purpose of ShellTool as it's written. It doesn't derive from AIFunction so it's not invokable as an AITool, but it has invocation APIs so it doesn't seem like it's a marker tool to be recognized by leaf IChatClients and translated into the right underlying tool for that particular API (OpenAI Responses's shell tool, Anthropic's bash tool, etc.)

"runas",
"doas",
"pkexec"
];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a threat model for all of this?
cc: @GrabYourPitchforks

// Overwrite disk
new Regex(@">\s*/dev/sd", RegexOptions.Compiled, TimeSpan.FromSeconds(1)),
// chmod 777 /
new Regex(@"chmod\s+(-[rR]\s+)?777\s+/", RegexOptions.Compiled, TimeSpan.FromSeconds(1)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're concerned about such dangerous commands, I'm concerned this list is insufficient.

How are we thinking about having this vs having folks treat such invocations as requiring of user approval?

(Regardless, shouldn't "dangerous patterns" be configurable?)

/// Gets the description of the tool.
/// </summary>
public override string Description =>
"Execute shell commands. Returns stdout, stderr, and exit code for each command.";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is the description provided to AI, it should be more descriptive / prescriptive, e.g. about what's supported and what's not supported, what language to use, etc. Is this bash? powershell? etc.?

return false;
}

private static bool ContainsPrivilegeEscalation(string command)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're relying on this for security, this needs strenuous review.
cc: @GrabYourPitchforks

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, I'm not sure that it will be possible to cover all cases in this PR. The idea here is to add initial set of security validations with a notice that this is recommended to be used only in sandboxed environments with limited access.

@dmytrostruk dmytrostruk marked this pull request as draft January 23, 2026 02:01
@stephentoub
Copy link
Member

I thought about this a bit today, and here's what I think might make sense:

  • ShellTool : AIFunction. The tool would itself advertize a schema and description that would allow for arbitrary commands to be handed to it as an argument, such that arbitrary IChatClient implementations would "just work" with it as a shell tool. However, IChatClient implementations backed by a service that has its own notion of a shell tool could instead special-case it (as they special case all the other HostedXxTool types, translating it into usage of the relevant service tool.
  • ShellCallContent : FunctionCallContent. The FCC would be for the ShellTool, with the derived type having all the specific information relevant to a service's specialized shell invocation. An IChatClient that doesn't have a special shell tool would just produce the base FunctionCallContent like it would for any other tool. An IChatClient with a special shell tool would instead construct the derived ShellCallContent. When ShellTool is invoked, it would look at the FunctionCallContent; if it's of the derived type, it'd do the more specialized thing, and if it's the base, it would just proceed as would any other AIFunction.
  • ShellResultContent : FunctionResultContent. When doing the special thing, ShellTool's InvokeAsync override would return a ShellResultContent rather than just returning the textual results from invoking the shell tool. We'd tweak the handling of AIFunction in FunctionInvokingChatClient so that an AIFunction returning a FunctionResultContent as the result would just use that FRC rather than creating a new one to wrap the result. That's a useful addition in general, as it gives a function more control over the FunctionResultContent instance that's created, and in this case it means the tool can flow the additional data back strongly-typed. The IChatClient that gets a ShellResultContent would translate that back to the appropriate shell result content expected by the service.

@dmytrostruk, @jozkee, thoughts? If you agree this makes sense, something we could prototype quickly?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation .NET

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants