-
Notifications
You must be signed in to change notification settings - Fork 1.1k
.NET: Added ShellTool and LocalShellExecutor #3369
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds shell command execution capabilities to the Agent Framework through a new ShellTool abstraction and LocalShellExecutor implementation. The implementation includes comprehensive security controls for privilege escalation blocking, dangerous pattern detection, path validation, and configurable allow/deny lists.
Changes:
- Added
ShellToolabstraction with security policies (Microsoft.Agents.AI.Abstractions) - Added
LocalShellExecutorimplementation for local shell execution (Microsoft.Agents.AI.Shell.Local) - Added comprehensive unit tests for ShellTool security validation and integration tests for LocalShellExecutor
- Added sample demonstrating ShellTool usage with human-in-the-loop approval (Agent_Step21_ShellTool)
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| Microsoft.Agents.AI.Shell.Local/LocalShellExecutor.cs | Implements local shell command execution with timeout, output truncation, and cross-platform support |
| Microsoft.Agents.AI.Shell.Local/Microsoft.Agents.AI.Shell.Local.csproj | Project configuration for the LocalShellExecutor implementation |
| Microsoft.Agents.AI.Abstractions/Shell/*.cs | Core abstractions including ShellTool, ShellExecutor, ShellToolOptions, and content types |
| Microsoft.Agents.AI.Shell.Local.IntegrationTests/*.cs | Integration tests for LocalShellExecutor covering various execution scenarios |
| Microsoft.Agents.AI.Abstractions.UnitTests/Shell/*.cs | Unit tests for ShellTool security validation and options configuration |
| Agent_Step21_ShellTool/* | Sample demonstrating ShellTool usage with security configuration and approval workflow |
| agent-framework-dotnet.slnx | Solution file updated to include new projects |
| README.md | Updated to reference the new ShellTool sample |
Comments suppressed due to low confidence (1)
dotnet/tests/Microsoft.Agents.AI.Shell.Local.IntegrationTests/LocalShellExecutorTests.cs:89
- This condition is redundant because both branches assign the same array of commands. This appears to be a copy-paste error where the Windows and Unix commands were supposed to be different but ended up being identical.
string[] commands = RuntimeInformation.IsOSPlatform(OSPlatform.Windows)
? ["echo first", "echo second", "echo third"]
: ["echo first", "echo second", "echo third"];
dotnet/tests/Microsoft.Agents.AI.Shell.Local.IntegrationTests/LocalShellExecutorTests.cs
Outdated
Show resolved
Hide resolved
dotnet/src/Microsoft.Agents.AI.Shell.Local/LocalShellExecutor.cs
Outdated
Show resolved
Hide resolved
...ents.AI.Shell.Local.IntegrationTests/Microsoft.Agents.AI.Shell.Local.IntegrationTests.csproj
Show resolved
Hide resolved
| /// the output for each command. | ||
| /// </para> | ||
| /// </remarks> | ||
| public static AIFunction AsAIFunction(this ShellTool shellTool) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is ShellTool not inheriting from AIFunction if we want to have it as an AIFunction anyway?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question, see comment: #3369 (comment)
| // BlockedPaths = ["/etc", "/var"], | ||
|
|
||
| // Only allow specific commands (regex patterns supported) | ||
| AllowedCommands = ["^ls", "^dir", "^echo", "^cat", "^type", "^mkdir", "^pwd", "^cd"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we vary these by operating system as well? Some of these don't work in the windows command prompt shell.
Also, commands vary by shell type, so on windows powershell supports ls and pwd, but command prompt does not.
| /// <summary> | ||
| /// Raw output from shell executor. | ||
| /// </summary> | ||
| public sealed class ShellExecutorOutput |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also consider ShellExecutorResponse to match AgentRunResponse, and ChatResponse.
| return ("cmd.exe", $"/c {command}"); | ||
| } | ||
|
|
||
| return ("/bin/sh", $"-c \"{command.Replace("\"", "\\\"")}\""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we only want to support cmd and sh, or should we make this configurable?
| /// Represents the result of a shell command execution. | ||
| /// </summary> | ||
| [DebuggerDisplay("{DebuggerDisplay,nq}")] | ||
| public sealed class ShellResultContent : AIContent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't this be provided to the LLM as FunctionResultContent? What's the idea with inheriting from AIContent?
Similar question with ShellCallContent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, in current example, we convert ShellTool to AIFunction, just so it can be used with any provider that supports function calls. But function calling is not the only case.
There are providers that support shell tools out of the box, like OpenAI Responses Client:
https://platform.openai.com/docs/guides/tools-shell
They provide executor as example only, but they do have "type": "shell_call" and "type": "shell_call_output" message types, so the user can decide how exactly they want to process this.
That's the primary reason for separate ShellCallContent and ShellResultContent - for the ability to work independently from function calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the intent then that ShellCallContent/ShellResultContent will be used to represent the server-side shell calls? That translation would best be done in the IChatClient for responses. We should really look at pushing down these parts (at least the call and result content) of the abstractions, along with a HostedShellTool, into MEAI.
Also, it feels like for such a case the local execution would be handled by a piece of middleware akin to FunctionInvokingChatClient, but a ShellCommandInvokingChatClient, which would look for ShellCallContent and produce ShellResultContent. Or FunctionInvokingChatClient could be taught to do that.
cc: @jozkee
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the intent then that ShellCallContent/ShellResultContent will be used to represent the server-side shell calls? That translation would best be done in the IChatClient for responses. We should really look at pushing down these parts (at least the call and result content) of the abstractions, along with a HostedShellTool, into MEAI.
Correct, and the idea of adding this to MEAI was my initial plan :) I just added it here first, because I want to test it end-to-end, just to understand if there is anything missing.
| private readonly IReadOnlyList<Regex>? _compiledAllowedPatterns; | ||
| private readonly IReadOnlyList<Regex>? _compiledDeniedPatterns; | ||
|
|
||
| private static readonly string[] s_privilegeEscalationCommands = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should these be configurable in addition to having defaults? Same for shellWrapperCOmmands and dangerous patterns
| "pkexec" | ||
| ]; | ||
|
|
||
| private static readonly string[] s_shellWrapperCommands = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also have powershell and pswh?
| /// Gets or sets the working directory for command execution. | ||
| /// When null, uses the current working directory. | ||
| /// </summary> | ||
| public string? WorkingDirectory { get; set; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be on the safe side, I wonder if this property should be mandatory so that users must explicitly provide a working directory, rather than defaulting to Environment.CurrentDirectory. This would ensure users consider where the code is executed and reduce potential security risks.
| /// Represents a shell command execution request. | ||
| /// </summary> | ||
| [DebuggerDisplay("{DebuggerDisplay,nq}")] | ||
| public sealed class ShellCallContent : AIContent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we unsealed FCC/FRC and made this extend that, I think FunctionInvokingChatClient as-is could be able to handle these shell calls.
Alternatively, I think these shell contents could be completely written in terms of FCC/FRC but it may be too loose i.e. some properties would need to travel in AdditionalProperties.
| /// Represents a shell command execution request. | ||
| /// </summary> | ||
| [DebuggerDisplay("{DebuggerDisplay,nq}")] | ||
| public sealed class ShellCallContent : AIContent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Custom AIContent needs to be added into the AgentsAbstractionJsonUtilities' options so that it correctly participates in polymorphic serialization.
| /// Represents the result of a shell command execution. | ||
| /// </summary> | ||
| [DebuggerDisplay("{DebuggerDisplay,nq}")] | ||
| public sealed class ShellResultContent : AIContent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the intent then that ShellCallContent/ShellResultContent will be used to represent the server-side shell calls? That translation would best be done in the IChatClient for responses. We should really look at pushing down these parts (at least the call and result content) of the abstractions, along with a HostedShellTool, into MEAI.
Also, it feels like for such a case the local execution would be handled by a piece of middleware akin to FunctionInvokingChatClient, but a ShellCommandInvokingChatClient, which would look for ShellCallContent and produce ShellResultContent. Or FunctionInvokingChatClient could be taught to do that.
cc: @jozkee
| /// this tool to an <see cref="AIFunction"/> for use with AI agents. | ||
| /// </para> | ||
| /// </remarks> | ||
| public class ShellTool : AITool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is invokable, why doesn't it inherit AIFunction?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decided to keep it separately, because I'm not sure how well it will work with AIFunction-related classes like AIFunctionArguments and FunctionInvokingChatClient.
My thinking was that if OpenAI Responses API defines function tools and shell tools as separate entities, then it's probably a good idea to keep it in the same way on abstraction level.
But I'm looking for the best guidance here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's value in making this extend AIFunction since it would be needed for handling shell tool calls in FunctionInvokingChatClient.
https://github.com/dotnet/extensions/blob/ab5bdf603e04a6bbef6366cc93e6e4c5a89d29a3/src/Libraries/Microsoft.Extensions.AI/ChatCompletion/FunctionInvokingChatClient.cs#L854-L855
Unless there's a reason to not want shell tool calls handled there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still unclear on the purpose of ShellTool as it's written. It doesn't derive from AIFunction so it's not invokable as an AITool, but it has invocation APIs so it doesn't seem like it's a marker tool to be recognized by leaf IChatClients and translated into the right underlying tool for that particular API (OpenAI Responses's shell tool, Anthropic's bash tool, etc.)
| "runas", | ||
| "doas", | ||
| "pkexec" | ||
| ]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a threat model for all of this?
cc: @GrabYourPitchforks
| // Overwrite disk | ||
| new Regex(@">\s*/dev/sd", RegexOptions.Compiled, TimeSpan.FromSeconds(1)), | ||
| // chmod 777 / | ||
| new Regex(@"chmod\s+(-[rR]\s+)?777\s+/", RegexOptions.Compiled, TimeSpan.FromSeconds(1)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're concerned about such dangerous commands, I'm concerned this list is insufficient.
How are we thinking about having this vs having folks treat such invocations as requiring of user approval?
(Regardless, shouldn't "dangerous patterns" be configurable?)
| /// Gets the description of the tool. | ||
| /// </summary> | ||
| public override string Description => | ||
| "Execute shell commands. Returns stdout, stderr, and exit code for each command."; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is the description provided to AI, it should be more descriptive / prescriptive, e.g. about what's supported and what's not supported, what language to use, etc. Is this bash? powershell? etc.?
| return false; | ||
| } | ||
|
|
||
| private static bool ContainsPrivilegeEscalation(string command) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're relying on this for security, this needs strenuous review.
cc: @GrabYourPitchforks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be honest, I'm not sure that it will be possible to cover all cases in this PR. The idea here is to add initial set of security validations with a notice that this is recommended to be used only in sandboxed environments with limited access.
|
I thought about this a bit today, and here's what I think might make sense:
@dmytrostruk, @jozkee, thoughts? If you agree this makes sense, something we could prototype quickly? |
Motivation and Context
This PR adds shell command execution capabilities to the Agent Framework through a new
ShellToolabstraction andLocalShellExecutorimplementation.ShellToolabstraction for shell command execution with configurable security policies (Microsoft.Agents.AI.Abstractions)LocalShellExecutor(Microsoft.Agents.AI.Shell.Local)AsAIFunctionconversion for use with existing AI agentsContribution Checklist