diff --git a/docs/decisions/00NN-userapproval-content-types.md b/docs/decisions/00NN-userapproval-content-types.md new file mode 100644 index 0000000000..800f4755ed --- /dev/null +++ b/docs/decisions/00NN-userapproval-content-types.md @@ -0,0 +1,522 @@ +--- +# These are optional elements. Feel free to remove any of them. +status: proposed +contact: westey-m +date: 2025-07-16 {YYYY-MM-DD when the decision was last updated} +deciders: sergeymenshykh, markwallace-microsoft, rogerbarreto, dmytrostruk, westey-m, eavanvalkenburg, stephentoub, peterychang +consulted: +informed: +--- + +# Agent User Approvals Content Types Design + +## Context and Problem Statement + +When agents are operating on behalf of a user, there may be cases where the agent requires user approval to continue an operation. +This is complicated by the fact that an agent may be remote and the user may not immediately be available to provide the approval. + +Inference services are also increasingly supporting built-in tools or service side MCP invocation, which may require user approval before the tool can be invoked. + +This document aims to provide options and capture the decision on how to model this user approval interaction with the agent caller. + +See various features that would need to be supported via this type of mechanism, plus how various other frameworks support this: + +- Also see [dotnet issue 6492](https://github.com/dotnet/extensions/issues/6492), which discusses the need for a similar pattern in the context of MCP approvals. +- Also see [the openai RunToolApprovalItem](https://openai.github.io/openai-agents-js/openai/agents/classes/runtoolapprovalitem/). +- Also see [the openai human-in-the-loop guide](https://openai.github.io/openai-agents-js/guides/human-in-the-loop/#approval-requests). +- Also see [the openai MCP guide](https://openai.github.io/openai-agents-js/guides/mcp/#optional-approval-flow). 
+- Also see [MCP Approval Requests from OpenAI](https://platform.openai.com/docs/guides/tools-remote-mcp#approvals). +- Also see [Azure AI Foundry MCP Approvals](https://learn.microsoft.com/en-us/azure/ai-foundry/agents/how-to/tools/model-context-protocol-samples?pivots=rest#submit-your-approval). +- Also see [MCP Elicitation requests](https://modelcontextprotocol.io/specification/draft/client/elicitation) + +## Decision Drivers + +- Agents should encapsulate their internal logic and not leak it to the caller. +- We need to support approvals for local actions as well as remote actions. +- We need to support approvals for service-side tool use, such as remote MCP tool invocations +- We should consider how other user input requests will be modeled, so that we can have a consistent approach for user input requests and approvals. + +## Considered Options + +### 1. Return a FunctionCallContent to the agent caller, that it executes + +This introduces a manual function calling element to agents, where the caller of the agent is expected to invoke the function if the user approves it. + +This approach is problematic for a number of reasons: + +- This may not work for remote agents (e.g. via A2A), where the function that the agent wants to call does not reside on the caller's machine. +- The main value prop of an agent is to encapsulate the internal logic of the agent, but this leaks that logic to the caller, requiring the caller to know how to invoke the agent's function calls. +- Inference services are introducing their own approval content types for server side tool or function invocation, and will not be addressed by this approach. + +### 2. Introduce an ApprovalCallback in AgentRunOptions and ChatOptions + +This approach allows a caller to provide a callback that the agent can invoke when it requires user approval. 
+ +This approach is easy to use when the user and agent are in the same application context, such as a desktop application, where the application can show the approval request to the user and get their response from the callback before continuing the agent run. + +This approach does not work well for cases where the agent is hosted in a remote service, and where there is no user available to provide the approval in the same application context. +For cases like this, the agent needs to be suspended, and a network response must be sent to the client app. After the user provides their approval, the client app must call the service that hosts the agent again, with the user's decision, and the agent needs to be resumed. However, with a callback, the agent is deep in the call stack and cannot be suspended or resumed like this. + +```csharp +class AgentRunOptions +{ + public Func>? ApprovalCallback { get; set; } +} + +agent.RunAsync("Please book me a flight for Friday to Paris.", thread, new AgentRunOptions +{ + ApprovalCallback = async (approvalRequest) => + { + // Show the approval request to the user in the appropriate format. + // The user can then approve or reject the request. + // The optional FunctionCallContent can be used to show the user what function the agent wants to call with the parameter set: + // approvalRequest.FunctionCall?.Arguments. + + // If the user approves: + return true; + } +}); +``` + +### 3. Introduce new ApprovalRequestContent and ApprovalResponseContent types + +The agent would return an `ApprovalRequestContent` to the caller, which would then be responsible for getting approval from the user in whatever way is appropriate for the application. +The caller would then invoke the agent again with an `ApprovalResponseContent` to the agent containing the user decision. 
+ +When an agent returns an `ApprovalRequestContent`, the run is finished for the time being, and to continue, the agent must be invoked again with an `ApprovalResponseContent` on the same thread as the original request. This doesn't of course have to be the exact same thread object, but it should have the equivalent contents as the original thread, since the agent would have stored the `ApprovalRequestContent` in its thread state. + +The `ApprovalRequestContent` could contain an optional `FunctionCallContent` if the approval is for a function call, along with any additional information that the agent wants to provide to the user to help them make a decision. + +It is up to the agent to decide when and if a user approval is required, and therefore when to return an `ApprovalRequestContent`. + +`ApprovalRequestContent` and `ApprovalResponseContent` will not necessarily always map to a supported content type for the underlying service or agent thread storage. +Specifically, when we are deciding in the IChatClient stack to ask for approval from the user, for a function call, this does not mean that the underlying ai service or +service side thread type (where applicable) supports the concept of a function call approval request. While we can store the approval requests and response in local +threads, service managed threads won't necessarily support this. For service managed threads, there will therefore be no long term record of the approval request in the chat history. +We should however log approvals so that there is a trace of this for debugging and auditing purposes. + +Suggested Types: + +```csharp +class ApprovalRequestContent : AIContent +{ + // An ID to uniquely identify the approval request/response pair. + public string Id { get; set; } + + // An optional user targeted message to explain what needs to be approved. + public string? Text { get; set; } + + // Optional: If the approval is for a function call, this will contain the function call content. 
+ public FunctionCallContent? FunctionCall { get; set; } + + public ApprovalResponseContent CreateApproval() + { + return new ApprovalResponseContent + { + Id = this.Id, + Approved = true, + FunctionCall = this.FunctionCall + }; + } + + public ApprovalResponseContent CreateRejection() + { + return new ApprovalResponseContent + { + Id = this.Id, + Approved = false, + FunctionCall = this.FunctionCall + }; + } +} + +class ApprovalResponseContent : AIContent +{ + // An ID to uniquely identify the approval request/response pair. + public string Id { get; set; } + + // Indicates whether the user approved the request. + public bool Approved { get; set; } + + // Optional: If the approval is for a function call, this will contain the function call content. + public FunctionCallContent? FunctionCall { get; set; } +} + +var response = await agent.RunAsync("Please book me a flight for Friday to Paris.", thread); +while (response.ApprovalRequests.Any()) +{ + List<ChatMessage> messages = new List<ChatMessage>(); + foreach (var approvalRequest in response.ApprovalRequests) + { + // Show the approval request to the user in the appropriate format. + // The user can then approve or reject the request. + // The optional FunctionCallContent can be used to show the user what function the agent wants to call with the parameter set: + // approvalRequest.FunctionCall?.Arguments. + // The Text property of the ApprovalRequestContent can also be used to show the user any additional textual context about the request. + + // If the user approves: + messages.Add(new ChatMessage(ChatRole.User, [approvalRequest.CreateApproval()])); + } + + // Get the next response from the agent. + response = await agent.RunAsync(messages, thread); +} + +class AgentRunResponse +{ + ... + + // A new property on AgentRunResponse to aggregate the ApprovalRequestContent items from + // the response messages (Similar to the Text property). + public IEnumerable<ApprovalRequestContent> ApprovalRequests { get; set; } + + ...
+} +``` + +### 4. Introduce new Container UserInputRequestContent and UserInputResponseContent types + +This approach is similar to the `ApprovalRequestContent` and `ApprovalResponseContent` types, but is more generic and can be used for any type of user input request, not just approvals. + +There is some ambiguity with this approach. When using an LLM-based agent, the LLM may return a text response about missing user input. +E.g. the LLM may need to invoke a function but the user did not supply all necessary information to fill out all arguments. +Typically an LLM would just respond with a text message asking the user for the missing information. +In this case, the message is not distinguishable from any other result message, and therefore cannot be returned to the caller as a `UserInputRequestContent`, even though it is conceptually a type of unstructured user input request. Ultimately our types are modeled to make it easy for callers to decide on the right way to represent this to users. For example, is it just a regular message to show to users, or does it need a special UX? + +Suggested Types: + +```csharp +class UserInputRequestContent : AIContent +{ + // An ID to uniquely identify the approval request/response pair. + public string ApprovalId { get; set; } + + // DecisionTarget could contain: + // FunctionCallContent: The function call that the agent wants to invoke. + // TextContent: Text that describes the question that the user should answer. + public object? DecisionTarget { get; set; } // Anything else the user may need to make a decision about. + + // Possible InputFormat subclasses: + // SchemaInputFormat: Contains a schema for the user input. + // ApprovalInputFormat: Indicates that the user needs to approve something. + // FreeformTextInputFormat: Indicates that the user can provide freeform text input. + // Other formats can be added as needed, e.g. cards when using activity protocol.
+ public InputFormat InputFormat { get; set; } // How the user should provide input (e.g., form, options, etc.). +} + +class UserInputResponseContent : AIContent +{ + // An ID to uniquely identify the approval request/response pair. + public string ApprovalId { get; set; } + + // Possible UserInputResult subclasses: + // SchemaInputResult: Contains the structured data provided by the user. + // ApprovalResult: Contains a bool with approved / rejected. + // FreeformTextResult: Contains the freeform text input provided by the user. + public UserInputResult Result { get; set; } // The user input. + + public object? DecisionTarget { get; set; } // A copy of the DecisionTarget from the UserInputRequestContent, if applicable. +} + +var response = await agent.RunAsync("Please book me a flight for Friday to Paris.", thread); +while (response.UserInputRequests.Any()) +{ + List<ChatMessage> messages = new List<ChatMessage>(); + foreach (var userInputRequest in response.UserInputRequests) + { + // Show the user input request to the user in the appropriate format. + // The DecisionTarget can be used to show the user what function the agent wants to call with the parameter set. + // The InputFormat property can be used to determine the type of UX when allowing users to provide input. + + if (userInputRequest.InputFormat is ApprovalInputFormat approvalInputFormat) + { + // Here we need to show the user an approval request. + // We can use the DecisionTarget to show e.g. the function call that the agent wants to invoke. + // The user can then approve or reject the request. + + // If the user approves: + var approvalMessage = new ChatMessage(ChatRole.User, [new UserInputResponseContent { + ApprovalId = userInputRequest.ApprovalId, + Result = new ApprovalResult { Approved = true }, + DecisionTarget = userInputRequest.DecisionTarget + }]); + messages.Add(approvalMessage); + } + else + { + throw new NotSupportedException("Unsupported InputFormat type."); + } + } + + // Get the next response from the agent.
+ response = await agent.RunAsync(messages, thread); +} + +class AgentRunResponse +{ + ... + + // A new property on AgentRunResponse to aggregate the UserInputRequestContent items from + // the response messages (Similar to the Text property). + public IReadOnlyList<UserInputRequestContent> UserInputRequests { get; set; } + + ... +} +``` + +### 5. Introduce new Base UserInputRequestContent and UserInputResponseContent types + +This approach is similar to option 4, but the `UserInputRequestContent` and `UserInputResponseContent` types are base classes rather than generic container types. + +Suggested Types: + +```csharp +class UserInputRequestContent : AIContent +{ + // An ID to uniquely identify the user input request/response pair. + public string Id { get; set; } +} + +class UserInputResponseContent : AIContent +{ + // An ID to uniquely identify the user input request/response pair. + public string Id { get; set; } +} + +// ----------------------------------- +// Used for approving a function call. +class FunctionApprovalRequestContent : UserInputRequestContent +{ + // Contains the function call that the agent wants to invoke. + public FunctionCallContent FunctionCall { get; set; } + + public FunctionApprovalResponseContent CreateApproval() + { + return new FunctionApprovalResponseContent + { + Id = this.Id, + Approved = true, + FunctionCall = this.FunctionCall + }; + } + + public FunctionApprovalResponseContent CreateRejection() + { + return new FunctionApprovalResponseContent + { + Id = this.Id, + Approved = false, + FunctionCall = this.FunctionCall + }; + } +} +class FunctionApprovalResponseContent : UserInputResponseContent +{ + // Indicates whether the user approved the request. + public bool Approved { get; set; } + + // Contains the function call that the agent wants to invoke. + public FunctionCallContent FunctionCall { get; set; } +} + +// -------------------------------------------------- +// Used for approving a request described using text.
+class TextApprovalRequestContent : UserInputRequestContent +{ + // A user targeted message to explain what needs to be approved. + public string Text { get; set; } +} +class TextApprovalResponseContent : UserInputResponseContent +{ + // Indicates whether the user approved the request. + public bool Approved { get; set; } +} + +// ------------------------------------------------ +// Used for providing input in a structured format. +class StructuredDataInputRequestContent : UserInputRequestContent +{ + // A user targeted message to explain what is being requested. + public string? Text { get; set; } + + // Contains the schema for the user input. + public JsonElement Schema { get; set; } +} +class StructuredDataInputResponseContent : UserInputResponseContent +{ + // Contains the structured data provided by the user. + public JsonElement StructuredData { get; set; } +} + +var response = await agent.RunAsync("Please book me a flight for Friday to Paris.", thread); +while (response.UserInputRequests.Any()) +{ + List<ChatMessage> messages = new List<ChatMessage>(); + foreach (var userInputRequest in response.UserInputRequests) + { + if (userInputRequest is FunctionApprovalRequestContent approvalRequest) + { + // Here we need to show the user an approval request. + // We can use the FunctionCall property to show e.g. the function call that the agent wants to invoke. + // If the user approves: + messages.Add(new ChatMessage(ChatRole.User, [approvalRequest.CreateApproval()])); + } + } + + // Get the next response from the agent. + response = await agent.RunAsync(messages, thread); +} + +class AgentRunResponse +{ + ... + + // A new property on AgentRunResponse to aggregate the UserInputRequestContent items from + // the response messages (Similar to the Text property). + public IEnumerable<UserInputRequestContent> UserInputRequests { get; set; } + + ... +} +``` + +## Decision Outcome + +Chosen option 5. + +## Appendices + +### ChatClientAgent Approval Process Flow + +1. User passes a User message to the agent with a request.
+1. Agent calls IChatClient with any functions registered on the agent. + (IChatClient has FunctionInvokingChatClient) +1. Model responds with FunctionCallContent indicating function calls required. +1. FunctionInvokingChatClient decorator identifies any function calls that require user approval and returns a FunctionApprovalRequestContent. + (If there are multiple parallel function calls, all function calls will be returned as FunctionApprovalRequestContent even if only some require approval.) +1. Agent updates the thread with the FunctionApprovalRequestContent (or this may have already been done by a service threaded agent). +1. Agent returns the FunctionApprovalRequestContent to the caller which shows it to the user in the appropriate format. +1. User (via caller) invokes the agent again with FunctionApprovalResponseContent. +1. Agent adds the FunctionApprovalResponseContent to the thread. +1. Agent calls IChatClient with the provided FunctionApprovalResponseContent. +1. FunctionInvokingChatClient decorator identifies the FunctionApprovalResponseContent as an approval decision for the function call. + Any rejected function calls are converted to FunctionResultContent with a message indicating that the function invocation was denied. + Any approved function calls are executed by the FunctionInvokingChatClient decorator. +1. FunctionInvokingChatClient decorator passes the FunctionCallContent and FunctionResultContent for the approved and rejected function calls to the model. +1. Model responds with the result. +1. FunctionInvokingChatClient returns the FunctionCallContent, FunctionResultContent, and the result message to the agent. +1. Agent responds to caller with the same messages and updates the thread with these as well. + +### CustomAgent Approval Process Flow + +1. User passes a User message to the agent with a request. +1. Agent adds this message to the thread. +1. Agent executes various steps. +1.
 Agent encounters a step for which it requires user input to continue. +1. Agent responds with a UserInputRequestContent and also adds it to its thread. +1. User (via caller) invokes the agent again with UserInputResponseContent. +1. Agent adds the UserInputResponseContent to the thread. +1. Agent responds to caller with result message and thread is updated with the result message. + +### Sequence Diagram: FunctionInvokingChatClient with built in Approval Generation + +This ChatClient Approval Stack option has been proven to work via a proof of concept implementation. + +```mermaid +--- +title: Multiple Functions with partial approval +--- + +sequenceDiagram + note right of Developer: Developer asks question with two functions. + Developer->>+FunctionInvokingChatClient: What is the special soup today?
[GetMenu, GetSpecials] + FunctionInvokingChatClient->>+ResponseChatClient: What is the special soup today?
[GetMenu, GetSpecials] + + ResponseChatClient-->>-FunctionInvokingChatClient: [FunctionCallContent(GetMenu)],
[FunctionCallContent(GetSpecials)] + note right of FunctionInvokingChatClient: FICC turns FunctionCallContent
into FunctionApprovalRequestContent + FunctionInvokingChatClient->>+Developer: [FunctionApprovalRequestContent(GetMenu)]
[FunctionApprovalRequestContent(GetSpecials)] + + note right of Developer:Developer asks user for approval + Developer->>+FunctionInvokingChatClient: [FunctionApprovalResponseContent(GetMenu, approved=false)]
[FunctionApprovalResponseContent(GetSpecials, approved=true)] + note right of FunctionInvokingChatClient:FunctionInvokingChatClient executes the approved
function and generates a failed FunctionResultContent
for the rejected one, before invoking the model again. + FunctionInvokingChatClient->>+ResponseChatClient: What is the special soup today?
[FunctionCallContent(GetMenu)],
[FunctionCallContent(GetSpecials)],
[FunctionResultContent(GetMenu, "Function invocation denied")]
[FunctionResultContent(GetSpecials, "Special Soup: Clam Chowder...")] + + ResponseChatClient-->>-FunctionInvokingChatClient: [TextContent("The specials soup is...")] + FunctionInvokingChatClient->>+Developer: [FunctionCallContent(GetMenu)],
[FunctionCallContent(GetSpecials)],
[FunctionResultContent(GetMenu, "Function invocation denied")]
[FunctionResultContent(GetSpecials, "Special Soup: Clam Chowder...")]
[TextContent("The specials soup is...")] +``` + +### Sequence Diagram: Post FunctionInvokingChatClient ApprovalGeneratingChatClient - Multiple function calls with partial approval + +This is a discarded ChatClient Approval Stack option, but is included here for reference. + +```mermaid +--- +title: Multiple Functions with partial approval +--- + +sequenceDiagram + note right of Developer: Developer asks question with two functions. + Developer->>+FunctionInvokingChatClient: What is the special soup today? [GetMenu, GetSpecials] + FunctionInvokingChatClient->>+ApprovalGeneratingChatClient: What is the special soup today? [GetMenu, GetSpecials] + ApprovalGeneratingChatClient->>+ResponseChatClient: What is the special soup today? [GetMenu, GetSpecials] + + ResponseChatClient-->>-ApprovalGeneratingChatClient: [FunctionCallContent(GetMenu)],
[FunctionCallContent(GetSpecials)] + ApprovalGeneratingChatClient-->>-FunctionInvokingChatClient: [FunctionApprovalRequestContent(GetMenu)],
[FunctionApprovalRequestContent(GetSpecials)] + FunctionInvokingChatClient-->>-Developer: [FunctionApprovalRequestContent(GetMenu)]
[FunctionApprovalRequestContent(GetSpecials)] + + note right of Developer: Developer approves one function call and rejects the other. + Developer->>+FunctionInvokingChatClient: [FunctionApprovalResponseContent(GetMenu, approved=true)]
[FunctionApprovalResponseContent(GetSpecials, approved=false)] + FunctionInvokingChatClient->>+ApprovalGeneratingChatClient: [FunctionApprovalResponseContent(GetMenu, approved=true)]
[FunctionApprovalResponseContent(GetSpecials, approved=false)] + + note right of FunctionInvokingChatClient: ApprovalGeneratingChatClient only returns FunctionCallContent
for approved FunctionApprovalResponseContent. + ApprovalGeneratingChatClient-->>-FunctionInvokingChatClient: [FunctionCallContent(GetMenu)] + note right of FunctionInvokingChatClient: FunctionInvokingChatClient has to also include all
FunctionApprovalResponseContent in the new downstream request. + FunctionInvokingChatClient->>+ApprovalGeneratingChatClient: [FunctionResultContent(GetMenu, "mains.... desserts...")]
[FunctionApprovalResponseContent(GetMenu, approved=true)]
[FunctionApprovalResponseContent(GetSpecials, approved=false)] + + note right of ApprovalGeneratingChatClient: ApprovalGeneratingChatClient now throws away
approvals for executed functions, and creates
failed FunctionResultContent for denied function calls. + ApprovalGeneratingChatClient->>+ResponseChatClient: [FunctionResultContent(GetMenu, "mains.... desserts...")]
[FunctionResultContent(GetSpecials, "Function invocation denied")] +``` + +### Sequence Diagram: Pre FunctionInvokingChatClient ApprovalGeneratingChatClient - Multiple function calls with partial approval + +This is a discarded ChatClient Approval Stack option, but is included here for reference. + +It doesn't work for the scenario where we have multiple function calls for the same function in serial with different arguments. + +Flow: + +- AGCC turns AIFunctions into AIFunctionDefinitions (not invocable) and FICC ignores these. +- We get back a FunctionCall for one of these and it gets approved. +- We invoke the FICC again, this time with an AIFunction. +- We call the service with the FCC and FRC. +- We get back a new Function call for the same function again with different arguments. +- Since we were passed an AIFunction instead of an AIFunctionDefinition, we now incorrectly execute this FC without approval. + +```mermaid +--- +title: Multiple Functions with partial approval +--- + +sequenceDiagram + note right of Developer: Developer asks question with two functions. + Developer->>+ApprovalGeneratingChatClient: What is the special soup today? [GetMenu, GetSpecials] + note right of ApprovalGeneratingChatClient: AGCC marks functions as not-invocable + ApprovalGeneratingChatClient->>+FunctionInvokingChatClient: What is the special soup today?
[GetMenu(invocable=false)]
[GetSpecials(invocable=false)] + FunctionInvokingChatClient->>+ResponseChatClient: What is the special soup today?
[GetMenu(invocable=false)]
[GetSpecials(invocable=false)] + + ResponseChatClient-->>-FunctionInvokingChatClient: [FunctionCallContent(GetMenu)],
[FunctionCallContent(GetSpecials)] + note right of FunctionInvokingChatClient: FICC doesn't invoke functions since they are not invocable. + FunctionInvokingChatClient-->>-ApprovalGeneratingChatClient: [FunctionCallContent(GetMenu)],
[FunctionCallContent(GetSpecials)] + note right of ApprovalGeneratingChatClient: AGCC turns functions into approval requests + ApprovalGeneratingChatClient-->>-Developer: [FunctionApprovalRequestContent(GetMenu)]
[FunctionApprovalRequestContent(GetSpecials)] + + note right of Developer: Developer approves one function call and rejects the other. + Developer->>+ApprovalGeneratingChatClient: [FunctionApprovalResponseContent(GetMenu, approved=true)]
[FunctionApprovalResponseContent(GetSpecials, approved=false)] + note right of ApprovalGeneratingChatClient: AGCC turns approval responses
into FCC or failed function calls + ApprovalGeneratingChatClient->>+FunctionInvokingChatClient: [FunctionCallContent(GetMenu)]
[FunctionCallContent(GetSpecials)]
[FunctionResultContent(GetSpecials, "Function invocation denied")] + note right of FunctionInvokingChatClient: FICC invokes GetMenu since it's the only remaining one. + FunctionInvokingChatClient->>+ResponseChatClient: [FunctionCallContent(GetMenu)]
[FunctionResultContent(GetMenu, "mains.... desserts...")]
[FunctionCallContent(GetSpecials)]
[FunctionResultContent(GetSpecials, "Function invocation denied")] + + ResponseChatClient-->>-FunctionInvokingChatClient: [FunctionCallContent(GetMenu)]
[FunctionResultContent(GetMenu, "mains.... desserts...")]
[FunctionCallContent(GetSpecials)]
[FunctionResultContent(GetSpecials, "Function invocation denied")]
[TextContent("The specials soup is...")] + FunctionInvokingChatClient-->>-ApprovalGeneratingChatClient: [FunctionCallContent(GetMenu)]
[FunctionResultContent(GetMenu, "mains.... desserts...")]
[FunctionCallContent(GetSpecials)]
[FunctionResultContent(GetSpecials, "Function invocation denied")]
[TextContent("The specials soup is...")] + ApprovalGeneratingChatClient-->>-Developer: [FunctionCallContent(GetMenu)]
[FunctionResultContent(GetMenu, "mains.... desserts...")]
[FunctionCallContent(GetSpecials)]
[FunctionResultContent(GetSpecials, "Function invocation denied")]
[TextContent("The specials soup is...")] +``` diff --git a/dotnet/Directory.Packages.props b/dotnet/Directory.Packages.props index a90c7abc6f..d1e2807d22 100644 --- a/dotnet/Directory.Packages.props +++ b/dotnet/Directory.Packages.props @@ -35,7 +35,8 @@ - + + @@ -47,7 +48,7 @@ - + @@ -72,6 +73,8 @@ + + diff --git a/dotnet/agent-framework-dotnet.slnx b/dotnet/agent-framework-dotnet.slnx index 033eceebb0..1109f70441 100644 --- a/dotnet/agent-framework-dotnet.slnx +++ b/dotnet/agent-framework-dotnet.slnx @@ -128,6 +128,7 @@ + diff --git a/dotnet/samples/GettingStarted/GettingStarted.csproj b/dotnet/samples/GettingStarted/GettingStarted.csproj index ba192b9f86..596a77e505 100644 --- a/dotnet/samples/GettingStarted/GettingStarted.csproj +++ b/dotnet/samples/GettingStarted/GettingStarted.csproj @@ -37,6 +37,7 @@ + @@ -44,6 +45,7 @@ + diff --git a/dotnet/samples/GettingStarted/Steps/Step10_ChatClientAgent_UsingFunctionToolsWithApprovals.cs b/dotnet/samples/GettingStarted/Steps/Step10_ChatClientAgent_UsingFunctionToolsWithApprovals.cs index 7ae7a8d631..7bd6e11fe2 100644 --- a/dotnet/samples/GettingStarted/Steps/Step10_ChatClientAgent_UsingFunctionToolsWithApprovals.cs +++ b/dotnet/samples/GettingStarted/Steps/Step10_ChatClientAgent_UsingFunctionToolsWithApprovals.cs @@ -1,8 +1,13 @@ // Copyright (c) Microsoft. All rights reserved. 
using System.ComponentModel; +#if NETFRAMEWORK +using System.Net.Http; +#endif using Microsoft.Extensions.AI; using Microsoft.Extensions.AI.Agents; +using Microsoft.Extensions.AI.ModelContextProtocol; +using Microsoft.Extensions.DependencyInjection; namespace Steps; @@ -32,7 +37,12 @@ public async Task ApprovalsWithTools(ChatClientProviders provider) tools: [ new ApprovalRequiredAIFunction(AIFunctionFactory.Create(menuTools.GetMenu)), new ApprovalRequiredAIFunction(AIFunctionFactory.Create(menuTools.GetSpecials)), - AIFunctionFactory.Create(menuTools.GetItemPrice) + AIFunctionFactory.Create(menuTools.GetItemPrice), + new HostedMcpServerTool("Tiktoken Documentation", new Uri("https://gitmcp.io/openai/tiktoken")) + { + AllowedTools = ["search_tiktoken_documentation", "fetch_tiktoken_documentation"], + ApprovalMode = HostedMcpServerToolApprovalMode.AlwaysRequire, + } ]); // Create the server-side agent Id when applicable (depending on the provider). @@ -41,8 +51,26 @@ public async Task ApprovalsWithTools(ChatClientProviders provider) // Get the chat client to use for the agent. using var chatClient = base.GetChatClient(provider, agentOptions); + // Modify the chat client to include MCP and built-in approvals if not already present. + var chatBuilder = chatClient.AsBuilder(); + if (chatClient.GetService() is null) + { + chatBuilder.Use((IChatClient innerClient, IServiceProvider services) => + { + return new HostedMCPChatClient(innerClient, new HttpClient()); + }); + } + if (chatClient.GetService() is null) + { + chatBuilder.Use((IChatClient innerClient, IServiceProvider services) => + { + return new NewFunctionInvokingChatClient(innerClient, null, services); + }); + } + using var chatClientWithMCPAndApprovals = chatBuilder.Build(); + // Define the agent - var agent = new ChatClientAgent(chatClient, agentOptions); + var agent = new ChatClientAgent(chatClientWithMCPAndApprovals, agentOptions); // Create the chat history thread to capture the agent interaction. 
var thread = agent.GetNewThread(); @@ -50,6 +78,7 @@ public async Task ApprovalsWithTools(ChatClientProviders provider) // Respond to user input, invoking functions where appropriate. await RunAgentAsync("What is the special soup and its price?"); await RunAgentAsync("What is the special drink?"); + await RunAgentAsync("how does tiktoken work?"); async Task RunAgentAsync(string input) { @@ -63,7 +92,7 @@ async Task RunAgentAsync(string input) // Approve GetSpecials function calls, reject all others. List nextIterationMessages = userInputRequests?.Select((request) => request switch { - FunctionApprovalRequestContent functionApprovalRequest when functionApprovalRequest.FunctionCall.Name == "GetSpecials" => + FunctionApprovalRequestContent functionApprovalRequest when functionApprovalRequest.FunctionCall.Name == "GetSpecials" || functionApprovalRequest.FunctionCall.Name == "add" || functionApprovalRequest.FunctionCall.Name == "search_tiktoken_documentation" => new ChatMessage(ChatRole.User, [functionApprovalRequest.CreateResponse(approved: true)]), FunctionApprovalRequestContent functionApprovalRequest => @@ -110,6 +139,11 @@ public async Task ApprovalsWithToolsStreaming(ChatClientProviders provider) new ApprovalRequiredAIFunction(AIFunctionFactory.Create(menuTools.GetMenu)), new ApprovalRequiredAIFunction(AIFunctionFactory.Create(menuTools.GetSpecials)), AIFunctionFactory.Create(menuTools.GetItemPrice), + new HostedMcpServerTool("Tiktoken Documentation", new Uri("https://gitmcp.io/openai/tiktoken")) + { + AllowedTools = ["search_tiktoken_documentation", "fetch_tiktoken_documentation"], + ApprovalMode = HostedMcpServerToolApprovalMode.AlwaysRequire, + } ]); // Create the server-side agent Id when applicable (depending on the provider). @@ -118,8 +152,26 @@ public async Task ApprovalsWithToolsStreaming(ChatClientProviders provider) // Get the chat client to use for the agent. 
using var chatClient = base.GetChatClient(provider, agentOptions); + // Modify the chat client to include MCP and built-in approvals if not already present. + var chatBuilder = chatClient.AsBuilder(); + if (chatClient.GetService() is null) + { + chatBuilder.Use((IChatClient innerClient, IServiceProvider services) => + { + return new HostedMCPChatClient(innerClient, new HttpClient()); + }); + } + if (chatClient.GetService() is null) + { + chatBuilder.Use((IChatClient innerClient, IServiceProvider services) => + { + return new NewFunctionInvokingChatClient(innerClient, null, services); + }); + } + using var chatClientWithMCPAndApprovals = chatBuilder.Build(); + // Define the agent - var agent = new ChatClientAgent(chatClient, agentOptions); + var agent = new ChatClientAgent(chatClientWithMCPAndApprovals, agentOptions); // Create the chat history thread to capture the agent interaction. var thread = agent.GetNewThread(); @@ -127,6 +179,7 @@ public async Task ApprovalsWithToolsStreaming(ChatClientProviders provider) // Respond to user input, invoking functions where appropriate. await RunAgentAsync("What is the special soup and its price?"); await RunAgentAsync("What is the special drink?"); + await RunAgentAsync("how does tiktoken work?"); async Task RunAgentAsync(string input) { @@ -140,7 +193,7 @@ async Task RunAgentAsync(string input) // Approve GetSpecials function calls, reject all others. 
List nextIterationMessages = userInputRequests?.Select((request) => request switch { - FunctionApprovalRequestContent functionApprovalRequest when functionApprovalRequest.FunctionCall.Name == "GetSpecials" => + FunctionApprovalRequestContent functionApprovalRequest when functionApprovalRequest.FunctionCall.Name == "GetSpecials" || functionApprovalRequest.FunctionCall.Name == "add" || functionApprovalRequest.FunctionCall.Name == "search_tiktoken_documentation" => new ChatMessage(ChatRole.User, [functionApprovalRequest.CreateResponse(approved: true)]), FunctionApprovalRequestContent functionApprovalRequest => diff --git a/dotnet/src/Microsoft.Extensions.AI.Agents.Abstractions/MEAI/HostedMcpServerTool.cs b/dotnet/src/Microsoft.Extensions.AI.Agents.Abstractions/MEAI/HostedMcpServerTool.cs new file mode 100644 index 0000000000..f5af2a85e6 --- /dev/null +++ b/dotnet/src/Microsoft.Extensions.AI.Agents.Abstractions/MEAI/HostedMcpServerTool.cs @@ -0,0 +1,61 @@ +// Copyright (c) Microsoft. All rights reserved. + +using System; +using System.Collections.Generic; +using Microsoft.Shared.Diagnostics; + +namespace Microsoft.Extensions.AI; + +/// +/// Represents a hosted MCP server tool that can be specified to an AI service. +/// +public class HostedMcpServerTool : AITool +{ + /// + /// Initializes a new instance of the class. + /// + /// The name of the remote MCP server. + /// The URL of the remote MCP server. + public HostedMcpServerTool(string serverName, Uri url) + { + ServerName = Throw.IfNullOrWhitespace(serverName); + Url = Throw.IfNull(url); + } + + /// + /// Gets the name of the remote MCP server that is used to identify it. + /// + public string ServerName { get; } + + /// + /// Gets the URL of the remote MCP server. + /// + public Uri Url { get; } + + /// + /// Gets or sets the description of the remote MCP server, used to provide more context to the AI service. + /// + public string? 
ServerDescription { get; set; } + + /// + /// Gets or sets the list of tools allowed to be used by the AI service. + /// + public IList? AllowedTools { get; set; } + + /// + /// Gets or sets the approval mode that indicates when the AI service should require user approval for tool calls to the remote MCP server. + /// + /// + /// You can set this property to to require approval for all tool calls, + /// or to to never require approval. + /// + public HostedMcpServerToolApprovalMode? ApprovalMode { get; set; } + + /// + /// Gets or sets the HTTP headers that the AI service should use when making tool calls to the remote MCP server. + /// + /// + /// This property is useful for specifying the authentication header or other headers required by the MCP server. + /// + public IDictionary? Headers { get; set; } +} diff --git a/dotnet/src/Microsoft.Extensions.AI.Agents.Abstractions/MEAI/HostedMcpServerToolAlwaysRequireApprovalMode.cs b/dotnet/src/Microsoft.Extensions.AI.Agents.Abstractions/MEAI/HostedMcpServerToolAlwaysRequireApprovalMode.cs new file mode 100644 index 0000000000..9a4103c882 --- /dev/null +++ b/dotnet/src/Microsoft.Extensions.AI.Agents.Abstractions/MEAI/HostedMcpServerToolAlwaysRequireApprovalMode.cs @@ -0,0 +1,14 @@ +// Copyright (c) Microsoft. All rights reserved. + +using System.Diagnostics; + +namespace Microsoft.Extensions.AI; + +/// +/// Indicates that approval is always required for tool calls to a hosted MCP server. +/// +/// +/// Use to get an instance of . 
+/// +[DebuggerDisplay(nameof(AlwaysRequire))] +public sealed class HostedMcpServerToolAlwaysRequireApprovalMode : HostedMcpServerToolApprovalMode; diff --git a/dotnet/src/Microsoft.Extensions.AI.Agents.Abstractions/MEAI/HostedMcpServerToolApprovalMode.cs b/dotnet/src/Microsoft.Extensions.AI.Agents.Abstractions/MEAI/HostedMcpServerToolApprovalMode.cs new file mode 100644 index 0000000000..b9cd2624ca --- /dev/null +++ b/dotnet/src/Microsoft.Extensions.AI.Agents.Abstractions/MEAI/HostedMcpServerToolApprovalMode.cs @@ -0,0 +1,40 @@ +// Copyright (c) Microsoft. All rights reserved. + +using System.Collections.Generic; + +namespace Microsoft.Extensions.AI; + +/// +/// Describes when approval is required for tool calls to a hosted MCP server. +/// +/// +/// The predefined values AlwaysRequire and NeverRequire are provided to specify handling for all tools. +/// To specify approval behavior for individual tool names, use RequireSpecific. +/// +#pragma warning disable CA1052 // Static holder types should be Static or NotInheritable +public class HostedMcpServerToolApprovalMode +#pragma warning restore CA1052 +{ + /// + /// Gets a predefined HostedMcpServerToolAlwaysRequireApprovalMode indicating that all tool calls to a hosted MCP server always require approval. + /// + public static HostedMcpServerToolAlwaysRequireApprovalMode AlwaysRequire { get; } = new(); + + /// + /// Gets a predefined HostedMcpServerToolNeverRequireApprovalMode indicating that all tool calls to a hosted MCP server never require approval. + /// + public static HostedMcpServerToolNeverRequireApprovalMode NeverRequire { get; } = new(); + + private protected HostedMcpServerToolApprovalMode() + { + } + + /// + /// Instantiates a HostedMcpServerToolRequireSpecificApprovalMode that specifies approval behavior for individual tool names. + /// + /// The list of tool names that always require approval. + /// The list of tool names that never require approval. + /// An instance of HostedMcpServerToolRequireSpecificApprovalMode for the specified tool names. + public static HostedMcpServerToolRequireSpecificApprovalMode RequireSpecific(IList? alwaysRequireApprovalToolNames, IList?
neverRequireApprovalToolNames) + => new(alwaysRequireApprovalToolNames, neverRequireApprovalToolNames); +} diff --git a/dotnet/src/Microsoft.Extensions.AI.Agents.Abstractions/MEAI/HostedMcpServerToolNeverRequireApprovalMode .cs b/dotnet/src/Microsoft.Extensions.AI.Agents.Abstractions/MEAI/HostedMcpServerToolNeverRequireApprovalMode .cs new file mode 100644 index 0000000000..4eed447217 --- /dev/null +++ b/dotnet/src/Microsoft.Extensions.AI.Agents.Abstractions/MEAI/HostedMcpServerToolNeverRequireApprovalMode .cs @@ -0,0 +1,14 @@ +// Copyright (c) Microsoft. All rights reserved. + +using System.Diagnostics; + +namespace Microsoft.Extensions.AI; + +/// +/// Indicates that approval is never required for tool calls to a hosted MCP server. +/// +/// +/// Use HostedMcpServerToolApprovalMode.NeverRequire to get an instance of HostedMcpServerToolNeverRequireApprovalMode. +/// +[DebuggerDisplay(nameof(NeverRequire))] +public sealed class HostedMcpServerToolNeverRequireApprovalMode : HostedMcpServerToolApprovalMode; diff --git a/dotnet/src/Microsoft.Extensions.AI.Agents.Abstractions/MEAI/HostedMcpServerToolRequireSpecificApprovalMode .cs b/dotnet/src/Microsoft.Extensions.AI.Agents.Abstractions/MEAI/HostedMcpServerToolRequireSpecificApprovalMode .cs new file mode 100644 index 0000000000..0979449887 --- /dev/null +++ b/dotnet/src/Microsoft.Extensions.AI.Agents.Abstractions/MEAI/HostedMcpServerToolRequireSpecificApprovalMode .cs @@ -0,0 +1,32 @@ +// Copyright (c) Microsoft. All rights reserved. + +using System.Collections.Generic; + +namespace Microsoft.Extensions.AI; + +/// +/// Represents a mode where approval behavior is specified for individual tool names. +/// +public sealed class HostedMcpServerToolRequireSpecificApprovalMode : HostedMcpServerToolApprovalMode +{ + /// + /// Initializes a new instance of the HostedMcpServerToolRequireSpecificApprovalMode class that specifies approval behavior for individual tool names. + /// + /// The list of tool names that always require approval. + /// The list of tool names that never require approval. + public HostedMcpServerToolRequireSpecificApprovalMode(IList?
alwaysRequireApprovalToolNames, IList? neverRequireApprovalToolNames) + { + AlwaysRequireApprovalToolNames = alwaysRequireApprovalToolNames; + NeverRequireApprovalToolNames = neverRequireApprovalToolNames; + } + + /// + /// Gets or sets the list of tool names that always require approval. + /// + public IList? AlwaysRequireApprovalToolNames { get; set; } + + /// + /// Gets or sets the list of tool names that never require approval. + /// + public IList? NeverRequireApprovalToolNames { get; set; } +} diff --git a/dotnet/src/Microsoft.Extensions.AI.ModelContextProtocol/HostedMCPChatClient.cs b/dotnet/src/Microsoft.Extensions.AI.ModelContextProtocol/HostedMCPChatClient.cs new file mode 100644 index 0000000000..1fa78b8998 --- /dev/null +++ b/dotnet/src/Microsoft.Extensions.AI.ModelContextProtocol/HostedMCPChatClient.cs @@ -0,0 +1,191 @@ +// Copyright (c) Microsoft. All rights reserved. + +using System; +using System.Collections.Concurrent; +using System.Collections.Generic; +using System.Net.Http; +using System.Runtime.CompilerServices; +using System.Threading; +using System.Threading.Tasks; +using Microsoft.Extensions.Logging; +using Microsoft.Extensions.Logging.Abstractions; +using Microsoft.Shared.Diagnostics; +using ModelContextProtocol.Client; + +namespace Microsoft.Extensions.AI.ModelContextProtocol; + +/// +/// Adds support for enabling MCP function invocation. +/// +public class HostedMCPChatClient : DelegatingChatClient +{ + private readonly ILoggerFactory? _loggerFactory; + + /// The logger to use for logging information about function invocation. + private readonly ILogger _logger; + + /// The HTTP client to use when connecting to the remote MCP server. + private readonly HttpClient _httpClient; + + /// A dictionary of cached mcp clients, keyed by the MCP server URL. + private ConcurrentDictionary? _mcpClients = null; + + /// + /// Initializes a new instance of the class. + /// + /// The underlying , or the next instance in a chain of clients. 
+ /// The HttpClient to use when connecting to the remote MCP server. + /// An ILoggerFactory to use for logging information about function invocation. + public HostedMCPChatClient(IChatClient innerClient, HttpClient httpClient, ILoggerFactory? loggerFactory = null) + : base(innerClient) + { + this._loggerFactory = loggerFactory; + this._logger = (ILogger?)loggerFactory?.CreateLogger() ?? NullLogger.Instance; + this._httpClient = Throw.IfNull(httpClient); + } + + /// + public override async Task GetResponseAsync( + IEnumerable messages, ChatOptions? options = null, CancellationToken cancellationToken = default) + { + if (options?.Tools is not { Count: > 0 }) + { + // If there are no tools, just call the inner client. + return await base.GetResponseAsync(messages, options, cancellationToken).ConfigureAwait(false); + } + + var downstreamTools = await this.BuildDownstreamAIToolsAsync(options.Tools, cancellationToken).ConfigureAwait(false); + options = options.Clone(); + options.Tools = downstreamTools; + + // Make the call to the inner client. + return await base.GetResponseAsync(messages, options, cancellationToken).ConfigureAwait(false); + } + + /// + public override async IAsyncEnumerable GetStreamingResponseAsync(IEnumerable messages, ChatOptions? options = null, [EnumeratorCancellation] CancellationToken cancellationToken = default) + { + if (options?.Tools is not { Count: > 0 }) + { + // If there are no tools, just call the inner client. + await foreach (var update in base.GetStreamingResponseAsync(messages, options, cancellationToken).ConfigureAwait(false)) + { + yield return update; + } + + // End the stream here; falling through would dereference null options and call the inner client a second time. + yield break; + } + + var downstreamTools = await this.BuildDownstreamAIToolsAsync(options!.Tools, cancellationToken).ConfigureAwait(false); + options = options.Clone(); + options.Tools = downstreamTools; + + // Make the call to the inner client.
+ await foreach (var update in base.GetStreamingResponseAsync(messages, options, cancellationToken).ConfigureAwait(false)) + { + yield return update; + } + } + + private async Task?> BuildDownstreamAIToolsAsync(IList? inputTools, CancellationToken cancellationToken) + { + List? downstreamTools = null; + foreach (var tool in inputTools ?? []) + { + if (tool is HostedMcpServerTool mcpTool) + { + // List all MCP functions from the specified MCP server. + // This will need some caching in a real-world scenario to avoid repeated calls. + var mcpClient = await this.CreateMcpClientAsync(mcpTool.Url, mcpTool.ServerName).ConfigureAwait(false); + var mcpFunctions = await mcpClient.ListToolsAsync(cancellationToken: cancellationToken).ConfigureAwait(false); + + // Add the listed functions to our list of tools we'll pass to the inner client. + foreach (var mcpFunction in mcpFunctions) + { + if (mcpTool.AllowedTools is not null && !mcpTool.AllowedTools.Contains(mcpFunction.Name)) + { + this._logger.LogInformation("MCP function '{FunctionName}' is not allowed by the tool configuration.", mcpFunction.Name); + continue; + } + + downstreamTools ??= new List(); + switch (mcpTool.ApprovalMode) + { + case HostedMcpServerToolAlwaysRequireApprovalMode alwaysRequireApproval: + downstreamTools.Add(new ApprovalRequiredAIFunction(mcpFunction)); + break; + case HostedMcpServerToolNeverRequireApprovalMode neverRequireApproval: + downstreamTools.Add(mcpFunction); + break; + case HostedMcpServerToolRequireSpecificApprovalMode specificApprovalMode when specificApprovalMode.AlwaysRequireApprovalToolNames?.Contains(mcpFunction.Name) is true: + downstreamTools.Add(new ApprovalRequiredAIFunction(mcpFunction)); + break; + case HostedMcpServerToolRequireSpecificApprovalMode specificApprovalMode when specificApprovalMode.NeverRequireApprovalToolNames?.Contains(mcpFunction.Name) is true: + downstreamTools.Add(mcpFunction); + break; + default: + // Default to always require approval if no specific mode is 
set. + downstreamTools.Add(new ApprovalRequiredAIFunction(mcpFunction)); + break; + } + } + + // Skip adding the MCP tool itself, as we only want to add the functions it provides. + continue; + } + + // For other tools, we want to keep them in the list of tools. + downstreamTools ??= new List(); + downstreamTools.Add(tool); + } + + return downstreamTools; + } + + /// + protected override void Dispose(bool disposing) + { + if (disposing) + { + // Dispose of the HTTP client. It is always supplied by the caller, so this client takes ownership of it. + this._httpClient?.Dispose(); + + if (this._mcpClients is not null) + { + // Dispose of all cached MCP clients. + foreach (var client in this._mcpClients.Values) + { +#pragma warning disable CA2012 // Use ValueTasks correctly + _ = client.DisposeAsync(); +#pragma warning restore CA2012 // Use ValueTasks correctly + } + + this._mcpClients.Clear(); + } + } + + base.Dispose(disposing); + } + + private async Task CreateMcpClientAsync(Uri mcpServiceUri, string serverName) + { + if (this._mcpClients is null) + { + this._mcpClients = new ConcurrentDictionary(StringComparer.OrdinalIgnoreCase); + } + + if (this._mcpClients.TryGetValue(mcpServiceUri.ToString(), out var cachedClient)) + { + // Return the cached client if it exists. + return cachedClient; + } + +#pragma warning disable CA2000 // Dispose objects before losing scope - This should be disposed by the mcp client.
+ var transport = new SseClientTransport(new() + { + Endpoint = mcpServiceUri, + Name = serverName, + }, this._httpClient, this._loggerFactory); +#pragma warning restore CA2000 // Dispose objects before losing scope + + // Store the new client in the cache so that later calls for the same server URL reuse it. + return this._mcpClients.GetOrAdd(mcpServiceUri.ToString(), await McpClientFactory.CreateAsync(transport).ConfigureAwait(false)); + } +} diff --git a/dotnet/src/Microsoft.Extensions.AI.ModelContextProtocol/Microsoft.Extensions.AI.ModelContextProtocol.csproj b/dotnet/src/Microsoft.Extensions.AI.ModelContextProtocol/Microsoft.Extensions.AI.ModelContextProtocol.csproj new file mode 100644 index 0000000000..1b66862dda --- /dev/null +++ b/dotnet/src/Microsoft.Extensions.AI.ModelContextProtocol/Microsoft.Extensions.AI.ModelContextProtocol.csproj @@ -0,0 +1,31 @@ + + + + $(ProjectsTargetFrameworks) + $(ProjectsDebugTargetFrameworks) + alpha + + + + true + true + true + + + + + + + Microsoft Extensions AI Model Context Protocol extensions + Contains ChatClient that helps call MCP tooling automatically as part of a ChatClient stack. + + + + + + + + + + +