Add IOcrClient OCR/document-extraction capability to Microsoft.Extensions.AI#7588
Open
luisquintanilla wants to merge 1 commit into
…ions.AI

Introduces IOcrClient as a provider-neutral OCR/document-parsing capability in Microsoft.Extensions.AI.Abstractions, following the same abstraction + builder + middleware + DI shape as the existing capability family (IChatClient, ISpeechToTextClient, etc.).

Abstractions (Microsoft.Extensions.AI.Abstractions):
- IOcrClient, DelegatingOcrClient, OcrClientExtensions, OcrClientMetadata
- OcrOptions, OcrResult, OcrPage, OcrTable, OcrTableCell, OcrBlock, OcrBoundingRegion, OcrUsage, OcrProgress

Middleware + DI (Microsoft.Extensions.AI):
- OcrClientBuilder, AsBuilder, AddOcrClient/AddKeyedOcrClient
- LoggingOcrClient, OpenTelemetryOcrClient, ConfigureOptionsOcrClient and their builder extensions, mirroring the ISpeechToTextClient template

All public surface is marked [Experimental] under the MEAI001 (AIOcr) diagnostic id. Includes unit tests for both libraries and updated API baselines.
Pull request overview
Adds a new provider-neutral OCR / document-extraction capability to the Microsoft.Extensions.AI capability family, including abstractions, a builder-based middleware pipeline, DI registration helpers, and OpenTelemetry/logging integrations.
Changes:
- Introduces `IOcrClient` + OCR result/options model types (pages, tables, blocks, bounding regions, usage/progress) in `Microsoft.Extensions.AI.Abstractions`.
- Adds `OcrClientBuilder` + DI registration extensions, plus middleware clients for logging, OpenTelemetry, and default-options configuration in `Microsoft.Extensions.AI`.
- Adds unit tests and updates API baseline JSONs and shared diagnostic IDs / OpenTelemetry constants to cover the new capability.
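To make the builder-based pipeline concrete, here is a dependency-free sketch. It assumes a client can be modeled as a delegate from document bytes to extracted markdown; `Use`, `Build`, the `logging`/`telemetry` labels, and the fake backend are hypothetical stand-ins, not the PR's `OcrClientBuilder` API. The ordering it shows (factories applied in reverse, so the first `Use(...)` becomes the outermost layer) is how `ChatClientBuilder` behaves. That `OcrClientBuilder` keeps the same ordering is an assumption based on the PR's statement that it mirrors the existing template.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for IOcrClient: a delegate from document bytes to markdown.
// (Per the file summary, the PR's IOcrClient is richer: a unary call, IProgress reporting, GetService.)
// A middleware factory wraps an inner client and returns an outer one.
var factories = new List<Func<Func<byte[], Task<string>>, Func<byte[], Task<string>>>>();
var trace = new List<string>();

// Hypothetical equivalent of a builder's Use(...): records a factory that
// notes entry and exit around the inner call.
void Use(string name) =>
    factories.Add(inner => async document =>
    {
        trace.Add($"{name}:before");
        string result = await inner(document);
        trace.Add($"{name}:after");
        return result;
    });

// Applies factories in reverse so the first Use(...) call wraps everything else.
Func<byte[], Task<string>> Build(Func<byte[], Task<string>> innerClient)
{
    Func<byte[], Task<string>> client = innerClient;
    for (int i = factories.Count - 1; i >= 0; i--)
    {
        client = factories[i](client);
    }

    return client;
}

Use("logging");
Use("telemetry");

// Fake backend standing in for a real OCR provider.
var pipeline = Build(document =>
{
    trace.Add("backend");
    return Task.FromResult($"# Extracted {document.Length} bytes");
});

string markdown = await pipeline(new byte[] { 1, 2, 3 });
Console.WriteLine(markdown);
Console.WriteLine(string.Join(" -> ", trace));
```

Running it prints the extracted markdown followed by `logging:before -> telemetry:before -> backend -> telemetry:after -> logging:after`: the earliest-registered middleware sees the request first and the response last.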
Reviewed changes
Copilot reviewed 40 out of 40 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| test/Libraries/Microsoft.Extensions.AI.Tests/Ocr/OpenTelemetryOcrClientTests.cs | Verifies OCR OpenTelemetry spans/tags and ActivitySource service exposure. |
| test/Libraries/Microsoft.Extensions.AI.Tests/Ocr/OcrClientDependencyInjectionPatterns.cs | Validates DI registration and middleware wrapping patterns for OCR clients. |
| test/Libraries/Microsoft.Extensions.AI.Tests/Ocr/OcrClientBuilderTests.cs | Tests builder pipeline ordering, null-handling, and service-provider flow-through. |
| test/Libraries/Microsoft.Extensions.AI.Tests/Ocr/LoggingOcrClientTests.cs | Tests logging middleware behavior across log levels and DI-resolved logger factory. |
| test/Libraries/Microsoft.Extensions.AI.Tests/Ocr/ConfigureOptionsOcrClientTests.cs | Tests options-cloning/configuration middleware behavior. |
| test/Libraries/Microsoft.Extensions.AI.Tests/Microsoft.Extensions.AI.Tests.csproj | Shares TestOcrClient with the Microsoft.Extensions.AI test project. |
| test/Libraries/Microsoft.Extensions.AI.Abstractions.Tests/TestOcrClient.cs | Adds an OCR test double for abstraction-level tests. |
| test/Libraries/Microsoft.Extensions.AI.Abstractions.Tests/Ocr/OcrResultTests.cs | Tests OcrResult construction and markdown aggregation behavior. |
| test/Libraries/Microsoft.Extensions.AI.Abstractions.Tests/Ocr/OcrOptionsTests.cs | Tests cloning semantics for OcrOptions. |
| test/Libraries/Microsoft.Extensions.AI.Abstractions.Tests/Ocr/OcrClientMetadataTests.cs | Tests metadata property round-tripping and nullability behavior. |
| test/Libraries/Microsoft.Extensions.AI.Abstractions.Tests/Ocr/OcrClientExtensionsTests.cs | Tests DataContent overload and argument validation in extensions. |
| test/Libraries/Microsoft.Extensions.AI.Abstractions.Tests/Ocr/OcrBoundingRegionTests.cs | Tests bounding-region helpers and bounds calculation. |
| test/Libraries/Microsoft.Extensions.AI.Abstractions.Tests/Ocr/DelegatingOcrClientTests.cs | Tests delegating client pass-through semantics and GetService behavior. |
| src/Shared/DiagnosticIds/DiagnosticIds.cs | Adds a new experimental diagnostic ID bucket for OCR (AIOcr). |
| src/Libraries/Microsoft.Extensions.AI/OpenTelemetryConsts.cs | Adds a non-standard GenAI usage attribute for pages processed. |
| src/Libraries/Microsoft.Extensions.AI/Ocr/OpenTelemetryOcrClientBuilderExtensions.cs | Adds .UseOpenTelemetry(...) builder extension for OCR pipelines. |
| src/Libraries/Microsoft.Extensions.AI/Ocr/OpenTelemetryOcrClient.cs | Implements OpenTelemetry instrumentation for OCR operations (spans + metrics). |
| src/Libraries/Microsoft.Extensions.AI/Ocr/OcrClientBuilderServiceCollectionExtensions.cs | Adds AddOcrClient / AddKeyedOcrClient DI registration helpers. |
| src/Libraries/Microsoft.Extensions.AI/Ocr/OcrClientBuilderOcrClientExtensions.cs | Adds IOcrClient.AsBuilder() convenience extension. |
| src/Libraries/Microsoft.Extensions.AI/Ocr/OcrClientBuilder.cs | Adds the OCR pipeline builder implementation (Use(...), Build(...)). |
| src/Libraries/Microsoft.Extensions.AI/Ocr/LoggingOcrClientBuilderExtensions.cs | Adds .UseLogging(...) builder extension for OCR pipelines. |
| src/Libraries/Microsoft.Extensions.AI/Ocr/LoggingOcrClient.cs | Adds logging middleware for OCR operations with sensitive-data gating. |
| src/Libraries/Microsoft.Extensions.AI/Ocr/ConfigureOptionsOcrClientBuilderExtensions.cs | Adds .ConfigureOptions(...) builder extension for defaulting/cloning options. |
| src/Libraries/Microsoft.Extensions.AI/Ocr/ConfigureOptionsOcrClient.cs | Implements options-configuration middleware for OCR operations. |
| src/Libraries/Microsoft.Extensions.AI/Microsoft.Extensions.AI.json | Updates API baseline to include OCR middleware/builder/DI additions. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Utilities/AIJsonUtilities.Defaults.cs | Registers OCR types for source-generated JSON serialization defaults. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Ocr/OcrUsage.cs | Adds OCR usage model (pages processed + additional properties). |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Ocr/OcrTableCell.cs | Adds structured table-cell model (indices, spans, content, kind). |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Ocr/OcrTable.cs | Adds structured table model (cells or markdown + optional geometry). |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Ocr/OcrResult.cs | Adds OCR result model (pages + markdown aggregation + usage/raw/additional). |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Ocr/OcrProgress.cs | Adds progress model for long-running OCR operations. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Ocr/OcrPage.cs | Adds per-page OCR model (markdown + tables/blocks/confidence). |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Ocr/OcrOptions.cs | Adds request options model (model id, include images, additional properties). |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Ocr/OcrClientMetadata.cs | Adds client metadata model (provider name/uri/default model). |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Ocr/OcrClientExtensions.cs | Adds GetService<T> + DataContent overload for OCR invocation. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Ocr/OcrBoundingRegion.cs | Adds polygon-based geometry primitive + rectangle helper + bounds computation. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Ocr/OcrBlock.cs | Adds layout-block model (text/kind/geometry/confidence). |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Ocr/IOcrClient.cs | Introduces OCR client abstraction (unary call + IProgress + GetService). |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Ocr/DelegatingOcrClient.cs | Adds delegating base class for OCR pipeline middleware. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Microsoft.Extensions.AI.Abstractions.json | Updates API baseline to include new OCR abstraction surface. |
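The table describes `ConfigureOptionsOcrClient` as middleware for defaulting/cloning options. Below is a minimal sketch of that pattern, assuming it mirrors `ConfigureOptionsChatClient`: clone the caller's options (or create new ones when none are passed), then run the callback on the copy. Options are modeled as a property bag instead of `OcrOptions`, and the delegate-based client, `ConfigureOptions` helper, and `ModelId`/`default-ocr-model` values are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-ins: options are a property bag instead of OcrOptions, and a client
// is a delegate from (document bytes, options) to extracted markdown.
// Wraps an inner client so every call sees a configured copy of the caller's options
// (assumed to match ConfigureOptionsOcrClient, by analogy with ConfigureOptionsChatClient).
static Func<byte[], Dictionary<string, object?>?, Task<string>> ConfigureOptions(
    Func<byte[], Dictionary<string, object?>?, Task<string>> inner,
    Action<Dictionary<string, object?>> configure) =>
    (document, options) =>
    {
        // Clone the caller's options (or start fresh when none were passed),
        // so the caller's instance is never mutated by the pipeline.
        var copy = options is null
            ? new Dictionary<string, object?>()
            : new Dictionary<string, object?>(options);
        configure(copy);
        return inner(document, copy);
    };

Dictionary<string, object?>? seen = null;

var client = ConfigureOptions(
    (document, options) =>
    {
        seen = options; // Record what the backend actually received.
        return Task.FromResult("# Page 1");
    },
    options => options.TryAdd("ModelId", "default-ocr-model")); // Default only if unset.

var callerOptions = new Dictionary<string, object?> { ["IncludeImages"] = true };
string markdown = await client(Array.Empty<byte>(), callerOptions);

Console.WriteLine(callerOptions.ContainsKey("ModelId")); // False
Console.WriteLine(seen!["ModelId"]);                      // default-ocr-model
Console.WriteLine(seen!["IncludeImages"]);                // True
```

Cloning first means an options instance the caller reuses across calls is never mutated by the pipeline, and using `TryAdd` lets the callback fill in defaults without overriding values the caller set explicitly.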
Comment on lines +27 to +31

```csharp
public OcrBoundingRegion(int pageNumber, IReadOnlyList<float> polygon)
{
    PageNumber = pageNumber;
    Polygon = Throw.IfNull(polygon);
}
```
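The constructor above stores a page number and a polygon, and the file table describes `OcrBoundingRegion` as a polygon-based primitive with a rectangle helper and bounds computation. The sketch below assumes the polygon is a flat list of interleaved vertex coordinates (`x0, y0, x1, y1, ...`), the layout the Azure AI Document Intelligence REST API uses for its polygons. `FromRectangle` and `GetBounds` are hypothetical names, not the PR's members.

```csharp
using System;
using System.Collections.Generic;

// Assumption: a polygon is a flat list of interleaved vertex coordinates
// [x0, y0, x1, y1, ...], clockwise from the top-left corner, in page units.

// Hypothetical rectangle helper: expands (x, y, width, height) into a 4-vertex polygon.
static float[] FromRectangle(float x, float y, float width, float height) =>
    new[] { x, y, x + width, y, x + width, y + height, x, y + height };

// Hypothetical bounds helper: the axis-aligned box enclosing any polygon,
// including rotated text lines.
static (float Left, float Top, float Right, float Bottom) GetBounds(IReadOnlyList<float> polygon)
{
    ArgumentNullException.ThrowIfNull(polygon);
    if (polygon.Count == 0 || polygon.Count % 2 != 0)
    {
        throw new ArgumentException("Expected an even, non-zero number of coordinates.", nameof(polygon));
    }

    float left = float.MaxValue, top = float.MaxValue;
    float right = float.MinValue, bottom = float.MinValue;
    for (int i = 0; i < polygon.Count; i += 2)
    {
        left = Math.Min(left, polygon[i]);         // x coordinates
        right = Math.Max(right, polygon[i]);
        top = Math.Min(top, polygon[i + 1]);       // y coordinates (y grows downward)
        bottom = Math.Max(bottom, polygon[i + 1]);
    }

    return (left, top, right, bottom);
}

var rect = GetBounds(FromRectangle(1f, 2f, 3f, 4f));
Console.WriteLine(rect); // (1, 2, 4, 6)

// A diamond (a square rotated 45 degrees) still yields its enclosing box.
var diamond = GetBounds(new float[] { 5f, 0f, 10f, 5f, 5f, 10f, 0f, 5f });
Console.WriteLine(diamond); // (0, 0, 10, 10)
```

Deriving bounds from the polygon rather than storing only a rectangle keeps rotated or skewed regions exact, while still giving consumers a cheap box for hit-testing or cropping.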
Comment on lines +16 to +32

```csharp
/// <summary>Registers a singleton <see cref="IOcrClient"/> in the <see cref="IServiceCollection"/>.</summary>
/// <param name="serviceCollection">The <see cref="IServiceCollection"/> to which the client should be added.</param>
/// <param name="innerClient">The inner <see cref="IOcrClient"/> that represents the underlying backend.</param>
/// <param name="lifetime">The service lifetime for the client. Defaults to <see cref="ServiceLifetime.Singleton"/>.</param>
/// <returns>An <see cref="OcrClientBuilder"/> that can be used to build a pipeline around the inner client.</returns>
public static OcrClientBuilder AddOcrClient(
    this IServiceCollection serviceCollection,
    IOcrClient innerClient,
    ServiceLifetime lifetime = ServiceLifetime.Singleton)
    => AddOcrClient(serviceCollection, _ => innerClient, lifetime);

/// <summary>Registers a singleton <see cref="IOcrClient"/> in the <see cref="IServiceCollection"/>.</summary>
/// <param name="serviceCollection">The <see cref="IServiceCollection"/> to which the client should be added.</param>
/// <param name="innerClientFactory">A callback that produces the inner <see cref="IOcrClient"/> that represents the underlying backend.</param>
/// <param name="lifetime">The service lifetime for the client. Defaults to <see cref="ServiceLifetime.Singleton"/>.</param>
/// <returns>An <see cref="OcrClientBuilder"/> that can be used to build a pipeline around the inner client.</returns>
public static OcrClientBuilder AddOcrClient(
```
Comment on lines +45 to +63

```csharp
/// <summary>Registers a keyed singleton <see cref="IOcrClient"/> in the <see cref="IServiceCollection"/>.</summary>
/// <param name="serviceCollection">The <see cref="IServiceCollection"/> to which the client should be added.</param>
/// <param name="serviceKey">The key with which to associate the client.</param>
/// <param name="innerClient">The inner <see cref="IOcrClient"/> that represents the underlying backend.</param>
/// <param name="lifetime">The service lifetime for the client. Defaults to <see cref="ServiceLifetime.Singleton"/>.</param>
/// <returns>An <see cref="OcrClientBuilder"/> that can be used to build a pipeline around the inner client.</returns>
public static OcrClientBuilder AddKeyedOcrClient(
    this IServiceCollection serviceCollection,
    object? serviceKey,
    IOcrClient innerClient,
    ServiceLifetime lifetime = ServiceLifetime.Singleton)
    => AddKeyedOcrClient(serviceCollection, serviceKey, _ => innerClient, lifetime);

/// <summary>Registers a keyed singleton <see cref="IOcrClient"/> in the <see cref="IServiceCollection"/>.</summary>
/// <param name="serviceCollection">The <see cref="IServiceCollection"/> to which the client should be added.</param>
/// <param name="serviceKey">The key with which to associate the client.</param>
/// <param name="innerClientFactory">A callback that produces the inner <see cref="IOcrClient"/> that represents the underlying backend.</param>
/// <param name="lifetime">The service lifetime for the client. Defaults to <see cref="ServiceLifetime.Singleton"/>.</param>
/// <returns>An <see cref="OcrClientBuilder"/> that can be used to build a pipeline around the inner client.</returns>
```
Summary
Adds `IOcrClient`, a provider-neutral OCR / document-extraction capability, to `Microsoft.Extensions.AI`, following the same abstraction + builder + middleware + DI shape as the existing capability family (`IChatClient`, `ISpeechToTextClient`, etc.). Implements the API proposal in #7587.
What's included
Abstractions (`Microsoft.Extensions.AI.Abstractions`)
- `IOcrClient`, `DelegatingOcrClient`, `OcrClientExtensions`, `OcrClientMetadata`
- `OcrOptions`, `OcrResult`, `OcrPage`, `OcrTable`, `OcrTableCell`, `OcrBlock`, `OcrBoundingRegion`, `OcrUsage`, `OcrProgress`

Middleware + DI (`Microsoft.Extensions.AI`)
- `OcrClientBuilder`, `AsBuilder`, `AddOcrClient` / `AddKeyedOcrClient`
- `LoggingOcrClient`, `OpenTelemetryOcrClient`, `ConfigureOptionsOcrClient` + builder extensions (mirrors the `ISpeechToTextClient` template)

All public surface is `[Experimental]` under the `MEAI001` diagnostic id. Unit tests cover both libraries (47 Ocr tests); API baselines updated.

Notes
…`Microsoft.Extensions.Http.Resilience`); the `.Use(...)` primitive can still wrap a custom decorator.