Add IOcrClient OCR/document-extraction capability to Microsoft.Extensions.AI#7588

Open: luisquintanilla wants to merge 1 commit into dotnet:main from luisquintanilla:feature/ocr-abstractions
luisquintanilla (Contributor) commented Jun 25, 2026

Summary

Adds IOcrClient, a provider-neutral OCR / document-extraction capability, to Microsoft.Extensions.AI, following the same abstraction + builder + middleware + DI shape as the existing capability family (IChatClient, ISpeechToTextClient, etc.).

Implements the API proposal in #7587.

What's included

Abstractions (Microsoft.Extensions.AI.Abstractions)

  • IOcrClient, DelegatingOcrClient, OcrClientExtensions, OcrClientMetadata
  • OcrOptions, OcrResult, OcrPage, OcrTable, OcrTableCell, OcrBlock, OcrBoundingRegion, OcrUsage, OcrProgress

Middleware + DI (Microsoft.Extensions.AI)

  • OcrClientBuilder, AsBuilder, AddOcrClient / AddKeyedOcrClient
  • LoggingOcrClient, OpenTelemetryOcrClient, ConfigureOptionsOcrClient + builder extensions (mirrors the ISpeechToTextClient template)
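
To make the abstraction + builder + middleware shape concrete, here is a minimal, self-contained sketch. It does not use the PR's actual types: ISketchOcrClient, GetTextAsync, FakeOcrClient, SketchDelegatingOcrClient, TracingOcrClient, and SketchOcrClientBuilder are all hypothetical stand-ins, and the ordering rule (the first Use(...) call becomes the outermost layer) is an assumption, not something this description confirms.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for IOcrClient; the real signature is not shown in this PR text.
public interface ISketchOcrClient
{
    Task<string> GetTextAsync(ReadOnlyMemory<byte> document, CancellationToken cancellationToken = default);
}

// Innermost "provider" client: returns canned text instead of calling a backend.
public sealed class FakeOcrClient : ISketchOcrClient
{
    public Task<string> GetTextAsync(ReadOnlyMemory<byte> document, CancellationToken cancellationToken = default)
        => Task.FromResult($"{document.Length} bytes");
}

// Delegating base: forwards every call to the inner client so middleware overrides only what it needs.
public class SketchDelegatingOcrClient : ISketchOcrClient
{
    protected ISketchOcrClient InnerClient { get; }

    public SketchDelegatingOcrClient(ISketchOcrClient innerClient)
        => InnerClient = innerClient ?? throw new ArgumentNullException(nameof(innerClient));

    public virtual Task<string> GetTextAsync(ReadOnlyMemory<byte> document, CancellationToken cancellationToken = default)
        => InnerClient.GetTextAsync(document, cancellationToken);
}

// Example middleware: records its name when entered, then delegates.
public sealed class TracingOcrClient : SketchDelegatingOcrClient
{
    private readonly string _name;
    private readonly List<string> _trace;

    public TracingOcrClient(ISketchOcrClient innerClient, string name, List<string> trace)
        : base(innerClient)
    {
        _name = name;
        _trace = trace;
    }

    public override Task<string> GetTextAsync(ReadOnlyMemory<byte> document, CancellationToken cancellationToken = default)
    {
        _trace.Add(_name);
        return base.GetTextAsync(document, cancellationToken);
    }
}

// Builder: collects wrapper factories and composes them around the inner client.
public sealed class SketchOcrClientBuilder
{
    private readonly ISketchOcrClient _innerClient;
    private readonly List<Func<ISketchOcrClient, ISketchOcrClient>> _factories = new();

    public SketchOcrClientBuilder(ISketchOcrClient innerClient)
        => _innerClient = innerClient ?? throw new ArgumentNullException(nameof(innerClient));

    public SketchOcrClientBuilder Use(Func<ISketchOcrClient, ISketchOcrClient> factory)
    {
        _factories.Add(factory ?? throw new ArgumentNullException(nameof(factory)));
        return this;
    }

    public ISketchOcrClient Build()
    {
        ISketchOcrClient client = _innerClient;

        // Apply factories in reverse so the first Use(...) call ends up outermost (assumed ordering).
        for (int i = _factories.Count - 1; i >= 0; i--)
        {
            client = _factories[i](client);
        }

        return client;
    }
}
```

With this sketch, chaining .Use(inner => new TracingOcrClient(inner, "outer", trace)) and then .Use(inner => new TracingOcrClient(inner, "inner", trace)) enters the "outer" layer before the "inner" one on each call.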

All public surface is [Experimental] under the MEAI001 diagnostic ID. Unit tests cover both libraries (47 OCR tests); API baselines updated.

Notes

  • Resilience (retry) is intentionally not a built-in client — consistent with every other M.E.AI capability, it belongs in the HTTP pipeline (Microsoft.Extensions.Http.Resilience); the .Use(...) primitive can still wrap a custom decorator.
  • Distributed caching can follow the same builder shape; deferred to API review.
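
As an illustration of the first note, a custom retry decorator of the kind .Use(...) could wrap might look roughly like the sketch below. Everything here is hypothetical: IRetrySketchOcrClient, RetryingOcrClient, and FlakyOcrClient are stand-ins rather than types from this PR, TimeoutException is an arbitrary choice of "transient" failure, and the loop retries immediately with no backoff. For HTTP-backed providers, the note's recommendation of Microsoft.Extensions.Http.Resilience still applies.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for IOcrClient; the real signature is not shown in this PR text.
public interface IRetrySketchOcrClient
{
    Task<string> GetTextAsync(byte[] document, CancellationToken cancellationToken = default);
}

// Decorator that retries the inner client on TimeoutException, up to maxAttempts total calls.
public sealed class RetryingOcrClient : IRetrySketchOcrClient
{
    private readonly IRetrySketchOcrClient _innerClient;
    private readonly int _maxAttempts;

    public RetryingOcrClient(IRetrySketchOcrClient innerClient, int maxAttempts)
    {
        _innerClient = innerClient ?? throw new ArgumentNullException(nameof(innerClient));
        if (maxAttempts < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(maxAttempts));
        }

        _maxAttempts = maxAttempts;
    }

    public async Task<string> GetTextAsync(byte[] document, CancellationToken cancellationToken = default)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return await _innerClient.GetTextAsync(document, cancellationToken).ConfigureAwait(false);
            }
            catch (TimeoutException) when (attempt < _maxAttempts)
            {
                // Swallow the failure and loop again; the final attempt's exception propagates.
            }
        }
    }
}

// Test double that fails a fixed number of times before succeeding.
public sealed class FlakyOcrClient : IRetrySketchOcrClient
{
    private int _failuresLeft;

    public FlakyOcrClient(int failures) => _failuresLeft = failures;

    public int Calls { get; private set; }

    public Task<string> GetTextAsync(byte[] document, CancellationToken cancellationToken = default)
    {
        Calls++;
        if (_failuresLeft-- > 0)
        {
            throw new TimeoutException("transient failure");
        }

        return Task.FromResult("ok");
    }
}
```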

…ions.AI

Introduces IOcrClient as a provider-neutral OCR/document-parsing capability in
Microsoft.Extensions.AI.Abstractions, following the same abstraction + builder +
middleware + DI shape as the existing capability family (IChatClient,
ISpeechToTextClient, etc.).

Abstractions (Microsoft.Extensions.AI.Abstractions):
- IOcrClient, DelegatingOcrClient, OcrClientExtensions, OcrClientMetadata
- OcrOptions, OcrResult, OcrPage, OcrTable, OcrTableCell, OcrBlock,
  OcrBoundingRegion, OcrUsage, OcrProgress

Middleware + DI (Microsoft.Extensions.AI):
- OcrClientBuilder, AsBuilder, AddOcrClient/AddKeyedOcrClient
- LoggingOcrClient, OpenTelemetryOcrClient, ConfigureOptionsOcrClient and their
  builder extensions, mirroring the ISpeechToTextClient template

All public surface is marked [Experimental] under the MEAI001 (AIOcr) diagnostic id.
Includes unit tests for both libraries and updated API baselines.
Copilot AI review requested due to automatic review settings June 25, 2026 23:53
@luisquintanilla luisquintanilla requested review from a team as code owners June 25, 2026 23:53

Copilot AI (Contributor) left a comment

Pull request overview

Adds a new provider-neutral OCR / document-extraction capability to the Microsoft.Extensions.AI capability family, including abstractions, builder-based middleware pipeline, DI registration helpers, and OpenTelemetry/logging integrations.

Changes:

  • Introduces IOcrClient + OCR result/options model types (pages, tables, blocks, bounding regions, usage/progress) in Microsoft.Extensions.AI.Abstractions.
  • Adds OcrClientBuilder + DI registration extensions, plus middleware clients for logging, OpenTelemetry, and default-options configuration in Microsoft.Extensions.AI.
  • Adds unit tests and updates API baseline JSONs and shared diagnostic IDs / OpenTelemetry constants to cover the new capability.

Reviewed changes

Copilot reviewed 40 out of 40 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| test/Libraries/Microsoft.Extensions.AI.Tests/Ocr/OpenTelemetryOcrClientTests.cs | Verifies OCR OpenTelemetry spans/tags and ActivitySource service exposure. |
| test/Libraries/Microsoft.Extensions.AI.Tests/Ocr/OcrClientDependencyInjectionPatterns.cs | Validates DI registration and middleware wrapping patterns for OCR clients. |
| test/Libraries/Microsoft.Extensions.AI.Tests/Ocr/OcrClientBuilderTests.cs | Tests builder pipeline ordering, null-handling, and service-provider flow-through. |
| test/Libraries/Microsoft.Extensions.AI.Tests/Ocr/LoggingOcrClientTests.cs | Tests logging middleware behavior across log levels and DI-resolved logger factory. |
| test/Libraries/Microsoft.Extensions.AI.Tests/Ocr/ConfigureOptionsOcrClientTests.cs | Tests options-cloning/configuration middleware behavior. |
| test/Libraries/Microsoft.Extensions.AI.Tests/Microsoft.Extensions.AI.Tests.csproj | Shares TestOcrClient into the non-abstractions test project. |
| test/Libraries/Microsoft.Extensions.AI.Abstractions.Tests/TestOcrClient.cs | Adds an OCR test double for abstraction-level tests. |
| test/Libraries/Microsoft.Extensions.AI.Abstractions.Tests/Ocr/OcrResultTests.cs | Tests OcrResult construction and markdown aggregation behavior. |
| test/Libraries/Microsoft.Extensions.AI.Abstractions.Tests/Ocr/OcrOptionsTests.cs | Tests cloning semantics for OcrOptions. |
| test/Libraries/Microsoft.Extensions.AI.Abstractions.Tests/Ocr/OcrClientMetadataTests.cs | Tests metadata property round-tripping and nullability behavior. |
| test/Libraries/Microsoft.Extensions.AI.Abstractions.Tests/Ocr/OcrClientExtensionsTests.cs | Tests DataContent overload and argument validation in extensions. |
| test/Libraries/Microsoft.Extensions.AI.Abstractions.Tests/Ocr/OcrBoundingRegionTests.cs | Tests bounding-region helpers and bounds calculation. |
| test/Libraries/Microsoft.Extensions.AI.Abstractions.Tests/Ocr/DelegatingOcrClientTests.cs | Tests delegating client pass-through semantics and GetService behavior. |
| src/Shared/DiagnosticIds/DiagnosticIds.cs | Adds a new experimental diagnostic ID bucket for OCR (AIOcr). |
| src/Libraries/Microsoft.Extensions.AI/OpenTelemetryConsts.cs | Adds a non-standard GenAI usage attribute for pages processed. |
| src/Libraries/Microsoft.Extensions.AI/Ocr/OpenTelemetryOcrClientBuilderExtensions.cs | Adds .UseOpenTelemetry(...) builder extension for OCR pipelines. |
| src/Libraries/Microsoft.Extensions.AI/Ocr/OpenTelemetryOcrClient.cs | Implements OpenTelemetry instrumentation for OCR operations (spans + metrics). |
| src/Libraries/Microsoft.Extensions.AI/Ocr/OcrClientBuilderServiceCollectionExtensions.cs | Adds AddOcrClient / AddKeyedOcrClient DI registration helpers. |
| src/Libraries/Microsoft.Extensions.AI/Ocr/OcrClientBuilderOcrClientExtensions.cs | Adds IOcrClient.AsBuilder() convenience extension. |
| src/Libraries/Microsoft.Extensions.AI/Ocr/OcrClientBuilder.cs | Adds the OCR pipeline builder implementation (Use(...), Build(...)). |
| src/Libraries/Microsoft.Extensions.AI/Ocr/LoggingOcrClientBuilderExtensions.cs | Adds .UseLogging(...) builder extension for OCR pipelines. |
| src/Libraries/Microsoft.Extensions.AI/Ocr/LoggingOcrClient.cs | Adds logging middleware for OCR operations with sensitive-data gating. |
| src/Libraries/Microsoft.Extensions.AI/Ocr/ConfigureOptionsOcrClientBuilderExtensions.cs | Adds .ConfigureOptions(...) builder extension for defaulting/cloning options. |
| src/Libraries/Microsoft.Extensions.AI/Ocr/ConfigureOptionsOcrClient.cs | Implements options-configuration middleware for OCR operations. |
| src/Libraries/Microsoft.Extensions.AI/Microsoft.Extensions.AI.json | Updates API baseline to include OCR middleware/builder/DI additions. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Utilities/AIJsonUtilities.Defaults.cs | Registers OCR types for source-generated JSON serialization defaults. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Ocr/OcrUsage.cs | Adds OCR usage model (pages processed + additional properties). |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Ocr/OcrTableCell.cs | Adds structured table-cell model (indices, spans, content, kind). |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Ocr/OcrTable.cs | Adds structured table model (cells or markdown + optional geometry). |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Ocr/OcrResult.cs | Adds OCR result model (pages + markdown aggregation + usage/raw/additional). |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Ocr/OcrProgress.cs | Adds progress model for long-running OCR operations. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Ocr/OcrPage.cs | Adds per-page OCR model (markdown + tables/blocks/confidence). |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Ocr/OcrOptions.cs | Adds request options model (model id, include images, additional properties). |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Ocr/OcrClientMetadata.cs | Adds client metadata model (provider name/uri/default model). |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Ocr/OcrClientExtensions.cs | Adds GetService<T> + DataContent overload for OCR invocation. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Ocr/OcrBoundingRegion.cs | Adds polygon-based geometry primitive + rectangle helper + bounds computation. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Ocr/OcrBlock.cs | Adds layout-block model (text/kind/geometry/confidence). |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Ocr/IOcrClient.cs | Introduces OCR client abstraction (unary call + IProgress + GetService). |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Ocr/DelegatingOcrClient.cs | Adds delegating base class for OCR pipeline middleware. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Microsoft.Extensions.AI.Abstractions.json | Updates API baseline to include new OCR abstraction surface. |

Comment on lines +27 to +31
public OcrBoundingRegion(int pageNumber, IReadOnlyList<float> polygon)
{
    PageNumber = pageNumber;
    Polygon = Throw.IfNull(polygon);
}
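
For context on what a bounds computation over such a polygon could involve, here is a small standalone sketch. It assumes the polygon is a flat list of alternating x and y coordinates ([x0, y0, x1, y1, ...]); the PR text shown here does not confirm that layout. PolygonBoundsSketch and GetBounds are hypothetical names, not members of OcrBoundingRegion, and the even-length check is this sketch's own choice, not behavior taken from the constructor above.

```csharp
using System;
using System.Collections.Generic;

public static class PolygonBoundsSketch
{
    // Computes the axis-aligned bounding box of a polygon.
    // Assumes a flat [x0, y0, x1, y1, ...] layout (an assumption, not confirmed by the PR).
    public static (float X, float Y, float Width, float Height) GetBounds(IReadOnlyList<float> polygon)
    {
        if (polygon is null)
        {
            throw new ArgumentNullException(nameof(polygon));
        }

        // A list with no points or a dangling coordinate cannot be read as x,y pairs.
        if (polygon.Count < 2 || polygon.Count % 2 != 0)
        {
            throw new ArgumentException("Expected a non-empty, even-length list of x,y pairs.", nameof(polygon));
        }

        float minX = polygon[0], maxX = polygon[0];
        float minY = polygon[1], maxY = polygon[1];

        // Walk the remaining points, widening the box as needed.
        for (int i = 2; i < polygon.Count; i += 2)
        {
            minX = Math.Min(minX, polygon[i]);
            maxX = Math.Max(maxX, polygon[i]);
            minY = Math.Min(minY, polygon[i + 1]);
            maxY = Math.Max(maxY, polygon[i + 1]);
        }

        return (minX, minY, maxX - minX, maxY - minY);
    }
}
```

For the rectangle with corners (1, 2), (5, 2), (5, 6), (1, 6), this returns X = 1, Y = 2, Width = 4, Height = 4, and an odd-length list is rejected with an ArgumentException.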
Comment on lines +16 to +32
/// <summary>Registers a singleton <see cref="IOcrClient"/> in the <see cref="IServiceCollection"/>.</summary>
/// <param name="serviceCollection">The <see cref="IServiceCollection"/> to which the client should be added.</param>
/// <param name="innerClient">The inner <see cref="IOcrClient"/> that represents the underlying backend.</param>
/// <param name="lifetime">The service lifetime for the client. Defaults to <see cref="ServiceLifetime.Singleton"/>.</param>
/// <returns>An <see cref="OcrClientBuilder"/> that can be used to build a pipeline around the inner client.</returns>
public static OcrClientBuilder AddOcrClient(
    this IServiceCollection serviceCollection,
    IOcrClient innerClient,
    ServiceLifetime lifetime = ServiceLifetime.Singleton)
    => AddOcrClient(serviceCollection, _ => innerClient, lifetime);

/// <summary>Registers a singleton <see cref="IOcrClient"/> in the <see cref="IServiceCollection"/>.</summary>
/// <param name="serviceCollection">The <see cref="IServiceCollection"/> to which the client should be added.</param>
/// <param name="innerClientFactory">A callback that produces the inner <see cref="IOcrClient"/> that represents the underlying backend.</param>
/// <param name="lifetime">The service lifetime for the client. Defaults to <see cref="ServiceLifetime.Singleton"/>.</param>
/// <returns>An <see cref="OcrClientBuilder"/> that can be used to build a pipeline around the inner client.</returns>
public static OcrClientBuilder AddOcrClient(
Comment on lines +45 to +63
/// <summary>Registers a keyed singleton <see cref="IOcrClient"/> in the <see cref="IServiceCollection"/>.</summary>
/// <param name="serviceCollection">The <see cref="IServiceCollection"/> to which the client should be added.</param>
/// <param name="serviceKey">The key with which to associate the client.</param>
/// <param name="innerClient">The inner <see cref="IOcrClient"/> that represents the underlying backend.</param>
/// <param name="lifetime">The service lifetime for the client. Defaults to <see cref="ServiceLifetime.Singleton"/>.</param>
/// <returns>An <see cref="OcrClientBuilder"/> that can be used to build a pipeline around the inner client.</returns>
public static OcrClientBuilder AddKeyedOcrClient(
    this IServiceCollection serviceCollection,
    object? serviceKey,
    IOcrClient innerClient,
    ServiceLifetime lifetime = ServiceLifetime.Singleton)
    => AddKeyedOcrClient(serviceCollection, serviceKey, _ => innerClient, lifetime);

/// <summary>Registers a keyed singleton <see cref="IOcrClient"/> in the <see cref="IServiceCollection"/>.</summary>
/// <param name="serviceCollection">The <see cref="IServiceCollection"/> to which the client should be added.</param>
/// <param name="serviceKey">The key with which to associate the client.</param>
/// <param name="innerClientFactory">A callback that produces the inner <see cref="IOcrClient"/> that represents the underlying backend.</param>
/// <param name="lifetime">The service lifetime for the client. Defaults to <see cref="ServiceLifetime.Singleton"/>.</param>
/// <returns>An <see cref="OcrClientBuilder"/> that can be used to build a pipeline around the inner client.</returns>
2 participants