Extend IngestionPipeline to support processing documents without a file system reader #7488

Draft
Copilot wants to merge 6 commits into data-ingestion-preview2 from copilot/extend-ingestion-pipeline

Copilot AI commented Apr 26, 2026

Contributor
  • Modify IngestionPipeline.cs: remove the _reader field and the reader constructor parameter; add ProcessAsync(IngestionDocument), which returns Task<IngestionDocument>; add IngestionDocumentReader reader as the first parameter of the file-system ProcessAsync overloads and the private helper; replace all var with explicit types
  • Add ProcessDocument activity name to DiagnosticsConstants.cs
  • Update all 6 DataIngestor.cs files (1 source template + 5 verified snapshots) to pass the reader to ProcessAsync instead of to the constructor
  • Update IngestionPipelineTests.cs: update pipeline construction and ProcessAsync calls, add a test for document-based ProcessAsync without a reader, replace var with explicit types
  • Update README.md with examples of creating a pipeline and passing a reader to ProcessAsync
  • Build and run Microsoft.Extensions.DataIngestion.Tests to verify (124 passed, 11 skipped × 3 TFMs)
  • Build and run Microsoft.Extensions.AI.Templates.IntegrationTests snapshot tests to verify (5 passed)
  • Update OpenTelemetry packages in eng/packages/ProjectTemplates.props to fix NU1902 vulnerability warnings

Copilot AI and others added 2 commits April 26, 2026 06:12
…ngestionDocument), update file-system methods to take reader param

- Remove IngestionDocumentReader from constructor and field
- Add new ProcessAsync(IngestionDocument) overload returning Task<IngestionDocument>
- Add IngestionDocumentReader reader parameter to file-system ProcessAsync methods
- Add ProcessDocument activity constant to DiagnosticsConstants
- Replace var with explicit types in pipeline and tests
- Update all DataIngestor.cs template/snapshot files
- Add CanProcessDocumentWithoutReader test
- Update README.md with pipeline usage examples

Agent-Logs-Url: https://github.com/dotnet/extensions/sessions/54f5e258-8414-40a0-b8b5-953677d1cce2

Co-authored-by: adamsitnik <6011991+adamsitnik@users.noreply.github.com>
Copilot AI requested a review from adamsitnik April 26, 2026 06:37
@github-actions github-actions Bot added the area-ai-templates Microsoft.Extensions.AI.Templates label Apr 26, 2026
@adamsitnik adamsitnik added area-data-ingestion and removed area-ai-templates Microsoft.Extensions.AI.Templates labels Apr 26, 2026

@adamsitnik adamsitnik left a comment

Member

@copilot The Microsoft.Extensions.AI.Templates.Tests.AIChatWebExecutionTests tests are failing, fix it:

Working Directory: /mnt/vss/_work/1/s/artifacts/ProjectTemplates/Microsoft.Extensions.AI.Templates/Sandbox/aichatweb/projects/AIChatWeb_gh_aais_A_T_ID_F_net9/AIChatWeb_gh_aais_A_T_ID_F_net9.AppHost
Local Shipping Path: /mnt/vss/_work/1/s/artifacts/packages/Release/Shipping
NuGet Packages Path: /mnt/vss/_work/1/s/artifacts/ProjectTemplates/Microsoft.Extensions.AI.Templates/Sandbox/packages

Command failed with non-zero exit code: 1

Standard Output:
Determining projects to restore...
/mnt/vss/_work/1/s/artifacts/ProjectTemplates/Microsoft.Extensions.AI.Templates/Sandbox/aichatweb/projects/AIChatWeb_gh_aais_A_T_ID_F_net9/AIChatWeb_gh_aais_A_T_ID_F_net9.ServiceDefaults/AIChatWeb_gh_aais_A_T_ID_F_net9.ServiceDefaults.csproj : error NU1902: Warning As Error: Package 'OpenTelemetry.Exporter.OpenTelemetryProtocol' 1.14.0 has a known moderate severity vulnerability, https://github.com/advisories/GHSA-mr8r-92fq-pj8p [/mnt/vss/_work/1/s/artifacts/ProjectTemplates/Microsoft.Extensions.AI.Templates/Sandbox/aichatweb/projects/AIChatWeb_gh_aais_A_T_ID_F_net9/AIChatWeb_gh_aais_A_T_ID_F_net9.AppHost/AIChatWeb_gh_aais_A_T_ID_F_net9.AppHost.csproj]
/mnt/vss/_work/1/s/artifacts/ProjectTemplates/Microsoft.Extensions.AI.Templates/Sandbox/aichatweb/projects/AIChatWeb_gh_aais_A_T_ID_F_net9/AIChatWeb_gh_aais_A_T_ID_F_net9.ServiceDefaults/AIChatWeb_gh_aais_A_T_ID_F_net9.ServiceDefaults.csproj : error NU1902: Warning As Error: Package 'OpenTelemetry.Exporter.OpenTelemetryProtocol' 1.14.0 has a known moderate severity vulnerability, https://github.com/advisories/GHSA-q834-8qmm-v933 [/mnt/vss/_work/1/s/artifacts/ProjectTemplates/Microsoft.Extensions.AI.Templates/Sandbox/aichatweb/projects/AIChatWeb_gh_aais_A_T_ID_F_net9/AIChatWeb_gh_aais_A_T_ID_F_net9.AppHost/AIChatWeb_gh_aais_A_T_ID_F_net9.AppHost.csproj]
Failed to restore /mnt/vss/_work/1/s/artifacts/ProjectTemplates/Microsoft.Extensions.AI.Templates/Sandbox/aichatweb/projects/AIChatWeb_gh_aais_A_T_ID_F_net9/AIChatWeb_gh_aais_A_T_ID_F_net9.ServiceDefaults/AIChatWeb_gh_aais_A_T_ID_F_net9.ServiceDefaults.csproj (in 8.11 sec).
Restored /mnt/vss/_work/1/s/artifacts/ProjectTemplates/Microsoft.Extensions.AI.Templates/Sandbox/aichatweb/projects/AIChatWeb_gh_aais_A_T_ID_F_net9/AIChatWeb_gh_aais_A_T_ID_F_net9.Web/AIChatWeb_gh_aais_A_T_ID_F_net9.Web.csproj (in 10.66 sec).
Restored /mnt/vss/_work/1/s/artifacts/ProjectTemplates/Microsoft.Extensions.AI.Templates/Sandbox/aichatweb/projects/AIChatWeb_gh_aais_A_T_ID_F_net9/AIChatWeb_gh_aais_A_T_ID_F_net9.AppHost/AIChatWeb_gh_aais_A_T_ID_F_net9.AppHost.csproj (in 13.84 sec).

… template tests

Update OpenTelemetry.Exporter.OpenTelemetryProtocol from 1.14.0 to 1.15.3
to fix GHSA-mr8r-92fq-pj8p and GHSA-q834-8qmm-v933 vulnerabilities.
Also update companion OTel packages to their latest versions.

Agent-Logs-Url: https://github.com/dotnet/extensions/sessions/c6522938-3c86-4211-a016-ca79ad9b591e

Co-authored-by: adamsitnik <6011991+adamsitnik@users.noreply.github.com>

Copilot AI commented Apr 26, 2026

Contributor Author

> @copilot The Microsoft.Extensions.AI.Templates.Tests.AIChatWebExecutionTests tests are failing, fix it:

Fixed in 6258a67. Updated OpenTelemetry packages in eng/packages/ProjectTemplates.props to resolve the NU1902 vulnerability warnings:

  • OpenTelemetry.Exporter.OpenTelemetryProtocol: 1.14.0 → 1.15.3 (fixes GHSA-mr8r-92fq-pj8p and GHSA-q834-8qmm-v933)
  • OpenTelemetry.Extensions.Hosting: 1.14.0 → 1.15.3
  • OpenTelemetry.Instrumentation.AspNetCore: 1.14.0 → 1.15.2
  • OpenTelemetry.Instrumentation.Http: 1.14.0 → 1.15.1
  • OpenTelemetry.Instrumentation.Runtime: 1.14.0 → 1.15.1

Copilot AI requested a review from adamsitnik April 26, 2026 17:04

@adamsitnik adamsitnik left a comment

Member

@copilot please sync this branch with latest changes from the target branch (data-ingestion-preview2), solve all the merge conflicts. Make sure the tests are passing before pushing the changes.

Copilot AI commented Jun 25, 2026

Contributor Author

> @copilot please sync this branch with latest changes from the target branch (data-ingestion-preview2), solve all the merge conflicts. Make sure the tests are passing before pushing the changes.

Done in 991c14a. Merged latest data-ingestion-preview2 into this branch, resolved all 8 conflicts (adapted to non-generic IngestionPipeline, CommunityToolkit.VectorData.InMemory namespace, updated tests). All tests pass: 130 passed × 3 TFMs for DataIngestion tests, 5 passed for template snapshot tests.

Copilot AI requested a review from adamsitnik June 25, 2026 16:17

@adamsitnik adamsitnik left a comment

Member

@copilot please address my feedback

To process documents from the file system, create an `IngestionPipeline` and pass a reader to the `ProcessAsync` method:

```csharp
using IngestionPipeline<string> pipeline = new(chunker, writer);
```

Member

IngestionPipeline is no longer generic ;)

Suggested change
using IngestionPipeline<string> pipeline = new(chunker, writer);
using IngestionPipeline pipeline = new(chunker, writer);

The `IngestionPipeline` can also process documents that are already in memory, without requiring a reader:

```csharp
using IngestionPipeline<string> pipeline = new(chunker, writer);
```

Member

same

Suggested change
using IngestionPipeline<string> pipeline = new(chunker, writer);
using IngestionPipeline pipeline = new(chunker, writer);

Copilot AI commented Jun 25, 2026

Contributor Author

> @copilot please address my feedback

Fixed both IngestionPipeline<string> → IngestionPipeline in the README examples.

Copilot AI requested a review from adamsitnik June 25, 2026 16:49
2 participants