Extend IngestionPipeline to support processing documents without a file system reader#7488
Copilot wants to merge 6 commits into
…ngestionDocument), update file-system methods to take reader param

- Remove IngestionDocumentReader from constructor and field
- Add new ProcessAsync(IngestionDocument) overload returning Task<IngestionDocument>
- Add IngestionDocumentReader reader parameter to file-system ProcessAsync methods
- Add ProcessDocument activity constant to DiagnosticsConstants
- Replace var with explicit types in pipeline and tests
- Update all DataIngestor.cs template/snapshot files
- Add CanProcessDocumentWithoutReader test
- Update README.md with pipeline usage examples

Agent-Logs-Url: https://github.com/dotnet/extensions/sessions/54f5e258-8414-40a0-b8b5-953677d1cce2
Co-authored-by: adamsitnik <6011991+adamsitnik@users.noreply.github.com>
adamsitnik
left a comment
@copilot The Microsoft.Extensions.AI.Templates.Tests.AIChatWebExecutionTests tests are failing. Please fix them:
Working Directory: /mnt/vss/_work/1/s/artifacts/ProjectTemplates/Microsoft.Extensions.AI.Templates/Sandbox/aichatweb/projects/AIChatWeb_gh_aais_A_T_ID_F_net9/AIChatWeb_gh_aais_A_T_ID_F_net9.AppHost
Local Shipping Path: /mnt/vss/_work/1/s/artifacts/packages/Release/Shipping
NuGet Packages Path: /mnt/vss/_work/1/s/artifacts/ProjectTemplates/Microsoft.Extensions.AI.Templates/Sandbox/packages
Command failed with non-zero exit code: 1
Standard Output:
Determining projects to restore...
/mnt/vss/_work/1/s/artifacts/ProjectTemplates/Microsoft.Extensions.AI.Templates/Sandbox/aichatweb/projects/AIChatWeb_gh_aais_A_T_ID_F_net9/AIChatWeb_gh_aais_A_T_ID_F_net9.ServiceDefaults/AIChatWeb_gh_aais_A_T_ID_F_net9.ServiceDefaults.csproj : error NU1902: Warning As Error: Package 'OpenTelemetry.Exporter.OpenTelemetryProtocol' 1.14.0 has a known moderate severity vulnerability, https://github.com/advisories/GHSA-mr8r-92fq-pj8p [/mnt/vss/_work/1/s/artifacts/ProjectTemplates/Microsoft.Extensions.AI.Templates/Sandbox/aichatweb/projects/AIChatWeb_gh_aais_A_T_ID_F_net9/AIChatWeb_gh_aais_A_T_ID_F_net9.AppHost/AIChatWeb_gh_aais_A_T_ID_F_net9.AppHost.csproj]
/mnt/vss/_work/1/s/artifacts/ProjectTemplates/Microsoft.Extensions.AI.Templates/Sandbox/aichatweb/projects/AIChatWeb_gh_aais_A_T_ID_F_net9/AIChatWeb_gh_aais_A_T_ID_F_net9.ServiceDefaults/AIChatWeb_gh_aais_A_T_ID_F_net9.ServiceDefaults.csproj : error NU1902: Warning As Error: Package 'OpenTelemetry.Exporter.OpenTelemetryProtocol' 1.14.0 has a known moderate severity vulnerability, https://github.com/advisories/GHSA-q834-8qmm-v933 [/mnt/vss/_work/1/s/artifacts/ProjectTemplates/Microsoft.Extensions.AI.Templates/Sandbox/aichatweb/projects/AIChatWeb_gh_aais_A_T_ID_F_net9/AIChatWeb_gh_aais_A_T_ID_F_net9.AppHost/AIChatWeb_gh_aais_A_T_ID_F_net9.AppHost.csproj]
Failed to restore /mnt/vss/_work/1/s/artifacts/ProjectTemplates/Microsoft.Extensions.AI.Templates/Sandbox/aichatweb/projects/AIChatWeb_gh_aais_A_T_ID_F_net9/AIChatWeb_gh_aais_A_T_ID_F_net9.ServiceDefaults/AIChatWeb_gh_aais_A_T_ID_F_net9.ServiceDefaults.csproj (in 8.11 sec).
Restored /mnt/vss/_work/1/s/artifacts/ProjectTemplates/Microsoft.Extensions.AI.Templates/Sandbox/aichatweb/projects/AIChatWeb_gh_aais_A_T_ID_F_net9/AIChatWeb_gh_aais_A_T_ID_F_net9.Web/AIChatWeb_gh_aais_A_T_ID_F_net9.Web.csproj (in 10.66 sec).
Restored /mnt/vss/_work/1/s/artifacts/ProjectTemplates/Microsoft.Extensions.AI.Templates/Sandbox/aichatweb/projects/AIChatWeb_gh_aais_A_T_ID_F_net9/AIChatWeb_gh_aais_A_T_ID_F_net9.AppHost/AIChatWeb_gh_aais_A_T_ID_F_net9.AppHost.csproj (in 13.84 sec).
… template tests

Update OpenTelemetry.Exporter.OpenTelemetryProtocol from 1.14.0 to 1.15.3 to fix GHSA-mr8r-92fq-pj8p and GHSA-q834-8qmm-v933 vulnerabilities. Also update companion OTel packages to their latest versions.

Agent-Logs-Url: https://github.com/dotnet/extensions/sessions/c6522938-3c86-4211-a016-ca79ad9b591e
Co-authored-by: adamsitnik <6011991+adamsitnik@users.noreply.github.com>
Fixed in 6258a67. Updated OpenTelemetry packages in
adamsitnik
left a comment
@copilot please sync this branch with the latest changes from the target branch (data-ingestion-preview2) and resolve all merge conflicts. Make sure the tests pass before pushing the changes.
Done in 991c14a. Merged latest
adamsitnik
left a comment
@copilot please address my feedback
To process documents from the file system, create an `IngestionPipeline` and pass a reader to the `ProcessAsync` method:

```csharp
using IngestionPipeline<string> pipeline = new(chunker, writer);
```
IngestionPipeline is no longer generic ;)
- using IngestionPipeline<string> pipeline = new(chunker, writer);
+ using IngestionPipeline pipeline = new(chunker, writer);
The `IngestionPipeline` can also process documents that are already in memory, without requiring a reader:

```csharp
using IngestionPipeline<string> pipeline = new(chunker, writer);
```
same
- using IngestionPipeline<string> pipeline = new(chunker, writer);
+ using IngestionPipeline pipeline = new(chunker, writer);
Fixed both
- `IngestionPipeline.cs`: remove `_reader` field and `reader` from constructor, add `ProcessAsync(IngestionDocument)` returning `Task<IngestionDocument>`, add `IngestionDocumentReader reader` as first param to file-system `ProcessAsync` overloads and private helper, replace all `var` with explicit types
- Add `ProcessDocument` activity name to `DiagnosticsConstants.cs`
- Update all `DataIngestor.cs` files (1 source template + 5 verified snapshots) to pass reader to `ProcessAsync` instead of constructor
- `IngestionPipelineTests.cs`: update pipeline constructions & `ProcessAsync` calls, add test for document-based `ProcessAsync` without reader, replace `var` with explicit types
- Update `README.md` with pipeline creation and reader connection example
- Run `Microsoft.Extensions.DataIngestion.Tests` to verify (124 passed, 11 skipped × 3 TFMs)
- Run `Microsoft.Extensions.AI.Templates.IntegrationTests` snapshot tests to verify (5 passed)
- Update OpenTelemetry packages in `eng/packages/ProjectTemplates.props` to fix NU1902 vulnerability warnings
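
The shape of the API change described in this PR (reader moved from the constructor to a method parameter, plus an overload for documents already in memory) can be illustrated with a rough sketch. Everything below is a simplified stand-in written for illustration: `Document`, `IDocumentReader`, `PlainTextReader`, and `Pipeline` are hypothetical names, and none of the signatures are taken from `Microsoft.Extensions.DataIngestion`.

```csharp
// Illustrative stand-ins only; NOT the real Microsoft.Extensions.DataIngestion types.
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical in-memory document.
public sealed record Document(string Id, string Text);

// Hypothetical reader abstraction: turns a file into a Document.
public interface IDocumentReader
{
    Task<Document> ReadAsync(FileInfo file, CancellationToken ct = default);
}

public sealed class PlainTextReader : IDocumentReader
{
    public async Task<Document> ReadAsync(FileInfo file, CancellationToken ct = default)
        => new Document(file.FullName, await File.ReadAllTextAsync(file.FullName, ct));
}

public sealed class Pipeline
{
    private readonly Func<Document, Document> _process;

    // The constructor takes no reader; a reader is only needed for file input.
    public Pipeline(Func<Document, Document> process) => _process = process;

    // In-memory overload: no reader involved, returns the processed document.
    public Task<Document> ProcessAsync(Document document, CancellationToken ct = default)
    {
        ct.ThrowIfCancellationRequested();
        return Task.FromResult(_process(document));
    }

    // File-system overload: the reader is passed as the first parameter.
    public async Task<Document> ProcessAsync(IDocumentReader reader, FileInfo file, CancellationToken ct = default)
    {
        Document document = await reader.ReadAsync(file, ct);
        return await ProcessAsync(document, ct);
    }
}

public static class Program
{
    public static async Task Main()
    {
        // The "processing" step here just trims whitespace.
        Pipeline pipeline = new(d => d with { Text = d.Text.Trim() });

        // Document already in memory: no reader required.
        Document inMemory = await pipeline.ProcessAsync(new Document("doc-1", "  hello  "));
        Console.WriteLine(inMemory.Text); // prints "hello"

        // Document on disk: pass a reader alongside the file.
        string path = Path.GetTempFileName();
        await File.WriteAllTextAsync(path, "  from disk \n");
        Document fromDisk = await pipeline.ProcessAsync(new PlainTextReader(), new FileInfo(path));
        Console.WriteLine(fromDisk.Text); // prints "from disk"
        File.Delete(path);
    }
}
```

One consequence of this shape is that a single pipeline instance can serve both input paths, and callers that never touch the file system do not need to construct a reader at all.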