Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the …
📝 Walkthrough

Adds a LibreOffice extraction mode, a LibreOffice UNO bridge and session runtime, OOXML drawing parsing, and a LibreOffice rich backend that merges UNO and OOXML shape/chart data with provenance and confidence. It also threads an opt-in include_backend_metadata flag through the Python API, CLI, and MCP server, updates schemas and models, adds a CI smoke job, and makes extensive test and documentation changes.
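The UNO + OOXML merge with provenance and confidence, plus the opt-in metadata flag, can be sketched roughly as follows. This is a hypothetical illustration, not the PR's implementation: the Shape record, joining the two sources on shape name, the fixed confidence values, and the serialize helper are all assumptions made for the example.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Shape:
    """Hypothetical minimal shape record; the PR's real models may differ."""
    name: str
    text: str = ""
    kind: Optional[str] = None
    provenance: str = ""     # which source(s) produced this shape
    confidence: float = 0.0  # hypothetical score in [0, 1]


def merge_shapes(uno: list[Shape], ooxml: list[Shape]) -> list[Shape]:
    """Merge UNO and OOXML shapes, joining on shape name (an assumption)."""
    remaining = {s.name: s for s in ooxml}
    merged: list[Shape] = []
    for u in uno:
        o = remaining.pop(u.name, None)
        if o is None:
            # Seen only by the UNO bridge.
            merged.append(Shape(u.name, u.text, u.kind, "uno", 0.6))
        else:
            # Seen by both sources: take UNO text and OOXML kind when present,
            # and record a higher (hypothetical) confidence.
            merged.append(
                Shape(u.name, u.text or o.text, o.kind or u.kind, "uno+ooxml", 0.9)
            )
    for o in remaining.values():
        # Seen only in the OOXML drawing parts.
        merged.append(Shape(o.name, o.text, o.kind, "ooxml", 0.5))
    return merged


def serialize(shapes: list[Shape], include_backend_metadata: bool = False) -> list[dict]:
    """Drop provenance/confidence unless the caller opts in."""
    out = []
    for s in shapes:
        record = asdict(s)
        if not include_backend_metadata:
            del record["provenance"]
            del record["confidence"]
        out.append(record)
    return out


uno = [Shape("Rect 1", text="Hello"), Shape("Arrow 2")]
ooxml = [Shape("Rect 1", kind="rect"), Shape("Chart 3", kind="chart")]
merged = merge_shapes(uno, ooxml)
print(serialize(merged))
print(serialize(merged, include_backend_metadata=True)[0])
```

In this sketch, provenance and confidence always live on the internal record and are stripped only at serialization time, so output produced without the opt-in flag keeps the same shape as before.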
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant Pipeline
    participant BackendResolver
    participant LibreOfficeBackend
    participant LibreOfficeSession
    participant OOXMLParser
    participant Fallback
    Client->>Pipeline: extract(file, mode="libreoffice", include_backend_metadata?)
    Pipeline->>Pipeline: validate constraints (file type, no PDF/PNG, no auto-page-break)
    Pipeline->>BackendResolver: resolve_rich_backend(mode="libreoffice")
    BackendResolver->>LibreOfficeBackend: select LibreOfficeRichBackend
    LibreOfficeBackend->>LibreOfficeSession: ensure runtime / from_env()
    LibreOfficeSession->>LibreOfficeSession: start soffice, run bridge, fetch draw-page/chart JSON
    LibreOfficeSession-->>LibreOfficeBackend: draw-page and chart payloads (UNO)
    LibreOfficeBackend->>OOXMLParser: read_sheet_drawings(file) (OOXML)
    OOXMLParser-->>LibreOfficeBackend: OOXML shapes/charts
    LibreOfficeBackend->>LibreOfficeBackend: merge UNO + OOXML, assign provenance/confidence
    LibreOfficeBackend-->>Pipeline: rich artifacts (shapes/charts)
    Pipeline-->>Client: WorkbookData (include_backend_metadata as requested)
    alt runtime unavailable or bridge fails
        LibreOfficeSession-->>BackendResolver: LibreOfficeUnavailableError / pipeline failed
        BackendResolver->>Fallback: fallback to light/openpyxl extraction
        Fallback-->>Client: WorkbookData without rich artifacts
    end
```
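The validation step and the fallback branch of the diagram can be sketched as below. This is a minimal illustration under assumptions, not the PR's code: only LibreOfficeUnavailableError is a name taken from the diagram, while validate_request, run_bridge, extract, the render_pdf/render_png/auto_page_breaks keywords, and the convention that the bridge prints a JSON object with a "shapes" key to stdout are all hypothetical. The bridge is modeled as an arbitrary subprocess so the sketch runs without LibreOffice installed.

```python
import json
import subprocess
import sys
from pathlib import Path


class LibreOfficeUnavailableError(RuntimeError):
    """Name taken from the diagram; the real class may carry more context."""


SUPPORTED_SUFFIXES = {".xlsx", ".xlsm"}


def validate_request(path, *, render_pdf=False, render_png=False, auto_page_breaks=False):
    """Reject unsupported combinations before any LibreOffice process starts."""
    suffix = Path(path).suffix.lower()
    if suffix not in SUPPORTED_SUFFIXES:
        raise ValueError(f"mode='libreoffice' supports .xlsx/.xlsm only, got {suffix!r}")
    if render_pdf or render_png:
        raise ValueError("mode='libreoffice' cannot be combined with PDF/PNG rendering")
    if auto_page_breaks:
        raise ValueError("mode='libreoffice' cannot be combined with auto page-break export")


def run_bridge(command, timeout=60.0):
    """Run a hypothetical bridge command and parse its JSON stdout."""
    try:
        proc = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout, check=True
        )
        return json.loads(proc.stdout)
    except (OSError, subprocess.SubprocessError, json.JSONDecodeError) as exc:
        # Missing binary, non-zero exit, timeout, or malformed output all
        # mean the rich path is unusable for this request.
        raise LibreOfficeUnavailableError(str(exc)) from exc


def extract(path, bridge_command, light_backend, **options):
    """Validate first, then try the rich path and fall back on bridge failure."""
    validate_request(path, **options)
    try:
        payload = run_bridge(bridge_command)  # assumed to be a JSON object
        return {"path": str(path), "shapes": payload.get("shapes", []), "backend": "libreoffice"}
    except LibreOfficeUnavailableError:
        return light_backend(path)


def light_backend(path):
    # Stand-in for the light/openpyxl extraction: no rich artifacts.
    return {"path": str(path), "shapes": [], "backend": "light"}


ok_cmd = [sys.executable, "-c", "import json; print(json.dumps({'shapes': ['Rect 1']}))"]
failing_cmd = [sys.executable, "-c", "raise SystemExit(3)"]
print(extract("book.xlsx", ok_cmd, light_backend)["backend"])       # libreoffice
print(extract("book.xlsx", failing_cmd, light_backend)["backend"])  # light
```

Validating before any process starts means unsupported requests such as .xls input fail fast without paying the soffice startup cost, while any bridge failure degrades to the light path instead of aborting the whole extraction.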
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
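Following PR triage, the review comments below are out of scope for this round.
None of them is a correctness bug at this point; each would either change the existing spec, implementation, or test contract, or require a larger design cleanup. This follow-up prioritizes correctness, contract mismatches, and CI gates, and only the necessary ones …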
Applied the post-push follow-up. Verification: uv run pytest tests/engine/test_engine.py tests/test_conftest_libreoffice_runtime.py tests/core/test_libreoffice_backend.py -q gave 44 passed, and uv run task precommit-run …
Codecov Report
❌ Patch coverage is …
Addressed the 2 unresolved review threads.
Changes:
Regression tests added:
Verification:
Pushed commit:
Codacy's …
Verification:
Note: I found no GitHub review thread or inline review comment corresponding to this Codacy finding, so there is no thread to resolve. I'll wait for Codacy's re-analysis and follow up if needed.
#56
Summary
… libreoffice extraction mode across the Python API, CLI, and MCP server
… libreoffice mode rollout

What changed
… mode="libreoffice" as a public extraction mode for .xlsx/.xlsm, with early rejection for .xls
… libreoffice combinations such as PDF/PNG rendering and auto page-break export
… libreoffice mode

Testing
uv run pytest tests/core/test_libreoffice_backend.py tests/core/test_pipeline_fallbacks.py tests/core/test_mode_output.py -k libreoffice -q
RUN_LIBREOFFICE_SMOKE=1 uv run pytest tests/core/test_libreoffice_smoke.py -q
uv run pytest tests/core/test_mode_output.py tests/cli/test_cli.py tests/backends/test_auto_page_breaks.py -q
uv run pytest tests/test_conftest_libreoffice_runtime.py -q
uv run task precommit-run

Summary by CodeRabbit
New Features
Improvements
Documentation
Tests / Chores