[FEAT] Expose total_pages_processed in execution API response#1801
pk-zipstack wants to merge 6 commits into main from
Conversation
…adata Surface page usage data from PageUsage model in API responses to support tracking total pages processed per file execution and per workflow execution. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
for more information, see https://pre-commit.ci
Summary by CodeRabbit
Walkthrough

This pull request introduces pages-processed aggregation tracking across workflow executions. A new helper method aggregates PageUsage records by run ID(s), a model property collects run IDs and delegates to this helper, a serializer field exposes the aggregated value, and API metadata is augmented with the total pages processed.
Sequence Diagram

```mermaid
sequenceDiagram
    participant API as API Endpoint
    participant Serializer as ExecutionSerializer
    participant Model as WorkflowExecution
    participant Helper as UsageHelper
    participant DB as PageUsage DB
    API->>Serializer: get_serialized_execution()
    Serializer->>Model: access aggregated_total_pages_processed
    Model->>Model: collect file_execution_ids
    Model->>Helper: get_aggregated_pages_processed(run_ids)
    Helper->>DB: query PageUsage records
    DB-->>Helper: filtered records
    Helper->>Helper: aggregate Sum(pages_processed)
    Helper-->>Model: total_pages | None
    Model-->>Serializer: aggregated value
    Serializer-->>API: serialized response with field
```
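The delegation chain in the diagram can be sketched with a small pure-Python stand-in (the in-memory `PAGE_USAGE` table, `WorkflowExecutionSketch`, and the helper below are illustrative assumptions, not the project's actual Django code):

```python
# In-memory stand-in for the PageUsage table: run_id -> pages_processed.
PAGE_USAGE = {"run-1": 4, "run-2": 6}


def get_aggregated_pages_processed(run_ids):
    """Mimics UsageHelper: sum pages for matching run IDs, None when no records."""
    matched = [PAGE_USAGE[r] for r in run_ids if r in PAGE_USAGE]
    return sum(matched) if matched else None


class WorkflowExecutionSketch:
    """Stand-in for the model's aggregated_total_pages_processed property."""

    def __init__(self, file_execution_ids):
        self.file_execution_ids = file_execution_ids

    @property
    def aggregated_total_pages_processed(self):
        run_ids = [str(fid) for fid in self.file_execution_ids]
        if not run_ids:
            return None
        return get_aggregated_pages_processed(run_ids)


print(WorkflowExecutionSketch(["run-1", "run-2"]).aggregated_total_pages_processed)  # 10
print(WorkflowExecutionSketch([]).aggregated_total_pages_processed)  # None
```

The `None` return for executions with no file runs mirrors `Sum("pages_processed")` over an empty queryset.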
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
🧹 Nitpick comments (4)
backend/workflow_manager/workflow_v2/models/execution.py (2)
262-280: `PageUsage.run_id` has no database index, but is now on multiple hot query paths.

The `PageUsage` model only indexes `organization_id`. Both `filter(run_id=run_id)` (in `UsageHelper.get_aggregated_pages_processed`) and `filter(run_id__in=str_ids)` (here) will full-scan the `page_usage` table as it grows. These queries are now triggered per-file in API deployments and per-row in execution list views.

A migration adding a `db_index=True` on `run_id` (or a `Meta.indexes` entry) is recommended:

```python
# In account_usage/models.py – PageUsage.Meta
indexes = [
    models.Index(fields=["organization_id"]),
    models.Index(fields=["run_id"]),  # +
]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/workflow_manager/workflow_v2/models/execution.py` around lines 262 - 280, Add a database index on PageUsage.run_id to avoid full table scans for queries used by aggregated_total_pages_processed and UsageHelper.get_aggregated_pages_processed: update the PageUsage model Meta to include an index for "run_id" (e.g., add models.Index(fields=["run_id"]) alongside the existing organization_id index) and create/apply a Django migration so the new index is created in the database.
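As a toy illustration of what the suggested `run_id` index buys (plain Python, not Django: a list scan stands in for the unindexed full-table scan, a precomputed dict for the indexed lookup):

```python
from collections import defaultdict

# Fake page_usage rows; without an index, every filter(run_id=...) scans them all.
rows = [{"run_id": f"run-{i % 100}", "pages_processed": 2} for i in range(1000)]


def total_pages_scan(run_id):
    # O(n) per call: the unindexed access pattern.
    return sum(r["pages_processed"] for r in rows if r["run_id"] == run_id)


# An index is built once, then each lookup is cheap: the indexed access pattern.
pages_by_run = defaultdict(int)
for r in rows:
    pages_by_run[r["run_id"]] += r["pages_processed"]

print(total_pages_scan("run-7"), pages_by_run["run-7"])  # both 20
```

Either way the totals agree; the index only changes how much work each per-file and per-row query does.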
274-280: Same redundant `.exists()` + `.aggregate()` double-query as in `UsageHelper`.

`Sum("pages_processed")` on an empty queryset returns `None` for the key, so the `.exists()` check buys nothing. Collapsing to a single `.aggregate()` call saves one round-trip per property access (and this property is called per row in list-view serialization).

♻️ Proposed fix

```diff
-        str_ids = [str(fid) for fid in file_execution_ids]
-        queryset = PageUsage.objects.filter(run_id__in=str_ids)
-        if not queryset.exists():
-            return None
-
-        result = queryset.aggregate(total_pages=Sum("pages_processed"))
-        return result.get("total_pages")
+        str_ids = [str(fid) for fid in file_execution_ids]
+        result = PageUsage.objects.filter(run_id__in=str_ids).aggregate(
+            total_pages=Sum("pages_processed")
+        )
+        return result.get("total_pages")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/workflow_manager/workflow_v2/models/execution.py` around lines 274 - 280: the current code does a redundant .exists() followed by .aggregate(), which causes an extra DB round-trip; replace the two-step pattern in the block that builds str_ids and assigns queryset = PageUsage.objects.filter(run_id__in=str_ids) with a single aggregate call: PageUsage.objects.filter(run_id__in=str_ids).aggregate(total_pages=Sum("pages_processed")) and return result.get("total_pages") (this preserves None for no rows) — update the logic around the variables file_execution_ids, queryset and the use of Sum("pages_processed")/pages_processed to remove the .exists() check and avoid the double query.

backend/workflow_manager/execution/serializer/execution.py (1)
36-38: `aggregated_total_pages_processed` adds up to 3 extra DB queries per execution in list views.

The model property fires: (1) a `values_list` on `file_executions`, (2) a `PageUsage.exists()`, (3) a `PageUsage.aggregate()`. Combined with the existing per-item queries for `get_successful_files` and `get_failed_files`, execution list endpoints are now executing ~7 queries per row. The class-level TODO already calls this out; this field makes addressing it more urgent.

Consider annotating the aggregate in the queryset that feeds the list view (e.g., via a subquery annotation or a bulk `prefetch_related` approach) rather than resolving it lazily per object.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/workflow_manager/execution/serializer/execution.py` around lines 36 - 38: the get_aggregated_total_pages_processed serializer method calls the WorkflowExecution model property aggregated_total_pages_processed, which triggers multiple DB queries per row; instead annotate the queryset that feeds the list view with the aggregated total (using a Subquery/OuterRef or a bulk aggregation with Prefetch over file_executions/PageUsage) and change get_aggregated_total_pages_processed to return that annotated value (e.g., read obj.annotated_aggregated_total_pages_processed or similar) so no per-object queries are run; update the view/queryset that constructs the list to include the annotation name you choose and ensure the serializer reads that attribute rather than accessing the model property.

backend/usage_v2/helper.py (1)
37-45: Redundant `.exists()` check creates an unnecessary extra DB query.

`Sum` on an empty queryset already returns `None` for the aggregated key, so `result.get("total_pages")` will return `None` when there are no records — the `.exists()` guard is superfluous. The sibling method `get_aggregated_token_count` uses a single `.aggregate()` call for this reason.

Additionally, two static-analysis hints are valid here:
- BLE001: Replace bare `except Exception` with a narrower exception type, or at minimum annotate the intent.
- TRY400: `logger.error` suppresses the traceback; `logger.exception` (or `logger.error(..., exc_info=True)`) is preferred.

♻️ Proposed fix

```diff
-        try:
-            queryset = PageUsage.objects.filter(run_id=run_id)
-            if not queryset.exists():
-                return None
-            result = queryset.aggregate(total_pages=Sum("pages_processed"))
-            return result.get("total_pages")
-        except Exception as e:
-            logger.error(f"Error aggregating pages processed for run_id {run_id}: {e}")
-            return None
+        try:
+            result = PageUsage.objects.filter(run_id=run_id).aggregate(
+                total_pages=Sum("pages_processed")
+            )
+            return result.get("total_pages")
+        except Exception as e:  # noqa: BLE001
+            logger.exception(f"Error aggregating pages processed for run_id {run_id}: {e}")
+            return None
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/usage_v2/helper.py` around lines 37 - 45, Remove the redundant .exists() check and perform a single aggregate call on PageUsage.objects.filter(run_id=run_id) returning result.get("total_pages") (same pattern as get_aggregated_token_count); replace the bare except Exception with a narrower exception (e.g., catch django.db.DatabaseError) and log the failure with full traceback using logger.exception(...) (or logger.error(..., exc_info=True)) to preserve stack information while keeping the behavior of returning None on error.
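The key semantic behind these proposed fixes: Django's `Sum` over an empty queryset yields `None`, not `0`, so the `.exists()` guard never changes the result. A pure-Python model of that behavior (the `aggregate_total_pages` helper is hypothetical, shown only to illustrate the semantics):

```python
def aggregate_total_pages(pages_rows):
    """Models queryset.aggregate(total_pages=Sum("pages_processed")):
    an empty input yields None for the key, which is exactly what a
    prior .exists() check would have returned early anyway."""
    return {"total_pages": sum(pages_rows) if pages_rows else None}


print(aggregate_total_pages([3, 4]))  # {'total_pages': 7}
print(aggregate_total_pages([]))      # {'total_pages': None}
```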
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to Reviews > Disable Cache setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (5)
- backend/usage_v2/helper.py
- backend/workflow_manager/endpoint_v2/destination.py
- backend/workflow_manager/execution/serializer/execution.py
- backend/workflow_manager/workflow_v2/file_execution_tasks.py
- backend/workflow_manager/workflow_v2/models/execution.py
…ption handling - Unify UsageHelper.get_aggregated_pages_processed() to accept either run_id or run_ids, eliminating duplicate PageUsage query logic - WorkflowExecution.aggregated_total_pages_processed now delegates to UsageHelper instead of duplicating the aggregation - Remove broad try/except so exceptions bubble up to middleware Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Actionable comments posted: 1
🧹 Nitpick comments (1)
backend/usage_v2/helper.py (1)
49-52: Remove the redundant `exists()` check — it causes an extra DB round-trip.

Django's `Sum` on an empty queryset returns `{"total_pages": None}`, so `result.get("total_pages")` already returns `None` with no records present. The `exists()` guard adds a second database hit for every call without changing the outcome.

♻️ Proposed simplification

```diff
-        if not queryset.exists():
-            return None
-        result = queryset.aggregate(total_pages=Sum("pages_processed"))
-        return result.get("total_pages")
+        result = queryset.aggregate(total_pages=Sum("pages_processed"))
+        return result.get("total_pages")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/usage_v2/helper.py` around lines 49 - 52, The current code performs an extra DB round-trip by calling queryset.exists() before aggregating; remove the exists() check and directly call queryset.aggregate(total_pages=Sum("pages_processed")) and return result.get("total_pages") — keep the aggregate on the same queryset (using Sum and "pages_processed") so empty querysets yield None without the extra exists() query.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/workflow_manager/workflow_v2/models/execution.py`:
- Around line 261-277: The aggregated_total_pages_processed property issues
per-object DB queries (file_executions.values_list(...) and
UsageHelper.get_aggregated_pages_processed(...)) causing an N+1 when used in
ExecutionSerializer list endpoints; fix by either adding a DB index on
PageUsage.run_id to speed each aggregate query and adding a migration for that
index, or (preferred) batch the totals in the list view/serializer by computing
aggregated page totals for all execution IDs in one query and attaching them to
the queryset (override the list view or ExecutionSerializer to accept a
precomputed map keyed by execution id and avoid calling
aggregated_total_pages_processed per instance); reference the property
aggregated_total_pages_processed, the call to file_executions.values_list, and
UsageHelper.get_aggregated_pages_processed when implementing the change.
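The annotated-value-with-fallback pattern described above can be sketched outside Django as follows (the attribute name `annotated_total_pages` and the classes are illustrative assumptions, not the project's actual names):

```python
_MISSING = object()


def read_total_pages(obj):
    """Prefer a value precomputed on the instance (e.g. set by a queryset
    annotation); fall back to the lazy per-object property only when absent.
    A sentinel keeps a legitimate 0 or None annotation from being skipped."""
    value = getattr(obj, "annotated_total_pages", _MISSING)
    if value is not _MISSING:
        return value
    return obj.aggregated_total_pages_processed


class FakeExecution:
    # Stands in for the model property that would otherwise hit the DB.
    aggregated_total_pages_processed = 7


annotated = FakeExecution()
annotated.annotated_total_pages = 12

print(read_total_pages(annotated))        # 12 (annotation wins, no "DB" work)
print(read_total_pages(FakeExecution()))  # 7 (fallback to the property)
```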
📒 Files selected for processing (2)
- backend/usage_v2/helper.py
- backend/workflow_manager/workflow_v2/models/execution.py
…tadata Resolve conflict: accept deletion of file_execution_tasks.py (dead code removed in #1777 after workers v2 migration). The API metadata enrichment change from that file is no longer needed as workers v2 handles destination processing differently. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
for more information, see https://pre-commit.ci
Test Results
Summary
Runner Tests - Full Report
SDK1 Tests - Full Report
♻️ Duplicate comments (1)
backend/workflow_manager/workflow_v2/models/execution.py (1)
262-276: ⚠️ Potential issue | 🟠 Major

N+1 query risk still exists in aggregated pages property.

This property still performs per-instance DB work (file execution ID lookup + pages aggregation). Since it is exposed on the execution list serializer, this scales poorly with list size.

⚡ Minimal model-side hook to support batched precomputation

```diff
 @property
 def aggregated_total_pages_processed(self) -> int | None:
+    if hasattr(self, "_aggregated_total_pages_processed"):
+        return self._aggregated_total_pages_processed
+
     file_execution_ids = list(self.file_executions.values_list("id", flat=True))
     if not file_execution_ids:
         return None
     return UsageHelper.get_aggregated_pages_processed(
         run_ids=[str(fid) for fid in file_execution_ids]
     )
```

Then set `_aggregated_total_pages_processed` in the list queryset/serializer precompute path to avoid per-row queries.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/workflow_manager/workflow_v2/models/execution.py` around lines 262 - 276, The aggregated_total_pages_processed property performs per-instance DB work via file_executions.values_list("id", flat=True) and UsageHelper.get_aggregated_pages_processed, causing N+1 queries when used in list serializers; change the property to first check for a precomputed attribute (e.g. _aggregated_total_pages_processed) and return it if present, and update the list queryset/serializer precompute path to compute aggregated pages for all executions in bulk (using the same UsageHelper.get_aggregated_pages_processed logic but passing all file_execution ids at once) and set each Execution instance's _aggregated_total_pages_processed before serialization so the property no longer hits the DB per instance.
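The batched precompute path that the hook enables can be sketched in plain Python (all names and the in-memory table are illustrative; in Django the bulk step would be a single grouped query over PageUsage rather than one query per execution):

```python
# run_id -> pages_processed; stands in for the PageUsage table.
PAGE_USAGE = {"f1": 3, "f2": 5, "f3": 2}


class ExecutionSketch:
    def __init__(self, file_execution_ids):
        self.file_execution_ids = file_execution_ids

    @property
    def aggregated_total_pages_processed(self):
        # Honor a precomputed value first, skipping per-instance lookups.
        if hasattr(self, "_aggregated_total_pages_processed"):
            return self._aggregated_total_pages_processed
        matched = [PAGE_USAGE[f] for f in self.file_execution_ids if f in PAGE_USAGE]
        return sum(matched) if matched else None


def precompute_totals(executions):
    """Fill the private attribute for a whole page of results up front."""
    for execution in executions:
        matched = [PAGE_USAGE[f] for f in execution.file_execution_ids if f in PAGE_USAGE]
        execution._aggregated_total_pages_processed = sum(matched) if matched else None


page = [ExecutionSketch(["f1", "f2"]), ExecutionSketch(["f3"]), ExecutionSketch([])]
precompute_totals(page)
print([e.aggregated_total_pages_processed for e in page])  # [8, 2, None]
```

After `precompute_totals`, serializing the list reads only cached attributes; the lazy branch remains as a fallback for detail views.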
📒 Files selected for processing (1)
backend/workflow_manager/workflow_v2/models/execution.py



What
- `UsageHelper.get_aggregated_pages_processed()` to aggregate page count from `PageUsage` model per `run_id`
- `aggregated_total_pages_processed` property on `WorkflowExecution` model to aggregate pages across all file executions
- `total_pages_processed` for API destinations in `_process_final_output()`
- `total_pages_processed` for DB destinations in `get_combined_metadata()`
- `aggregated_total_pages_processed` in `ExecutionSerializer` for the execution list API

Why
- `PageUsage` model already tracks `pages_processed` per `run_id` (file execution ID), but this data was not surfaced in API responses

How
- `usage_v2/helper.py`: Added `get_aggregated_pages_processed(run_id)` static method that queries `PageUsage.objects.filter(run_id=run_id)` and aggregates `Sum('pages_processed')`, returning `int | None`
- `workflow_v2/models/execution.py`: Added `aggregated_total_pages_processed` property that collects file execution IDs via `self.file_executions`, converts to strings, and queries `PageUsage` with `run_id__in`
- `workflow_v2/file_execution_tasks.py`: In `_process_final_output()`, after `destination.get_metadata()` for API destinations, injects `total_pages_processed` into `execution_metadata`
- `endpoint_v2/destination.py`: In `get_combined_metadata()`, adds `total_pages_processed` alongside existing `usage` token data
- `execution/serializer/execution.py`: Added `aggregated_total_pages_processed` as a `SerializerMethodField`

Can this PR break any existing features? If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)
- `total_pages_processed` field gracefully returns `None` when no `PageUsage` data exists.

Database Migrations
Env Config
Relevant Docs
Related Issues or PRs
Dependencies Versions
Notes on Testing
- `total_pages_processed` appears in per-file response metadata
- `total_pages_processed` appears in combined metadata
- `aggregated_total_pages_processed` appears in the response
- `None` is returned gracefully when no `PageUsage` records exist for an execution

Screenshots
Checklist
I have read and understood the Contribution Guidelines.