
[FEAT] Expose total_pages_processed in execution API response#1801

Open
pk-zipstack wants to merge 6 commits into main from feat/expose-total-pages-in-extraction-llm-metadata

Conversation

Contributor

@pk-zipstack pk-zipstack commented Feb 23, 2026

What

  • Add UsageHelper.get_aggregated_pages_processed() to aggregate page count from PageUsage model per run_id
  • Add aggregated_total_pages_processed property on WorkflowExecution model to aggregate pages across all file executions
  • Enrich per-file metadata with total_pages_processed for API destinations in _process_final_output()
  • Enrich combined metadata with total_pages_processed for DB destinations in get_combined_metadata()
  • Expose aggregated_total_pages_processed in ExecutionSerializer for the execution list API

Why

  • The PageUsage model already tracks pages_processed per run_id (file execution ID), but this data was not surfaced in API responses
  • Exposing page counts alongside token usage and cost data enables clients to track document processing volume per execution

How

  • usage_v2/helper.py: Added get_aggregated_pages_processed(run_id) static method that queries PageUsage.objects.filter(run_id=run_id) and aggregates Sum('pages_processed'), returning int | None
  • workflow_v2/models/execution.py: Added aggregated_total_pages_processed property that collects file execution IDs via self.file_executions, converts to strings, and queries PageUsage with run_id__in
  • workflow_v2/file_execution_tasks.py: In _process_final_output(), after destination.get_metadata() for API destinations, injects total_pages_processed into execution_metadata
  • endpoint_v2/destination.py: In get_combined_metadata(), adds total_pages_processed alongside existing usage token data
  • execution/serializer/execution.py: Added aggregated_total_pages_processed as a SerializerMethodField
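A minimal pure-Python sketch of the aggregation logic described above. The real implementation uses the Django ORM against the PageUsage table; the record layout and the single-ID/list handling here are assumptions based on this description:

```python
def get_aggregated_pages_processed(run_id, records):
    """Sketch of the helper: sum pages_processed across matching run IDs.

    run_id may be a single ID or a list of IDs; records stands in for
    PageUsage rows as (run_id, pages_processed) pairs.
    """
    run_ids = run_id if isinstance(run_id, (list, tuple)) else [run_id]
    wanted = {str(r) for r in run_ids}
    matched = [pages for rid, pages in records if str(rid) in wanted]
    # Mirror Django's Sum() on an empty queryset: no matching rows -> None
    return sum(matched) if matched else None
```

The same function covers both the per-file case (one run_id) and the per-execution case (all file execution IDs of a workflow execution).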

Can this PR break any existing features? If yes, please list possible items. If no, please explain why. (PS: Admins will not merge the PR without this section filled.)

  • No. All changes are additive — new helper method, new model property, new serializer field, and new keys in metadata dicts. No existing fields or behavior are modified. The total_pages_processed field gracefully returns None when no PageUsage data exists.

Database Migrations

  • None

Env Config

  • None

Relevant Docs

  • N/A

Related Issues or PRs

  • N/A

Dependencies Versions

  • No changes

Notes on Testing

  1. Trigger a workflow execution via API deployment and verify total_pages_processed appears in per-file response metadata
  2. Trigger a workflow execution via DB destination and verify total_pages_processed appears in combined metadata
  3. Call the execution list API and verify aggregated_total_pages_processed appears in the response
  4. Verify None is returned gracefully when no PageUsage records exist for an execution
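For step 3, the execution list response would then carry the new field alongside existing usage data. A hypothetical response fragment (sibling fields and their names are illustrative, not taken from the actual serializer):

```json
{
  "id": "<execution-id>",
  "status": "COMPLETED",
  "aggregated_total_pages_processed": 12
}
```

Per step 4, the field is null when no PageUsage records exist for the execution.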

Screenshots

(screenshot attached)

Checklist

I have read and understood the Contribution Guidelines.

pk-zipstack and others added 2 commits February 23, 2026 14:21
…adata

Surface page usage data from PageUsage model in API responses to support
tracking total pages processed per file execution and per workflow execution.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor

coderabbitai bot commented Feb 23, 2026

Summary by CodeRabbit

  • New Features
    • Added support for tracking and aggregating total pages processed across workflow executions, now available in execution metadata and API responses.

Walkthrough

This pull request introduces pages-processed aggregation tracking across workflow executions. A new helper method aggregates PageUsage records by run ID(s), a model property collects run IDs and delegates to this helper, a serializer field exposes the aggregated value, and API metadata is augmented with the total pages processed.

Changes

Cohort / File(s) — Summary

  • Pages Aggregation Helper (backend/usage_v2/helper.py): Added static method get_aggregated_pages_processed that accepts a single run_id or a list of run_ids, queries PageUsage records, and returns the sum of pages processed, or None if no records exist.
  • Model Aggregation Property (backend/workflow_manager/workflow_v2/models/execution.py): Added aggregated_total_pages_processed property to WorkflowExecution that collects run IDs from related file executions and delegates aggregation to UsageHelper.
  • Serialization and API (backend/workflow_manager/execution/serializer/execution.py, backend/workflow_manager/endpoint_v2/destination.py): Added serializer field aggregated_total_pages_processed with a getter method, and augmented workflow metadata with total_pages_processed computed from aggregated pages via UsageHelper.
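The serializer change reduces to delegation: the method field simply reads the model property. A DRF-free stand-in of that shape (class and field names come from the summary above; everything else is illustrative):

```python
class WorkflowExecutionStub:
    """Stand-in for the model; the real property aggregates PageUsage rows."""

    def __init__(self, total):
        self._total = total

    @property
    def aggregated_total_pages_processed(self):
        return self._total


def get_aggregated_total_pages_processed(obj):
    """What the SerializerMethodField getter returns for each execution."""
    return obj.aggregated_total_pages_processed
```

Note that None from the property passes through unchanged, which is how the API signals "no PageUsage data" rather than "zero pages".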

Sequence Diagram

sequenceDiagram
    participant API as API Endpoint
    participant Serializer as ExecutionSerializer
    participant Model as WorkflowExecution
    participant Helper as UsageHelper
    participant DB as PageUsage DB

    API->>Serializer: get_serialized_execution()
    Serializer->>Model: access aggregated_total_pages_processed
    Model->>Model: collect file_execution_ids
    Model->>Helper: get_aggregated_pages_processed(run_ids)
    Helper->>DB: query PageUsage records
    DB-->>Helper: filtered records
    Helper->>Helper: aggregate Sum(pages_processed)
    Helper-->>Model: total_pages | None
    Model-->>Serializer: aggregated value
    Serializer-->>API: serialized response with field

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 3 passed

  • Title check — ✅ Passed: The title clearly and specifically describes the main change: exposing total_pages_processed in the execution API response, which is the primary objective across all modified files.
  • Description check — ✅ Passed: The description comprehensively covers all required template sections with detailed information about changes, rationale, implementation approach, testing notes, and explicitly addresses breaking changes.
  • Docstring Coverage — ✅ Passed: Docstring coverage is 88.89%, which is sufficient. The required threshold is 80.00%.



@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (4)
backend/workflow_manager/workflow_v2/models/execution.py (2)

262-280: PageUsage.run_id has no database index, but is now on multiple hot query paths.

The PageUsage model only indexes organization_id. Both filter(run_id=run_id) (in UsageHelper.get_aggregated_pages_processed) and filter(run_id__in=str_ids) (here) will full-scan the page_usage table as it grows. These queries are now triggered per-file in API deployments and per-row in execution list views.

A migration adding a db_index=True on run_id (or a Meta.indexes entry) is recommended:

# In account_usage/models.py – PageUsage.Meta
indexes = [
    models.Index(fields=["organization_id"]),
    models.Index(fields=["run_id"]),   # +
]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/workflow_manager/workflow_v2/models/execution.py` around lines 262 -
280, Add a database index on PageUsage.run_id to avoid full table scans for
queries used by aggregated_total_pages_processed and
UsageHelper.get_aggregated_pages_processed: update the PageUsage model Meta to
include an index for "run_id" (e.g., add models.Index(fields=["run_id"])
alongside the existing organization_id index) and create/apply a Django
migration so the new index is created in the database.

274-280: Same redundant .exists() + .aggregate() double-query as in UsageHelper.

Sum("pages_processed") on an empty queryset returns None for the key, so the .exists() check buys nothing. Collapsing to a single .aggregate() call saves one round-trip per property access (and this property is called per row in list-view serialization).

♻️ Proposed fix
-        str_ids = [str(fid) for fid in file_execution_ids]
-        queryset = PageUsage.objects.filter(run_id__in=str_ids)
-        if not queryset.exists():
-            return None
-
-        result = queryset.aggregate(total_pages=Sum("pages_processed"))
-        return result.get("total_pages")
+        str_ids = [str(fid) for fid in file_execution_ids]
+        result = PageUsage.objects.filter(run_id__in=str_ids).aggregate(
+            total_pages=Sum("pages_processed")
+        )
+        return result.get("total_pages")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/workflow_manager/workflow_v2/models/execution.py` around lines 274 -
280, The current code does a redundant .exists() followed by .aggregate() which
causes an extra DB round-trip; replace the two-step pattern in the block that
builds str_ids and assigns queryset =
PageUsage.objects.filter(run_id__in=str_ids) with a single aggregate call: call
PageUsage.objects.filter(run_id__in=str_ids).aggregate(total_pages=Sum("pages_processed"))
and return result.get("total_pages") (this preserves None for no rows) — update
the logic around the variables file_execution_ids, queryset and the use of
Sum("pages_processed")/pages_processed to remove the .exists() check and avoid
the double query.
backend/workflow_manager/execution/serializer/execution.py (1)

36-38: aggregated_total_pages_processed adds up to 3 extra DB queries per execution in list views.

The model property fires: (1) a values_list on file_executions, (2) a PageUsage.exists(), (3) a PageUsage.aggregate(). Combined with the existing per-item queries for get_successful_files and get_failed_files, execution list endpoints are now executing ~7 queries per row. The class-level TODO already calls this out; this field makes addressing it more urgent.

Consider annotating the aggregate in the queryset that feeds the list view (e.g., via a subquery annotation or a bulk prefetch_related approach) rather than resolving it lazily per object.
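One way to implement the suggested batching is to aggregate pages for every execution in a single pass over one joined result set. A pure-Python sketch of the aggregation step (the joined query feeding it, and the key/column names, are assumptions):

```python
from collections import defaultdict


def batch_page_totals(rows):
    """Aggregate pages per execution from a single joined query's rows.

    rows: iterable of (execution_id, pages_processed) pairs, e.g. from one
    query joining executions -> file executions -> PageUsage.
    """
    totals = defaultdict(int)
    for execution_id, pages in rows:
        totals[execution_id] += pages
    # Executions with no PageUsage rows are simply absent; callers can
    # treat a missing key as None, matching the per-object property.
    return dict(totals)
```

The view would attach this map to the serializer context so each row is resolved with a dictionary lookup instead of three queries.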

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/workflow_manager/execution/serializer/execution.py` around lines 36 -
38, The current get_aggregated_total_pages_processed serializer method calls the
WorkflowExecution model property aggregated_total_pages_processed which triggers
multiple DB queries per row; instead annotate the queryset that feeds the list
view with the aggregated total (using a Subquery/OuterRef or a bulk aggregation
with Prefetch over file_executions/PageUsage) and change
get_aggregated_total_pages_processed to return that annotated value (e.g., read
obj.annotated_aggregated_total_pages_processed or similar) so no per-object
queries are run; update the view/queryset that constructs the list to include
the annotation name you choose and ensure the serializer reads that attribute
rather than accessing the model property.
backend/usage_v2/helper.py (1)

37-45: Redundant .exists() check creates an unnecessary extra DB query.

Sum on an empty queryset already returns None for the aggregated key, so result.get("total_pages") will return None when there are no records — the .exists() guard is superfluous. The sibling method get_aggregated_token_count uses a single .aggregate() call for this reason.

Additionally, two static-analysis hints are valid here:

  • BLE001: Replace bare except Exception with a narrower exception type, or at minimum annotate the intent.
  • TRY400: logger.error suppresses the traceback; logger.exception (or logger.error(..., exc_info=True)) is preferred.
♻️ Proposed fix
-        try:
-            queryset = PageUsage.objects.filter(run_id=run_id)
-            if not queryset.exists():
-                return None
-            result = queryset.aggregate(total_pages=Sum("pages_processed"))
-            return result.get("total_pages")
-        except Exception as e:
-            logger.error(f"Error aggregating pages processed for run_id {run_id}: {e}")
-            return None
+        try:
+            result = PageUsage.objects.filter(run_id=run_id).aggregate(
+                total_pages=Sum("pages_processed")
+            )
+            return result.get("total_pages")
+        except Exception as e:  # noqa: BLE001
+            logger.exception(f"Error aggregating pages processed for run_id {run_id}: {e}")
+            return None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/usage_v2/helper.py` around lines 37 - 45, Remove the redundant
.exists() check and perform a single aggregate call on
PageUsage.objects.filter(run_id=run_id) returning result.get("total_pages")
(same pattern as get_aggregated_token_count); replace the bare except Exception
with a narrower exception (e.g., catch django.db.DatabaseError) and log the
failure with full traceback using logger.exception(...) (or logger.error(...,
exc_info=True)) to preserve stack information while keeping the behavior of
returning None on error.
ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to Reviews > Disable Cache setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between d7babec and 25de508.

📒 Files selected for processing (5)
  • backend/usage_v2/helper.py
  • backend/workflow_manager/endpoint_v2/destination.py
  • backend/workflow_manager/execution/serializer/execution.py
  • backend/workflow_manager/workflow_v2/file_execution_tasks.py
  • backend/workflow_manager/workflow_v2/models/execution.py

…ption handling

- Unify UsageHelper.get_aggregated_pages_processed() to accept either
  run_id or run_ids, eliminating duplicate PageUsage query logic
- WorkflowExecution.aggregated_total_pages_processed now delegates to
  UsageHelper instead of duplicating the aggregation
- Remove broad try/except so exceptions bubble up to middleware

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
backend/usage_v2/helper.py (1)

49-52: Remove the redundant exists() check — it causes an extra DB round-trip.

Django's Sum on an empty queryset returns {"total_pages": None}, so result.get("total_pages") already returns None with no records present. The exists() guard adds a second database hit for every call without changing the outcome.

♻️ Proposed simplification
-        if not queryset.exists():
-            return None
-        result = queryset.aggregate(total_pages=Sum("pages_processed"))
-        return result.get("total_pages")
+        result = queryset.aggregate(total_pages=Sum("pages_processed"))
+        return result.get("total_pages")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/usage_v2/helper.py` around lines 49 - 52, The current code performs
an extra DB round-trip by calling queryset.exists() before aggregating; remove
the exists() check and directly call
queryset.aggregate(total_pages=Sum("pages_processed")) and return
result.get("total_pages") — keep the aggregate on the same queryset (using Sum
and "pages_processed") so empty querysets yield None without the extra exists()
query.


📥 Commits

Reviewing files that changed from the base of the PR and between 25de508 and adf2dc2.

📒 Files selected for processing (2)
  • backend/usage_v2/helper.py
  • backend/workflow_manager/workflow_v2/models/execution.py

…tadata

Resolve conflict: accept deletion of file_execution_tasks.py (dead code
removed in #1777 after workers v2 migration). The API metadata enrichment
change from that file is no longer needed as workers v2 handles destination
processing differently.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor

github-actions bot commented Mar 2, 2026

Test Results

Summary
  • Runner Tests: 11 passed, 0 failed (11 total)
  • SDK1 Tests: 63 passed, 0 failed (63 total)

Runner Tests - Full Report (all passed)

runner/src/unstract/runner/clients/test_docker.py:
  • test_logs — 1
  • test_cleanup — 1
  • test_cleanup_skip — 1
  • test_client_init — 1
  • test_get_image_exists — 1
  • test_get_image — 1
  • test_get_container_run_config — 1
  • test_get_container_run_config_without_mount — 1
  • test_run_container — 1
  • test_get_image_for_sidecar — 1
  • test_sidecar_container — 1
TOTAL — 11
SDK1 Tests - Full Report (all passed)

tests/test_platform.py:
  • TestPlatformHelperRetry.test_success_on_first_attempt — 2
  • TestPlatformHelperRetry.test_retry_on_connection_error — 2
  • TestPlatformHelperRetry.test_non_retryable_http_error — 1
  • TestPlatformHelperRetry.test_retryable_http_errors — 3
  • TestPlatformHelperRetry.test_post_method_retry — 1
  • TestPlatformHelperRetry.test_retry_logging — 1
tests/test_prompt.py:
  • TestPromptToolRetry.test_success_on_first_attempt — 1
  • TestPromptToolRetry.test_retry_on_errors — 2
  • TestPromptToolRetry.test_wrapper_methods_retry — 4
tests/utils/test_retry_utils.py:
  • TestIsRetryableError.test_connection_error_is_retryable — 1
  • TestIsRetryableError.test_timeout_is_retryable — 1
  • TestIsRetryableError.test_http_error_retryable_status_codes — 3
  • TestIsRetryableError.test_http_error_non_retryable_status_codes — 5
  • TestIsRetryableError.test_http_error_without_response — 1
  • TestIsRetryableError.test_os_error_retryable_errno — 5
  • TestIsRetryableError.test_os_error_non_retryable_errno — 1
  • TestIsRetryableError.test_other_exception_not_retryable — 1
  • TestCalculateDelay.test_exponential_backoff_without_jitter — 1
  • TestCalculateDelay.test_exponential_backoff_with_jitter — 1
  • TestCalculateDelay.test_max_delay_cap — 1
  • TestCalculateDelay.test_max_delay_cap_with_jitter — 1
  • TestRetryWithExponentialBackoff.test_successful_call_first_attempt — 1
  • TestRetryWithExponentialBackoff.test_retry_after_transient_failure — 1
  • TestRetryWithExponentialBackoff.test_max_retries_exceeded — 1
  • TestRetryWithExponentialBackoff.test_retry_with_custom_predicate — 1
  • TestRetryWithExponentialBackoff.test_no_retry_with_predicate_false — 1
  • TestRetryWithExponentialBackoff.test_exception_not_in_tuple_not_retried — 1
  • TestCreateRetryDecorator.test_default_configuration — 1
  • TestCreateRetryDecorator.test_environment_variable_configuration — 1
  • TestCreateRetryDecorator.test_invalid_max_retries — 1
  • TestCreateRetryDecorator.test_invalid_base_delay — 1
  • TestCreateRetryDecorator.test_invalid_multiplier — 1
  • TestCreateRetryDecorator.test_jitter_values — 2
  • TestCreateRetryDecorator.test_custom_exceptions_only — 1
  • TestCreateRetryDecorator.test_custom_predicate_only — 1
  • TestCreateRetryDecorator.test_both_exceptions_and_predicate — 1
  • TestCreateRetryDecorator.test_exceptions_match_but_predicate_false — 1
  • TestPreconfiguredDecorators.test_retry_platform_service_call_exists — 1
  • TestPreconfiguredDecorators.test_retry_prompt_service_call_exists — 1
  • TestPreconfiguredDecorators.test_platform_service_decorator_retries_on_connection_error — 1
  • TestPreconfiguredDecorators.test_prompt_service_decorator_retries_on_timeout — 1
  • TestRetryLogging.test_warning_logged_on_retry — 1
  • TestRetryLogging.test_info_logged_on_success_after_retry — 1
  • TestRetryLogging.test_exception_logged_on_giving_up — 1
TOTAL — 63


sonarqubecloud bot commented Mar 2, 2026

Contributor

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (1)
backend/workflow_manager/workflow_v2/models/execution.py (1)

262-276: ⚠️ Potential issue | 🟠 Major

N+1 query risk still exists in aggregated pages property.

This property still performs per-instance DB work (file execution ID lookup + pages aggregation). Since it is exposed on the execution list serializer, this scales poorly with list size.

⚡ Minimal model-side hook to support batched precomputation

```diff
 @property
 def aggregated_total_pages_processed(self) -> int | None:
+    if hasattr(self, "_aggregated_total_pages_processed"):
+        return self._aggregated_total_pages_processed
+
     file_execution_ids = list(self.file_executions.values_list("id", flat=True))
     if not file_execution_ids:
         return None

     return UsageHelper.get_aggregated_pages_processed(
         run_ids=[str(fid) for fid in file_execution_ids]
     )
```

Then set `_aggregated_total_pages_processed` in the list queryset/serializer precompute path to avoid per-row queries.
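For illustration, here is a minimal, framework-free sketch of that batching idea. `Execution` and the `(run_id, pages)` row tuples are hypothetical stand-ins for the real Django model and the `PageUsage` queryset; in the actual code the rows would come from one bulk `PageUsage` query with `run_id__in` covering every execution in the page, not from an in-memory list.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Execution:
    # Stand-in for the Django model; holds this execution's file execution IDs.
    file_execution_ids: list


def precompute_pages(executions, page_usage_rows):
    """Aggregate pages_processed per run_id in a single pass, then attach the
    per-execution total so the model property never hits the DB per row."""
    pages_by_run = defaultdict(int)
    for run_id, pages in page_usage_rows:  # one bulk query in the real code
        pages_by_run[run_id] += pages

    for ex in executions:
        ids = [str(fid) for fid in ex.file_execution_ids]
        if any(i in pages_by_run for i in ids):
            total = sum(pages_by_run[i] for i in ids)
        else:
            total = None  # mirrors the property's "no usage data" case
        # The hasattr() fast path in the property above picks this up.
        ex._aggregated_total_pages_processed = total
    return executions
```

A quick usage example: with rows `[("a", 3), ("a", 2), ("b", 1)]`, an execution whose file execution IDs are `["a", "b"]` ends up with a precomputed total of 6, while one with no matching rows gets `None`, matching the property's existing contract.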

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/workflow_manager/workflow_v2/models/execution.py` around lines 262 -
276, The aggregated_total_pages_processed property performs per-instance DB work
via file_executions.values_list("id", flat=True) and
UsageHelper.get_aggregated_pages_processed, causing N+1 queries when used in
list serializers; change the property to first check for a precomputed attribute
(e.g. _aggregated_total_pages_processed) and return it if present, and update
the list queryset/serializer precompute path to compute aggregated pages for all
executions in bulk (using the same UsageHelper.get_aggregated_pages_processed
logic but passing all file_execution ids at once) and set each Execution
instance's _aggregated_total_pages_processed before serialization so the
property no longer hits the DB per instance.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to Reviews > Disable Cache setting

Knowledge base: Disabled due to Reviews > Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between adf2dc2 and 6a730d6.

📒 Files selected for processing (1)
  • backend/workflow_manager/workflow_v2/models/execution.py
