Skip to content

fix: drop withDefaultValue from StatisticsManager so checkpoint state round-trips#5150

Open
Ma77Ball wants to merge 5 commits into
apache:mainfrom
Ma77Ball:fix/checkPointState
Open

fix: drop withDefaultValue from StatisticsManager so checkpoint state round-trips#5150
Ma77Ball wants to merge 5 commits into
apache:mainfrom
Ma77Ball:fix/checkPointState

Conversation

@Ma77Ball
Copy link
Copy Markdown
Contributor

What changes were proposed in this PR?

StatisticsManager declared its input/output stats maps as mutable.Map.empty.withDefaultValue((0L, 0L)). The resulting Map.WithDefault wrapper does not survive a Kryo round-trip (its inner map deserializes as null), so
chkpt.load(CP_STATE_KEY) on a default-state ControllerProcessor throws KryoException: NullPointerException, blocking Controller.loadFromCheckpoint from ever rehydrating a checkpointed controller. This PR removes the wrapper and inlines getOrElse(portId, (0L, 0L)) at the two write sites; behavior is unchanged.

Any related issues, documentation, or discussions?

closes: #4686

How was this PR tested?

Replaced the two existing should be serializable cases in CheckpointSpec with full save then load round-trips (controller + worker) that assert restored.actorId == original.actorId; the new tests reproduce the original NPE on main and pass after the fix. Verified locally with sbt 'WorkflowExecutionService / Test / testOnly org.apache.texera.amber.engine.faulttolerance.CheckpointSpec' (3/3 pass).

Was this PR authored or co-authored using generative AI tooling?

Co-authored with Claude Opus 4.7 in compliance with ASF

@Ma77Ball
Copy link
Copy Markdown
Contributor Author

/request-review @aglinxinyuan

@github-actions github-actions Bot requested a review from aglinxinyuan May 22, 2026 08:40
@github-actions github-actions Bot added engine fix ci changes related to CI labels May 22, 2026
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 22, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 43.35%. Comparing base (bf2f92c) to head (c1870e3).

Additional details and impacted files
@@            Coverage Diff            @@
##               main    #5150   +/-   ##
=========================================
  Coverage     43.35%   43.35%           
  Complexity     2212     2212           
=========================================
  Files          1049     1049           
  Lines         40560    40558    -2     
  Branches       4322     4320    -2     
=========================================
  Hits          17583    17583           
  Misses        21886    21886           
+ Partials       1091     1089    -2     
Flag Coverage Δ *Carryforward flag
access-control-service 39.53% <ø> (ø) Carriedforward from 07be263
agent-service 33.76% <ø> (ø) Carriedforward from 07be263
amber 43.81% <100.00%> (+<0.01%) ⬆️
computing-unit-managing-service 0.00% <ø> (ø) Carriedforward from 07be263
config-service 0.00% <ø> (ø) Carriedforward from 07be263
file-service 32.18% <ø> (ø) Carriedforward from 07be263
frontend 34.61% <ø> (ø) Carriedforward from 07be263
python 90.50% <ø> (ø) Carriedforward from 07be263
workflow-compiling-service 56.81% <ø> (ø) Carriedforward from 07be263

*This pull request uses carry forward flags. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@github-actions github-actions Bot removed the ci changes related to CI label May 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ControllerProcessor save→load round-trip through CheckpointState throws Kryo NPE on StatisticsManager.inputStatistics

2 participants