Skip to content

perf(sheets): cap fan-out cell-matrix materialization to prevent OOM#1578

Open
zhengzhijiej-tech wants to merge 3 commits into
feat/lark-sheets-developfrom
feat/sheets-fanout-cell-cap
Open

perf(sheets): cap fan-out cell-matrix materialization to prevent OOM#1578
zhengzhijiej-tech wants to merge 3 commits into
feat/lark-sheets-developfrom
feat/sheets-fanout-cell-cap

Conversation

@zhengzhijiej-tech

@zhengzhijiej-tech zhengzhijiej-tech commented Jun 25, 2026

Copy link
Copy Markdown
Collaborator

Problem

The fan-out / stamp shortcuts (+cells-set-style, +dropdown-set, +cells-batch-set-style, +dropdown-update) expand one A1 range into a rows×cols matrix of per-cell maps client-side, because the backing set_cell_range tool takes an explicit cells matrix. rangeDimensions() enforced no upper bound, so a short input string explodes into millions of heap maps with no guard.

Measured cost of materializing a single range in one shortcut call (go test -bench + a runtime.MemStats probe):

input cells alloc/op allocs/op peak heap
A1:J10000 100K 35 MB 210K +87 MB
A1:Z100000 2.6M 917 MB 5.3M +875 MB

That is enough to OOM a normal CI container / laptop before the request is even sent — and json.Marshal doubles it again on the way out.

Fix

Add a 50000-cell safety cap (checkStampMatrixBudget) at every fan-out materialization point, before fillCellsMatrix allocates. The value matches the documented --max-cells default, which was never actually wired up in code. Oversized ranges now fail fast with a clear validation error pointing the user to narrow / split the range.

Also preallocate the per-op slices (make([]interface{}, 0, len(ranges))) now that the range count is known up front.

After (same A1:Z100000 input): 917 MB / 5.3M allocs → 377 B / 5 allocs. Legitimate ranges (≤ 50000 cells) are unaffected.

Tests

  • New BenchmarkFillCellsMatrix_* + TestFanoutMatrixPeakMemory document the cost.
  • New TestStampMatrixBudgetCap covers the 49998 / 50000 / 50001 / 2.6M boundaries.
  • go vet, gofmt, and the full shortcuts/sheets + backward test suites pass.

Follow-up: sibling cap gaps

The single-range cap above left three sibling ingress paths that could materialize an unbounded matrix / op set the same way, before the request leaves:

  • +table-put / +workbook-create (--sheets / --values)buildSheetMatrix builds the whole rows×cols matrix in memory (measured: 2.6M cells → ~917 MB / 5.3M allocs, +875 MB resident heap — the same OOM profile as the single-range stamp) before slicing it into tablePutMaxCellsPerWrite-sized writes, so tablePutMaxCellsPerWrite bounds only the per-write batch, not the total input. Added tablePayload.checkCellBudget (1,000,000-cell guardrail), enforced in validate() and in buildValuesPayload (the --values path bypasses validate()).
  • batch fan-out (+cells-batch-set-style / +dropdown-update) — the per-range checkStampMatrixBudget can't stop many ranges from summing past the cap. Added an aggregate cell budget (checkBatchStampBudget, 50000) plus a shared maxBatchRanges (100) count cap in validateDropdownRanges — covering all fan-out commands and replacing the now-redundant +dropdown-delete count check.
  • +batch-update — capped --operations at maxBatchOperations (100) in translateBatchOperations.

New boundary tests: TestTablePutCellBudgetCap, TestBatchStampAggregateCap, TestBatchFanoutRangeCountCap, TestBatchOperationsCountCap, plus the TestTablePutMatrixPeakMemory / BenchmarkBuildSheetMatrix_* probes that produced the number above. go vet / gofmt clean; full shortcuts/sheets + backward suites green.

The +cells-set-style / +dropdown-set / +cells-batch-set-style /
+dropdown-update shortcuts expand a single A1 range into a rows×cols
matrix of per-cell maps client-side (the backing set_cell_range tool
takes an explicit cells matrix). rangeDimensions() had no upper bound,
so a tiny input like "A1:Z100000" balloons into ~2.6M heap maps (~900MB,
doubled again by json.Marshal) and can OOM the process before the
request is even sent.

Add a 50000-cell safety cap (checkStampMatrixBudget) gating every
fan-out materialization point, matching the documented but never-wired
--max-cells default. Oversized ranges now fail fast with a clear
validation error instead of allocating. Also preallocate the per-op
slices now that the range count is known up front.

Adds benchmarks + a boundary test as regression guards.
@github-actions github-actions Bot added domain/ccm PR touches the ccm domain size/M Single-domain feat or fix with limited business impact labels Jun 25, 2026
@coderabbitai

coderabbitai Bot commented Jun 25, 2026

Copy link
Copy Markdown

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 473059c9-2e5f-4d25-9183-d5df86ea4478

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/sheets-fanout-cell-cap

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

…f the cell-matrix cap)

The single-range fan-out cap (maxStampMatrixCells) left three sibling
ingress paths uncapped, each able to materialize an unbounded matrix or
op set in memory before the request leaves:

- +table-put / +workbook-create --sheets/--values: buildSheetMatrix
  builds the whole rows×cols matrix before slicing it into per-write
  batches; tablePutMaxCellsPerWrite only bounds the batch size, not the
  total input. Add tablePayload.checkCellBudget (1M-cell guardrail),
  enforced in validate() and in buildValuesPayload (the --values path
  bypasses validate()).

- batch fan-out (+cells-batch-set-style / +dropdown-update): per-range
  checkStampMatrixBudget can't stop many ranges from summing past the
  cap. Add an aggregate cell budget (checkBatchStampBudget) and a shared
  maxBatchRanges (100) count cap in validateDropdownRanges — covering
  all fan-out commands and replacing the now-redundant +dropdown-delete
  count check.

- +batch-update: cap --operations at maxBatchOperations (100) in
  translateBatchOperations.

Adds boundary regression tests for each cap. go vet + gofmt clean; full
shortcuts/sheets + backward suites green.
Add BenchmarkBuildSheetMatrix_* and TestTablePutMatrixPeakMemory mirroring
the fan-out probes. Confirms the +table-put/+workbook-create ingress has the
same OOM profile as the single-range stamp: 2.6M cells → ~917 MB / 5.3M allocs
(+875 MB resident heap) materialized before the first write — now rejected up
front by checkCellBudget.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

domain/ccm PR touches the ccm domain size/M Single-domain feat or fix with limited business impact

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant