fix(parquet): add WriteBatchSpacedWithError to surface spaced-write failures#852
Open
zeroshade wants to merge 2 commits into
Open
fix(parquet): add WriteBatchSpacedWithError to surface spaced-write failures#852zeroshade wants to merge 2 commits into
zeroshade wants to merge 2 commits into
Conversation
…ailures WriteBatchSpaced returns nothing, so a write failure on the spaced path was either silently discarded (the doBatches branch ignored the error from commitWriteAndCheckPageLimit) or escaped as a panic out of the calling goroutine -- unlike WriteBatch, it had no deferred recover. This is the same class of bug as apache#820 (fixed in apache#824) on a different code path, reachable through pqarrow when writing nullable columns to a sink that may transiently fail (e.g. a cloud-storage upload writer). Add WriteBatchSpacedWithError(...) (int64, error) to each generated ColumnChunkWriter via the template: it mirrors WriteBatch, recovering panics into a returned error and checking the commit error on every branch. WriteBatchSpaced is left byte-for-byte unchanged (same signature and behavior), so this is a non-breaking, additive change -- exactly as apache#824 did with NewParquetWriter / NewParquetWriterWithError. pqarrow's dense-array encoder now uses WriteBatchSpacedWithError so nullable-column writes propagate sink failures as errors instead of crashing the goroutine or losing the error.
WriteBitmapBatchSpaced (Boolean) shares the same void-signature, no-recover shape as WriteBatchSpaced: it discarded the error from commitWriteAndCheckPageLimit. Add WriteBitmapBatchSpacedWithError mirroring WriteBatchSpacedWithError, leaving WriteBitmapBatchSpaced byte-for-byte unchanged (additive, non-breaking). It has no internal callers, so no rewiring is required; a regression test covers the new method.
lidavidm
reviewed
Jun 12, 2026
lidavidm
left a comment
Member
There was a problem hiding this comment.
WriteBatchSpaced is internal, right? Why keep it at this point if it's not a public API?
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Rationale for this change
Closes #826.
Across all 8 generated
*ColumnChunkWritertypes inparquet/file/column_writer_types.gen.go,WriteBatchandWriteBatchSpacedare an asymmetric pair:WriteBatchreturns(valueOffset int64, err error)and opens with adefer recover()that converts any panic into a returnederrorviautils.FormatRecoveredError.WriteBatchSpacedreturns nothing, has nodefer recover(), and on thedoBatches(non-byte-array) path silently discards theerrorreturned bycommitWriteAndCheckPageLimit.So a write failure during a spaced write is either silently discarded (numeric/boolean/fixed paths) or escapes as a panic out of the calling goroutine (the byte-array branch's
panic(err), plus other internal panics such as dictionary-fallback bookkeeping) — because, unlikeWriteBatch, there is norecover()to turn it into anerror.This is the same class of "panic escapes / error lost on the public API" bug as #820 (fixed in #824), on a different code path. It is reachable through
pqarrow—(*pqarrow.FileWriter)drivesWriteBatchSpacedfor nullable columns — so for consumers writing nullable columns to a network-attached sink (GCS, S3 multipart, HDFS, etc.), a transient downstream failure on the spaced path crashes the worker process or silently drops the error instead of returning it.The Boolean-only
WriteBitmapBatchSpacedhas the same void-signature / no-recover shape and is fixed here too.What changes are included in this PR?
New method
WriteBatchSpacedWithError(values, defLevels, repLevels, validBits, validBitsOffset) (int64, error)generated for all 8 column-writer types fromcolumn_writer_types.gen.go.tmpl. It mirrorsWriteBatch:defer func() { if r := recover(); r != nil { err = utils.FormatRecoveredError(...) } }(), anderrorfromcommitWriteAndCheckPageLimiton every branch (thedoBatchesbranch previously ignored it),panic-ing so the deferred recover surfaces it as a returnederror.New method
WriteBitmapBatchSpacedWithError(...) (int64, error)on*BooleanColumnChunkWriter, applying the same treatment to the bitmap spaced-write path.WriteBatchSpacedandWriteBitmapBatchSpacedare left byte-for-byte unchanged — same signatures, same behavior. This keeps the change non-breaking and purely additive, exactly the approach fix(parquet): return error instead of panicking on first-write failure #824 took withNewParquetWriter/NewParquetWriterWithError. Existing callers are unaffected; callers that want error handling opt into the…WithErrorvariants.pqarrow's dense-array encoder (writeDenseArrow) now callsWriteBatchSpacedWithErrorat all 12 spaced-write sites and propagates the error through its existing(err error)return, so nullable-column writes report sink failures as errors instead of crashing the goroutine or losing the error. (WriteBitmapBatchSpacedhas no internal callers, so nothing to rewire there.)The diff to the generated file and the template is additions-only (no existing generated method bodies change).
Are these changes tested?
Yes. Two new tests in
parquet/file/column_writer_test.gouse the existingmockpagewriterto failWriteDataPage, with a tinyDataPageSizeand dictionary disabled to force a flush mid-call:TestWriteBatchSpacedWithErrorPropagatesWriteFailureTestWriteBitmapBatchSpacedWithErrorPropagatesWriteFailureEach asserts that the
…WithErrorvariant returns the wrapped failure (errors.Is(err, failureErr)) rather than panicking or discarding it, and that the legacy method does not panic out of the call (its original behavior is preserved).The full
parquet/file/...andparquet/pqarrow/...suites pass (PARQUET_TEST_DATApointed atparquet-testing/data), andgo vetis clean.Are there any user-facing changes?
No breaking changes.
WriteBatchSpaced(...)andWriteBitmapBatchSpaced(...)— signatures and behavior unchanged.WriteBatchSpacedWithError(...) (int64, error)(all 8 types) andWriteBitmapBatchSpacedWithError(...) (int64, error)(Boolean) — new exported methods. Adding exported methods is non-breaking in Go.pqarrowpublic signatures are unchanged; the dense-array nullable write path now returns transient sink failures as anerror(callers already checkingerrget strictly better behavior).