Skip to content

fix(parquet): add WriteBatchSpacedWithError to surface spaced-write failures#852

Open
zeroshade wants to merge 2 commits into
apache:mainfrom
zeroshade:fix/parquet-writebatchspaced-error
Open

fix(parquet): add WriteBatchSpacedWithError to surface spaced-write failures#852
zeroshade wants to merge 2 commits into
apache:mainfrom
zeroshade:fix/parquet-writebatchspaced-error

Conversation

@zeroshade

@zeroshade zeroshade commented Jun 11, 2026

Copy link
Copy Markdown
Member

Rationale for this change

Closes #826.

Across all 8 generated *ColumnChunkWriter types in parquet/file/column_writer_types.gen.go, WriteBatch and WriteBatchSpaced are an asymmetric pair:

  • WriteBatch returns (valueOffset int64, err error) and opens with a defer recover() that converts any panic into a returned error via utils.FormatRecoveredError.
  • WriteBatchSpaced returns nothing, has no defer recover(), and on the doBatches (non-byte-array) path silently discards the error returned by commitWriteAndCheckPageLimit.

So a write failure during a spaced write is either silently discarded (numeric/boolean/fixed paths) or escapes as a panic out of the calling goroutine (the byte-array branch's panic(err), plus other internal panics such as dictionary-fallback bookkeeping) — because, unlike WriteBatch, there is no recover() to turn it into an error.

This is the same class of "panic escapes / error lost on the public API" bug as #820 (fixed in #824), on a different code path. It is reachable through pqarrow(*pqarrow.FileWriter) drives WriteBatchSpaced for nullable columns — so for consumers writing nullable columns to a network-attached sink (GCS, S3 multipart, HDFS, etc.), a transient downstream failure on the spaced path crashes the worker process or silently drops the error instead of returning it.

The Boolean-only WriteBitmapBatchSpaced has the same void-signature / no-recover shape and is fixed here too.

What changes are included in this PR?

  1. New method WriteBatchSpacedWithError(values, defLevels, repLevels, validBits, validBitsOffset) (int64, error) generated for all 8 column-writer types from column_writer_types.gen.go.tmpl. It mirrors WriteBatch:

    • opens with defer func() { if r := recover(); r != nil { err = utils.FormatRecoveredError(...) } }(), and
    • checks the error from commitWriteAndCheckPageLimit on every branch (the doBatches branch previously ignored it), panic-ing so the deferred recover surfaces it as a returned error.
  2. New method WriteBitmapBatchSpacedWithError(...) (int64, error) on *BooleanColumnChunkWriter, applying the same treatment to the bitmap spaced-write path.

  3. WriteBatchSpaced and WriteBitmapBatchSpaced are left byte-for-byte unchanged — same signatures, same behavior. This keeps the change non-breaking and purely additive, exactly the approach fix(parquet): return error instead of panicking on first-write failure #824 took with NewParquetWriter / NewParquetWriterWithError. Existing callers are unaffected; callers that want error handling opt into the …WithError variants.

  4. pqarrow's dense-array encoder (writeDenseArrow) now calls WriteBatchSpacedWithError at all 12 spaced-write sites and propagates the error through its existing (err error) return, so nullable-column writes report sink failures as errors instead of crashing the goroutine or losing the error. (WriteBitmapBatchSpaced has no internal callers, so nothing to rewire there.)

The diff to the generated file and the template is additions-only (no existing generated method bodies change).

Are these changes tested?

Yes. Two new tests in parquet/file/column_writer_test.go use the existing mockpagewriter to fail WriteDataPage, with a tiny DataPageSize and dictionary disabled to force a flush mid-call:

  • TestWriteBatchSpacedWithErrorPropagatesWriteFailure
  • TestWriteBitmapBatchSpacedWithErrorPropagatesWriteFailure

Each asserts that the …WithError variant returns the wrapped failure (errors.Is(err, failureErr)) rather than panicking or discarding it, and that the legacy method does not panic out of the call (its original behavior is preserved).

The full parquet/file/... and parquet/pqarrow/... suites pass (PARQUET_TEST_DATA pointed at parquet-testing/data), and go vet is clean.

Are there any user-facing changes?

No breaking changes.

  • WriteBatchSpaced(...) and WriteBitmapBatchSpaced(...) — signatures and behavior unchanged.
  • WriteBatchSpacedWithError(...) (int64, error) (all 8 types) and WriteBitmapBatchSpacedWithError(...) (int64, error) (Boolean) — new exported methods. Adding exported methods is non-breaking in Go.
  • pqarrow public signatures are unchanged; the dense-array nullable write path now returns transient sink failures as an error (callers already checking err get strictly better behavior).

…ailures

WriteBatchSpaced returns nothing, so a write failure on the spaced path
was either silently discarded (the doBatches branch ignored the error
from commitWriteAndCheckPageLimit) or escaped as a panic out of the
calling goroutine -- unlike WriteBatch, it had no deferred recover. This
is the same class of bug as apache#820 (fixed in apache#824) on a different code
path, reachable through pqarrow when writing nullable columns to a sink
that may transiently fail (e.g. a cloud-storage upload writer).

Add WriteBatchSpacedWithError(...) (int64, error) to each generated
ColumnChunkWriter via the template: it mirrors WriteBatch, recovering
panics into a returned error and checking the commit error on every
branch. WriteBatchSpaced is left byte-for-byte unchanged (same signature
and behavior), so this is a non-breaking, additive change -- exactly as
apache#824 did with NewParquetWriter / NewParquetWriterWithError.

pqarrow's dense-array encoder now uses WriteBatchSpacedWithError so
nullable-column writes propagate sink failures as errors instead of
crashing the goroutine or losing the error.
WriteBitmapBatchSpaced (Boolean) shares the same void-signature, no-recover
shape as WriteBatchSpaced: it discarded the error from
commitWriteAndCheckPageLimit. Add WriteBitmapBatchSpacedWithError mirroring
WriteBatchSpacedWithError, leaving WriteBitmapBatchSpaced byte-for-byte
unchanged (additive, non-breaking). It has no internal callers, so no
rewiring is required; a regression test covers the new method.
@zeroshade zeroshade requested a review from lidavidm June 11, 2026 20:13

@lidavidm lidavidm left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WriteBatchSpaced is internal, right? Why keep it at this point if it's not a public API?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

parquet/file: WriteBatchSpaced panics escape the API and silently discards commit-write errors

2 participants