Importing vortex after pyarrow.dataset impacts the pyarrow runtime, causing SIGSEGV #7760

@alexander-beedie

What happened?

Importing vortex after importing pyarrow.dataset corrupts pyarrow's runtime state.
Any subsequent call into pyarrow can then segfault.

Steps to reproduce

import pyarrow.dataset
import vortex
import pyarrow as pa

pa.array([1])

# Process finished with exit code 139 (interrupted by signal 11:SIGSEGV)
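Because the crash kills the interpreter, it can help to run each import order in a child process and inspect the exit status from the parent. The following is a minimal sketch, assuming a POSIX system (where `subprocess` reports death by signal as a negative return code); the helper names `run_isolated`, `describe`, and `compare_import_orders` are hypothetical and not part of vortex or pyarrow.

```python
import signal
import subprocess
import sys


def run_isolated(code: str) -> int:
    """Run `code` in a fresh interpreter and return the child's return code."""
    return subprocess.run([sys.executable, "-c", code]).returncode


def describe(returncode: int) -> str:
    """Turn a return code into a readable status.

    On POSIX, a negative return code -N means the child was killed by signal N.
    """
    if returncode < 0:
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with code {returncode}"


# The two import orders from this report, as one-line snippets.
ORDERS = {
    "pyarrow.dataset then vortex": (
        "import pyarrow.dataset; import vortex; "
        "import pyarrow as pa; pa.array([1])"
    ),
    "vortex then pyarrow.dataset": (
        "import vortex; import pyarrow.dataset; "
        "import pyarrow as pa; pa.array([1])"
    ),
}


def compare_import_orders() -> dict:
    """Run each order in its own process and collect a status string per order."""
    return {label: describe(run_isolated(code)) for label, code in ORDERS.items()}
```

Calling `compare_import_orders()` in an environment with both packages installed should show which order, if any, dies with a signal, without taking down the calling process.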

Environment

  • Vortex version 0.70.0
  • Python: 3.11
  • macOS: 26.4.1

Suspected root cause

It looks like vortex's Python extension statically links its own copy of the Arrow runtime. If so, importing it after pyarrow.dataset lets vortex overwrite parts of Arrow's process-global state. Subsequent pyarrow calls then dereference function pointers or type handles that point into a runtime pyarrow is not coordinated with, and the process crashes.

Possible fix

Reuse the existing pyarrow runtime when it's already loaded?
Ref: pyarrow C++ integration docs.
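For context, pyarrow exposes the include path and library names that an extension would need in order to link dynamically against the Arrow libraries shipped in the installed wheel. The sketch below only prints that information; whether vortex's build could actually consume it is an assumption and is not established in this report. The wrapper name `pyarrow_link_info` is hypothetical.

```python
def pyarrow_link_info():
    """Return pyarrow's build/link hints, or None if pyarrow is not installed."""
    try:
        import pyarrow as pa
    except ImportError:
        return None
    return {
        # Directory containing the Arrow C++ headers bundled with pyarrow.
        "include": pa.get_include(),
        # Library names to link against.
        "libraries": pa.get_libraries(),
        # Directories where those shared libraries live.
        "library_dirs": pa.get_library_dirs(),
    }


if __name__ == "__main__":
    print(pyarrow_link_info())
```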

Additional context

Reversing the import order lets pyarrow's registrations win, leaving the runtime self-consistent:

  • pyarrow.dataset → vortex: segfault

  • vortex → pyarrow.dataset: works
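Until the root cause is fixed, a caller could detect the problematic order before importing vortex. This is a minimal sketch based only on the observation above. The function name `would_hit_reported_order` is hypothetical, and it checks just the one combination described in this report; whether a plain `import pyarrow` is enough to trigger the crash is not covered here.

```python
import sys
from typing import Mapping


def would_hit_reported_order(loaded: Mapping[str, object] = sys.modules) -> bool:
    """True if pyarrow.dataset is already imported and vortex is not yet.

    Importing vortex in that state reproduces the order reported to segfault.
    """
    return "pyarrow.dataset" in loaded and "vortex" not in loaded


if __name__ == "__main__":
    if would_hit_reported_order():
        print("pyarrow.dataset is already loaded; importing vortex now may crash")
    else:
        print("import order does not match the reported crash")
```

Passing the module table as a parameter keeps the check testable with plain dictionaries instead of real imports.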
