pl-row-encode

A Polars plugin for row-level, type-preserving encode/decode.

encode(*cols) packs a set of columns into a single Binary column in which each value is an opaque, self-describing token: the polars-row encoding of the row, prefixed with an embedded schema header. decode_series(...) reverses this, producing a Struct that recovers the original dtypes without needing any external schema.

DataFrame
  -> encode(*cols)
  -> opaque bytes
  -> decode(...)   # (row bytes -> Struct -> original typed columns)
  -> DataFrame

Because the type information travels with each token, a token can be decoded at any later point without any other context.

Token layout

Each Binary value is:

[ u32 header_len (LE) ][ header bytes ][ row bytes ]

header is a bincode-serialized Vec<Field> (logical schema); row bytes is the unordered polars-row encoding of that single row. Embedding the header per value makes every token independently decodable.
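
The layout above can be taken apart with nothing but the standard library. The following is a minimal sketch, not the plugin's own code: split_token is a hypothetical helper, and the header and row payloads in the example are placeholder bytes, not real bincode or polars-row output.

```python
import struct


def split_token(token: bytes) -> tuple[bytes, bytes]:
    """Split a token into (header bytes, row bytes) per the documented layout.

    Hypothetical helper for illustration; it does not interpret either part.
    """
    if len(token) < 4:
        raise ValueError("token too short to contain the u32 length prefix")
    # The first 4 bytes are header_len as a little-endian u32.
    (header_len,) = struct.unpack_from("<I", token, 0)
    end = 4 + header_len
    if end > len(token):
        raise ValueError("header_len points past the end of the token")
    return token[4:end], token[end:]


# Placeholder payloads (assumption): stand-ins for the real header and row bytes.
fake_header = b"HDR"
fake_row = b"\x01\x02"
token = struct.pack("<I", len(fake_header)) + fake_header + fake_row

header_bytes, row_bytes = split_token(token)
```

Interpreting the two parts still requires bincode and polars-row respectively; the sketch only shows where the boundaries fall.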

Usage

import polars as pl
from pl_row_encode import encode, decode_series

df = pl.DataFrame({"id": [1, 2], "name": ["alice", "bob"]})

tokens = df.select(tok=encode("id", "name"))["tok"]   # dtype: Binary
# ... hand `tokens` to a vendor, get them back ...

decoded = decode_series(tokens).struct.unnest()        # back to id / name with dtypes

For the lazy engine, the output Struct dtype must be known up front, so pass a token's header explicitly:

from pl_row_encode import decode

# lf is assumed to be a LazyFrame with a Binary column named "tok"
header = ...  # the [u32 len][header] prefix of any token
lf.select(decode("tok", schema_header=header)).collect()
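
One way to obtain that prefix is to slice it off an existing token. This is a sketch under the documented layout only: schema_header_prefix is a hypothetical helper that is not part of pl_row_encode, and the sample token is built from placeholder bytes.

```python
import struct
from typing import Iterable, Optional


def schema_header_prefix(tokens: Iterable[Optional[bytes]]) -> bytes:
    """Return the [u32 len][header] prefix of the first non-null token.

    Hypothetical helper for illustration, based only on the token layout.
    """
    for tok in tokens:
        if tok is None:
            continue  # skip nulls; any non-null token carries the header
        if len(tok) < 4:
            raise ValueError("token too short to contain the u32 length prefix")
        (header_len,) = struct.unpack_from("<I", tok, 0)
        if 4 + header_len > len(tok):
            raise ValueError("header_len points past the end of the token")
        return tok[: 4 + header_len]
    raise ValueError("no non-null token to take a header from")


# Placeholder token (assumption): 3 fake header bytes followed by 2 fake row bytes.
sample = struct.pack("<I", 3) + b"HDR" + b"\x01\x02"
prefix = schema_header_prefix([None, sample])
```

With the eager example above, the input could be tokens.to_list(), which yields bytes values and None for nulls.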

Development

make develop   # build the Rust extension into the venv (uv run maturin develop)
make test      # build + run pytest
make lint      # ruff + ty

The first make develop compiles the full Polars Rust workspace and takes a few minutes; subsequent builds are incremental and fast.

Notes / limitations

Built on polars-row, the same machinery Polars uses internally for sort/group-by row encoding — lossless for primitive, string, boolean, temporal, and nested types.
decode_series infers the schema from the first non-null token, so an all-null/empty Series needs the explicit decode(schema_header=...) form.
