Handle integer-backed Arrow decimals via logical type metadata #653

Merged: adsharma merged 2 commits into LadybugDB:main from rahul-iyer:issue-652-arrow-logical-type-info on Jul 5, 2026
@rahul-iyer

Contributor

Summary

ADBC exposes Arrow schemas and arrays, but some Snowflake types cannot be fully recovered from the Arrow
physical type alone. In particular, Snowflake NUMBER/DECIMAL columns may arrive with their precision and
scale encoded in field metadata rather than as a standard Arrow decimal type.

The Snowflake ADBC driver documents two relevant metadata channels:

  • query-result field metadata such as logicalType=FIXED, precision, and scale
  • table-schema metadata such as DATA_TYPE=NUMBER(p,s)

Without interpreting those annotations, Ladybug can bind Snowflake decimals as plain integer or
floating-point storage types, losing the declared precision and scale.

What changed

  • Added Snowflake-specific decoding for decimal types from:

    • logicalType=FIXED metadata on Arrow fields
    • DATA_TYPE=NUMBER(...), NUMERIC(...), and DECIMAL(...) metadata on Arrow fields
  • Added decimal scanning support for Snowflake FIXED values when the Arrow batch is:

    • integer-backed
    • float32-backed
    • float64-backed
  • Refactored Arrow metadata handling into separate decoder components:

    • shared metadata utilities
    • Snowflake decoder
    • generic metadata decoder
    • a single orchestrator entrypoint
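
As a rough illustration of the Snowflake-specific decoding listed above, the following Python sketch parses the two metadata channels into a (precision, scale) pair. It is not the C++ code from this PR: the function names are hypothetical, field metadata is modeled as a plain dict of strings, and requiring both precision and scale for logicalType=FIXED is an assumption.

```python
import re
from typing import Mapping, Optional, Tuple

# Matches NUMBER(p,s), NUMERIC(p,s), DECIMAL(p,s) and the scale-less NUMBER(p)
# form, ignoring case and surrounding whitespace.
_RAW_DECIMAL = re.compile(
    r"^\s*(?:NUMBER|NUMERIC|DECIMAL)\s*\(\s*(\d+)\s*(?:,\s*(\d+)\s*)?\)\s*$",
    re.IGNORECASE,
)


def parse_raw_data_type(raw: str) -> Optional[Tuple[int, int]]:
    """Parse a DATA_TYPE value such as 'NUMBER(12,4)' (hypothetical helper)."""
    match = _RAW_DECIMAL.match(raw)
    if match is None:
        return None  # not a decimal type, or malformed
    precision = int(match.group(1))
    # A missing scale defaults to 0, so NUMBER(18) becomes DECIMAL(18,0).
    scale = int(match.group(2)) if match.group(2) is not None else 0
    return precision, scale


def parse_fixed_logical_type(
    metadata: Mapping[str, str],
) -> Optional[Tuple[int, int]]:
    """Read logicalType=FIXED plus precision and scale (hypothetical helper)."""
    if metadata.get("logicalType") != "FIXED":
        return None
    try:
        # Assumption: both keys must be present and numeric.
        return int(metadata["precision"]), int(metadata["scale"])
    except (KeyError, ValueError):
        return None
```

For example, `parse_raw_data_type("NUMBER(18)")` yields `(18, 0)`, and a non-decimal value such as `"VARCHAR(16)"` yields `None`.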

Design

The refactor separates three concerns:

  • Arrow physical type parsing
  • logical type recovery from metadata
  • vendor-specific metadata decoding

The rest of the Arrow pipeline still consumes normalized logical type information only. This keeps
Snowflake-specific behavior out of the core binding and scan code paths except where the recovered
logical type is applied.

This structure is intended to make future support for other sources, such as Databricks/Spark-specific
metadata, straightforward to add as separate decoders rather than as scattered conditionals.
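
One way to picture this design is an ordered list of decoders behind a single entrypoint. The Python sketch below is only an illustration under assumptions: every name in it is hypothetical, metadata is modeled as a dict of strings, and the ordering (Snowflake raw type, then Snowflake logicalType, then generic metadata) follows the precedence described in this PR rather than the actual C++ code.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

Metadata = Dict[str, str]
DecimalType = Tuple[int, int]  # (precision, scale)
Decoder = Callable[[Metadata], Optional[DecimalType]]


def _precision_scale(md: Metadata) -> Optional[DecimalType]:
    """Shared utility: read numeric precision and scale keys, if present."""
    try:
        return int(md["precision"]), int(md["scale"])
    except (KeyError, ValueError):
        return None


def snowflake_raw_type(md: Metadata) -> Optional[DecimalType]:
    """Vendor decoder: DATA_TYPE=NUMBER(p,s) and its aliases."""
    match = re.fullmatch(
        r"\s*(?:NUMBER|NUMERIC|DECIMAL)\s*\(\s*(\d+)\s*(?:,\s*(\d+)\s*)?\)\s*",
        md.get("DATA_TYPE", ""),
        re.IGNORECASE,
    )
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2) or 0)


def snowflake_logical_type(md: Metadata) -> Optional[DecimalType]:
    """Vendor decoder: logicalType=FIXED with precision and scale."""
    return _precision_scale(md) if md.get("logicalType") == "FIXED" else None


def generic_logical_type(md: Metadata) -> Optional[DecimalType]:
    """Generic decoder: logicalType=DECIMAL with precision and scale."""
    return _precision_scale(md) if md.get("logicalType") == "DECIMAL" else None


# Earlier decoders take precedence. Supporting a new source would mean
# adding one more function to this list.
DECODERS: List[Decoder] = [
    snowflake_raw_type,
    snowflake_logical_type,
    generic_logical_type,
]


def recover_decimal(md: Metadata) -> Optional[DecimalType]:
    """Single entrypoint: return the first decoder result, or None."""
    for decode in DECODERS:
        result = decode(md)
        if result is not None:
            return result
    return None
```

With this layout the binding and scan code would call only `recover_decimal`, and a malformed DATA_TYPE simply falls through to the next decoder in the list.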

https://arrow.apache.org/adbc/current/driver/snowflake.html

| Snowflake signal | Example metadata | Arrow physical storage | Ladybug result | Notes |
| --- | --- | --- | --- | --- |
| Raw Snowflake decimal type via DATA_TYPE | DATA_TYPE=NUMBER(12,4) | Any Arrow storage type | DECIMAL(12,4) | Parsed from Snowflake table-schema metadata. |
| Raw Snowflake decimal type via DATA_TYPE with implicit scale | DATA_TYPE=NUMBER(18) | Any Arrow storage type | DECIMAL(18,0) | Missing scale defaults to 0. |
| Raw Snowflake decimal aliases via DATA_TYPE | DATA_TYPE=NUMERIC(10,3) or DATA_TYPE=DECIMAL(10,3) | Any Arrow storage type | DECIMAL(10,3) | Matching is case-insensitive and whitespace-tolerant. |
| Snowflake logical decimal metadata | logicalType=FIXED, precision=7, scale=2 | Integer-backed Arrow (INT8/16/32/64, UINT8/16/32/64) | DECIMAL(7,2) | Used for query-result Arrow schemas. |
| Snowflake logical decimal metadata | logicalType=FIXED, precision=9, scale=2 | Float-backed Arrow (FLOAT, DOUBLE) | DECIMAL(9,2) | Values are cast into decimal backing storage during scan. |
| Snowflake raw type fallback to logical metadata | Malformed DATA_TYPE plus valid logicalType=FIXED metadata | Integer-backed or float-backed Arrow | DECIMAL(p,s) from logicalType metadata | If raw DATA_TYPE parsing fails, Snowflake logicalType parsing is tried next. |
| Snowflake metadata precedence over generic metadata | DATA_TYPE=NUMBER(12,4) plus generic logicalType=DECIMAL, precision=9, scale=3 | Any Arrow storage type | DECIMAL(12,4) | Snowflake raw type metadata wins over generic metadata. |

@rahul-iyer rahul-iyer marked this pull request as draft July 4, 2026 10:01
Comment thread src/include/common/arrow/arrow_schema_metadata.h Outdated
@rahul-iyer rahul-iyer marked this pull request as ready for review July 4, 2026 18:12
@adsharma

adsharma commented Jul 5, 2026

Contributor

Snowflake returns decimals and integers as NUMBER(p,s) instead of native Apache Arrow types because Snowflake does not have separate storage types for integers and decimals; it consolidates all fixed-point numbers under a single underlying NUMBER implementation.

When Snowflake transmits metadata to Arrow-based tools or drivers (like ADBC, JDBC, or Power BI), it identifies all fixed-point data as a generic "fixed" type with a precision of up to 38. Because native Apache Arrow strictly distinguishes between fixed-width integers (like int64) and exact decimals (decimal128), this design means a consumer cannot tell integers from decimals by the Arrow physical type alone.

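Workaround: set the client option adbc.snowflake.sql.client_option.use_high_precision to false. This tells the driver to map any NUMBER with a scale of 0 (s=0) directly to a standard Arrow int64 column; per the driver documentation, NUMBER columns with a non-zero scale are then returned as float64.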

@rahul-iyer - could you add the first two paragraphs to the summary as the motivation and the third paragraph as a potential workaround for older releases without this fix?
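
For reference, a minimal sketch of how the use_high_precision workaround might be applied from Python. The helper name is hypothetical, and the commented connect call assumes the adbc_driver_snowflake package and a valid Snowflake URI, neither of which is part of this PR.

```python
from typing import Dict

# Option key quoted in the comment above.
USE_HIGH_PRECISION = "adbc.snowflake.sql.client_option.use_high_precision"


def snowflake_db_kwargs(use_high_precision: bool) -> Dict[str, str]:
    """Build ADBC database options (hypothetical helper).

    ADBC option values are passed as strings, so the flag is rendered
    as "true" or "false".
    """
    return {USE_HIGH_PRECISION: "true" if use_high_precision else "false"}


# Usage, assuming adbc_driver_snowflake is installed and `uri` is valid:
#   import adbc_driver_snowflake.dbapi
#   conn = adbc_driver_snowflake.dbapi.connect(
#       uri, db_kwargs=snowflake_db_kwargs(False)
#   )
```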

@adsharma adsharma force-pushed the issue-652-arrow-logical-type-info branch from 4c5a710 to 9c7a47f Compare July 5, 2026 16:25
@adsharma adsharma merged commit 5261ba2 into LadybugDB:main Jul 5, 2026
3 of 4 checks passed