[C++][Gandiva] Fix castVARCHAR memory inefficiencies, unused bool allocation, and missing len<=0 handling #49420

@dmitry-chirkov-dremio

Describe the enhancement requested

The castVARCHAR functions have several issues:

Functional issues:

  • bool - allocates 5 bytes from arena, then immediately overwrites the pointer with a string literal; the allocation is never used
  • int32/int64 - no handling for len=0 (should return an empty string) or len<0 (should set an error)
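The bool case can be illustrated with a minimal sketch. The function name and signature below are hypothetical and are not the actual Gandiva entry point. The sketch relies on string literals having static storage duration, so no arena allocation is needed. Treating len<=0 as an empty result is an assumption made here for brevity; the issue only specifies the len<0 error for the integer variants.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical sketch (not the Gandiva signature): bool -> VARCHAR that
// returns a pointer into a string literal instead of allocating 5 bytes.
const char* cast_varchar_bool_sketch(bool value, std::int64_t len,
                                     std::int32_t* out_len) {
  assert(out_len != nullptr);
  // Assumption: a non-positive len yields an empty string.
  if (len <= 0) {
    *out_len = 0;
    return "";
  }
  const char* text = value ? "true" : "false";
  const std::int64_t full = value ? 4 : 5;
  // Truncation only shortens the reported length; the literal is untouched.
  *out_len = static_cast<std::int32_t>(std::min(len, full));
  return text;
}
```

Because the caller receives a pointer plus a length, truncating to len never requires copying the literal.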

Performance issues:

  • int32/int64 - max output is 11 bytes for int32 and 20 bytes for int64 (sign included), but allocates len bytes upfront
  • date64 - output is always 10 bytes ("YYYY-MM-DD"), but allocates len bytes
  • float32/float64 - max output is ~15-24 bytes, but allocates len bytes
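The integer and date bounds above can be checked at compile time. The following is a sketch only: `kMaxDecimalChars`, `kDateChars`, and `capped_alloc_size` are hypothetical names introduced for illustration, and the float bound is left out because the issue gives it only approximately.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

// Widest decimal rendering of a signed integer type:
// digits10 + 1 digits, plus one character for the sign.
template <typename T>
constexpr std::int64_t kMaxDecimalChars =
    std::numeric_limits<T>::digits10 + 1 + 1;

static_assert(kMaxDecimalChars<std::int32_t> == 11, "-2147483648");
static_assert(kMaxDecimalChars<std::int64_t> == 20, "-9223372036854775808");

// "YYYY-MM-DD" without the terminating NUL.
constexpr std::int64_t kDateChars = sizeof("YYYY-MM-DD") - 1;
static_assert(kDateChars == 10, "date output is fixed width");

// Hypothetical helper: bytes to reserve for an output that can never
// exceed max_output_size, regardless of how large len is.
inline std::int64_t capped_alloc_size(std::int64_t len,
                                      std::int64_t max_output_size) {
  assert(max_output_size > 0);
  return std::min(std::max<std::int64_t>(len, 0), max_output_size);
}
```

With a helper like this, a call such as castVARCHAR(date, 65536) would reserve 10 bytes rather than 65536.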

Proposed fixes:

  1. Integer types: Format to stack buffer, allocate only min(len, actual_size) bytes, add len<=0 handling
  2. Date/Float types: Allocate only min(len, max_output_size) bytes upfront instead of len bytes
  3. Boolean type: Return string literal directly without arena allocation
  4. Tests: Add coverage for len=0 and len<0 edge cases
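A rough sketch of fix 1 for int64 follows. `SketchContext` is a hypothetical stand-in for the Gandiva execution context and its arena, and the function name, signature, and error message are assumptions rather than the real API. The formatting uses `std::to_chars` from C++17, which may differ from what the actual implementation uses.

```cpp
#include <algorithm>
#include <cassert>
#include <charconv>
#include <cstdint>
#include <cstring>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the execution context: owns allocations and
// records an error message.
struct SketchContext {
  std::vector<std::unique_ptr<char[]>> blocks;
  std::string error;

  char* Allocate(std::size_t n) {
    blocks.push_back(std::make_unique<char[]>(n));
    return blocks.back().get();
  }
};

// Sketch: format on the stack, then allocate only min(len, actual_size).
const char* cast_varchar_int64_sketch(SketchContext* ctx, std::int64_t value,
                                      std::int64_t len,
                                      std::int32_t* out_len) {
  assert(ctx != nullptr && out_len != nullptr);
  *out_len = 0;
  if (len < 0) {
    ctx->error = "Buffer length cannot be negative";  // assumed wording
    return "";
  }
  if (len == 0) return "";  // empty string, no allocation

  char buf[20];  // "-9223372036854775808" is 20 characters
  const auto res = std::to_chars(buf, buf + sizeof(buf), value);
  const std::int64_t actual = res.ptr - buf;
  const std::int64_t n = std::min(len, actual);

  char* out = ctx->Allocate(static_cast<std::size_t>(n));
  std::memcpy(out, buf, static_cast<std::size_t>(n));
  *out_len = static_cast<std::int32_t>(n);
  return out;
}
```

The same shape would apply to int32 with an 11-byte stack buffer. The len=0 and len<0 branches are the edge cases that fix 4 proposes to cover with tests.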

Component(s)

C++, Gandiva
