Summary
Extend ArkoudaArray._arith_method to explicitly handle pd.NA
operands.
Currently, when an arithmetic operation involves pd.NA (e.g.,
array + pd.NA), _arith_method returns NotImplemented. pandas then
falls back to a slower path, often materializing the Arkouda-backed
array as a NumPy array, leading to:
- Loss of dtype preservation
- Performance degradation
- Potential semantic inconsistencies
This ticket ensures proper missing-value propagation for arithmetic
operations while keeping computation server-side.
Background / Problem
pandas dispatches arithmetic operations on ExtensionArrays through
_arith_method.
Example:
s = pd.Series([1, 2, 3], dtype="ak")
s + pd.NA
Current behavior:
- _arith_method sees pd.NA
- Returns NotImplemented
- pandas falls back to NumPy/object logic
- Arkouda array may be converted to NumPy
- Performance and dtype guarantees are lost
Desired behavior:
- Arithmetic involving pd.NA should remain within Arkouda-backed logic
- Missing values should propagate according to pandas semantics
Expected pandas Semantics
Arithmetic with pd.NA generally results in missing values propagating:
- x + pd.NA → all results missing
- pd.NA + x → all results missing
- array_with_missing + pd.NA → all missing
- array + scalar where array contains missing → missing preserved elementwise
Behavior should match pandas nullable dtypes.
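For reference, the rules above can be observed on a pandas nullable integer array. This is only an illustration of the target semantics using the built-in "Int64" dtype; it does not involve Arkouda, and the variable names are chosen for this example.

```python
import pandas as pd

# A pandas nullable integer array with one missing element.
arr = pd.array([1, None, 3], dtype="Int64")

# A pd.NA operand on either side makes every result element missing.
all_missing = arr + pd.NA
reflected = pd.NA + arr

# An ordinary scalar leaves the existing missing slot missing and
# computes the remaining elements normally.
elementwise = arr + 1

print(all_missing)
print(reflected)
print(elementwise)
```

An Arkouda-backed array would be expected to produce the same pattern of missing values, with its own dtype in place of "Int64".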
Scope
In Scope
- Modify _arith_method to detect pd.NA
- Implement correct missing-value propagation
- Preserve Arkouda dtype and mask representation
- Avoid NumPy fallback
- Add regression tests
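The in-scope items could take roughly the following shape. This is a minimal sketch, not the ArkoudaArray implementation: MaskedSketch is a hypothetical stand-in that keeps values and a boolean missing mask in NumPy arrays so the example runs without an Arkouda server, and how ArkoudaArray actually stores its mask is an assumption here.

```python
import operator

import numpy as np
import pandas as pd


class MaskedSketch:
    """Hypothetical stand-in for ArkoudaArray: values plus a missing mask."""

    def __init__(self, values, mask):
        self.values = np.asarray(values)
        self.mask = np.asarray(mask, dtype=bool)

    def _arith_method(self, other, op):
        if other is pd.NA:
            # Scalar NA operand: mark every element missing. The values are
            # kept as placeholders so the dtype is unchanged.
            all_true = np.ones(len(self.values), dtype=bool)
            return MaskedSketch(self.values.copy(), all_true)
        if isinstance(other, MaskedSketch):
            # Array operand: a result slot is missing if either input was.
            return MaskedSketch(op(self.values, other.values),
                                self.mask | other.mask)
        if np.isscalar(other):
            # Ordinary scalar: compute normally and keep the existing mask.
            return MaskedSketch(op(self.values, other), self.mask.copy())
        # Anything else is left to the existing dispatch path.
        return NotImplemented

    def __add__(self, other):
        return self._arith_method(other, operator.add)


arr = MaskedSketch([1, 2, 3], [False, True, False])
all_missing = arr + pd.NA   # mask becomes all True
shifted = arr + 1           # mask is preserved elementwise
```

The identity check `other is pd.NA` is deliberate: it matches only the pandas singleton, which keeps arbitrary third-party NA-like objects out of scope.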
Out of Scope
- Redesigning arithmetic dispatch
- Changing existing non-NA arithmetic logic
- Supporting arbitrary third-party NA-like objects