Commit 992d932
[SPARK-56167][PS] Align astype with pandas 3 default string behavior
### What changes were proposed in this pull request?
This PR updates a few pandas-on-Spark `astype` paths to match pandas 3 behavior for the default string dtype.
In pandas 3, `astype(str)` returns the default string dtype and preserves missing values instead of converting them to string literals such as `"NaN"` or `"<NA>"`. pandas-on-Spark still used the older behavior in a few localized conversion paths, including numeric, null, string, and boolean casts.
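The semantic difference can be modeled in plain Python. This is a minimal sketch, not pandas' or pandas-on-Spark's implementation. The function names `legacy_astype_str` and `pandas3_astype_str` are hypothetical. `None` and float NaN stand in for missing values, and the `"NaN"` literal mirrors the example in the paragraph above.

```python
import math
from typing import Any, List, Optional


def is_missing(value: Any) -> bool:
    """Treat None and float NaN as missing (a simplification of pandas' rules)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def legacy_astype_str(values: List[Any], null_str: str = "NaN") -> List[str]:
    """Older behavior: every element becomes a string, missing ones included."""
    return [null_str if is_missing(v) else str(v) for v in values]


def pandas3_astype_str(values: List[Any]) -> List[Optional[str]]:
    """pandas 3 behavior: stringify present values, keep missing values missing.

    Real pandas 3 results hold NaN/NA markers; None is used here for simplicity.
    """
    return [None if is_missing(v) else str(v) for v in values]


data = [1, float("nan"), None, 2.5]
print(legacy_astype_str(data))   # ['1', 'NaN', 'NaN', '2.5']
print(pandas3_astype_str(data))  # ['1', None, None, '2.5']
```

The key point is that downstream checks such as `isna()` still see the missing entries after the cast in the pandas 3 model. In the legacy model, those entries become ordinary strings.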
This PR makes three small changes in `python/pyspark/pandas/data_type_ops/`:
- update the shared string cast helper so `astype(str)` preserves missing values for pandas 3 string results
- align boolean-to-string casting with the same pandas 3 behavior, including the nullable metadata on the result field
- align string-to-bool casting for pandas 3 string-backed data with pandas' current `astype(bool)` result
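The boolean-to-string change affects both the values and the result field's metadata. Once missing values are preserved, the result can contain nulls, so it should stay nullable rather than be marked non-nullable. The sketch below models this with a hypothetical `CastResult` record and a hypothetical `bool_to_string` function gated on the pandas major version. None of these names come from pandas-on-Spark, and the legacy literal `"None"` is an assumption for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CastResult:
    values: List[Optional[str]]
    nullable: bool  # whether the result field may contain missing values


def is_pandas3(version: str) -> bool:
    """Gate on the major version only, e.g. '3.0.0' -> True, '2.2.3' -> False."""
    return int(version.split(".")[0]) >= 3


def bool_to_string(
    values: List[Optional[bool]],
    input_nullable: bool,
    pandas_version: str,
    null_str: str = "None",  # hypothetical legacy literal
) -> CastResult:
    if is_pandas3(pandas_version):
        # pandas 3 model: keep missing values and carry over input nullability.
        out = [None if v is None else str(v) for v in values]
        return CastResult(out, nullable=input_nullable)
    # pandas 2 model: missing values become a literal, so no nulls remain.
    out = [null_str if v is None else str(v) for v in values]
    return CastResult(out, nullable=False)


print(bool_to_string([True, None, False], True, "3.0.0"))
# CastResult(values=['True', None, 'False'], nullable=True)
print(bool_to_string([True, None, False], True, "2.2.3"))
# CastResult(values=['True', 'None', 'False'], nullable=False)
```

In this model, the pandas 2 branch is left untouched, in line with the goal of keeping pandas 2 behavior unchanged.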
### Why are the changes needed?
Without this change, several pandas-on-Spark `astype` tests fail with pandas 3 because some conversion paths still follow the older string-casting behavior.
The failures came from two related mismatches:
- `astype(str)` converted missing values into string literals instead of preserving them as missing values
- some follow-up casts on pandas 3 string-backed data, such as `astype(bool)`, did not match pandas' current behavior
This patch fixes those localized mismatches while keeping the pandas 2 behavior unchanged.
### Does this PR introduce _any_ user-facing change?
Yes.
For pandas 3 users, pandas-on-Spark `astype(str)` now preserves missing values in the affected paths instead of converting them to string literals. This also fixes related behavior for boolean and string-backed casts that depend on pandas 3's default string behavior.
### How was this patch tested?
The existing tests should pass.
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Codex (GPT-5)
Closes #54968 from ueshin/issues/SPARK-56167/astype.
Authored-by: Takuya Ueshin <ueshin@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
Parent commit: 005f2a3
3 files changed, 14 additions and 5 deletions