perf: Optimise map_rows performance for Object dtype #25702

Open
alexander-beedie wants to merge 1 commit into pola-rs:main from alexander-beedie:map-rows-perf

@alexander-beedie (Collaborator) commented Dec 9, 2025

Ref: #25688.

Added a dedicated Object path in map_rows that gives a 50-60% speedup (on the given test case). I think I can see a few opportunities for a smaller speedup for scalar/primitive types too, but will leave that for a separate PR.

Benchmark

import polars as pl

df = pl.DataFrame(
    data={
        "a": [1, 2, 3] * 1_000_000,
        "b": [1, 2, 3] * 1_000_000,
    },
    schema={"a": pl.Int64, "b": pl.Int64},
)
%timeit df.map_rows(lambda d: (d[0] + d[1]), return_dtype=pl.Object)
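
Since `%timeit` is an IPython magic, the benchmark above only runs inside IPython or Jupyter. For a plain Python script, a rough equivalent can be sketched with the standard-library `timeit` module. The helper name `time_callable` and its defaults are assumptions for illustration, not part of the PR; the demo workload is a stand-in so the block runs without Polars installed.

```python
import statistics
import timeit


def time_callable(fn, repeat=7, number=1):
    """Time fn() and return (mean, std dev) in seconds per loop.

    Hypothetical helper: runs `repeat` batches of `number` calls each,
    similar in shape to the "mean ± std. dev. of 7 runs" summary.
    """
    batch_totals = timeit.repeat(fn, repeat=repeat, number=number)
    per_loop = [total / number for total in batch_totals]
    return statistics.mean(per_loop), statistics.pstdev(per_loop)


# Stand-in workload so the sketch is runnable on its own.
mean_s, std_s = time_callable(lambda: sum(range(100_000)), repeat=7, number=10)
print(f"{mean_s * 1e3:.3f} ms ± {std_s * 1e3:.3f} ms per loop")
```

With Polars installed and `df` built as in the benchmark, the same helper could be called as `time_callable(lambda: df.map_rows(lambda d: d[0] + d[1], return_dtype=pl.Object))`.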

Timings [1] 🚀

Before: 306 ms ± 2.87 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
 After: 194 ms ± 1.44 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
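
As a quick check on how these two timings relate to the quoted 50-60% figure, the arithmetic can be sketched as below. The variable names are illustrative; the two inputs are the mean timings reported above.

```python
before_ms = 306.0  # mean time per loop before the change
after_ms = 194.0   # mean time per loop after the change

# Ratio of old to new runtime: how many times faster the new path is.
speedup = before_ms / after_ms

# Fraction of the original runtime that was eliminated.
time_saved = 1 - after_ms / before_ms

print(f"throughput gain:   {speedup - 1:.0%}")  # 58%
print(f"runtime reduction: {time_saved:.0%}")   # 37%
```

So the 50-60% figure corresponds to rows processed per second (about 1.58x), while wall-clock time per call drops by a little over a third.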

Footnotes

  1. Tested using make build-dist-release on an Apple Silicon M4 Max

@github-actions (bot) added the labels A-dtype-object, performance, python, rust on Dec 9, 2025
@codecov (bot) commented Dec 9, 2025

Codecov Report

❌ Patch coverage is 52.08333% with 23 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.54%. Comparing base (405b194) to head (18e87bf).
⚠️ Report is 10 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| crates/polars-python/src/dataframe/map.rs | 52.08% | 23 Missing ⚠️ |
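
The patch coverage percentage and the missing-line count together imply how many lines the patch touched. A small sketch of that arithmetic, assuming patch coverage is simply covered lines divided by total patch lines:

```python
missing = 23                    # lines in the patch without coverage
patch_coverage = 52.08333 / 100  # reported patch coverage as a fraction

# missing / total = 1 - coverage, so total = missing / (1 - coverage).
total = round(missing / (1 - patch_coverage))
covered = total - missing

print(total, covered)  # 48 25
```

That is, 25 of the 48 changed lines are exercised by the test suite.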
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #25702      +/-   ##
==========================================
- Coverage   80.55%   80.54%   -0.01%     
==========================================
  Files        1756     1757       +1     
  Lines      241908   242143     +235     
  Branches     3040     3040              
==========================================
+ Hits       194869   195036     +167     
- Misses      46256    46325      +69     
+ Partials      783      782       -1     


@orlp (Member) commented Dec 9, 2025

I literally just removed all the code that special-cases all this stuff because it all wasn't being updated and was buggy :(

Can we not make the generic path faster by profiling/optimizing AnyValue instead of immediately adding special-cases back?

@orlp added the "do not merge" label on Dec 9, 2025
@alexander-beedie (Collaborator, Author) commented

> I literally just removed all the code that special-cases all this stuff because it all wasn't being updated and was buggy :(

Not sure how I'm supposed to know that code I can't see used to be there 🤷

> Can we not make the generic path faster by profiling/optimizing AnyValue instead of immediately adding special-cases back?

I'll take a look tomorrow.

@orlp (Member) commented Dec 11, 2025

> Not sure how I'm supposed to know that code I can't see used to be there 🤷

You're not, I'm not blaming you :)
