
Conversation

@Fidget-Spinner (Member) commented Jan 10, 2026

This PR specializes FOR_ITER for iterating over dict.items(). In the JIT it goes one step further: it avoids allocating, reusing, and writing to the pair tuple entirely, scalar-replacing the tuple on the stack instead.
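For context, the interpreter pattern being targeted is the FOR_ITER / UNPACK_SEQUENCE pair that a two-target loop compiles to; a quick way to see it (exact opcode names vary by CPython version):

import dis

def loop(dct):
    for k, v in dct.items():
        pass

# On recent CPython versions this shows FOR_ITER producing the
# (key, value) pair, immediately followed by an UNPACK_SEQUENCE (or a
# specialization of it) that splits the pair into k and v. That pair
# tuple is what the JIT can scalar-replace.
dis.dis(loop)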

According to stats, dict items account for 33% of FOR_ITER specialization failures.

Roughly a 33% reduction in execution time for iteration over dictionary items in the JIT, on this microbenchmark:

import time

def testfunc(dct):
    for k, v in dct.items():
        pass

n = 10000000
dct = dict(zip(range(n), range(n)))

start = time.time()
for _ in range(20):
    testfunc(dct)
end = time.time()
print(end - start)
JIT enabled for both runs:
Main: 3.54s
This branch: 2.34s

@Fidget-Spinner changed the title to "gh-143667: Specialize FOR_ITER for dict.items(), and scalar replace the pair in the JIT" on Jan 10, 2026
@eendebakpt (Contributor) left a comment
Nice work! Would the same approach work for enumerate?

@Fidget-Spinner (Member, Author) commented Jan 10, 2026

> Nice work! Would the same approach work for enumerate?

Yes, though I'm hesitant to do it there. For enumerate, I'd rather we write a Python equivalent for the JIT and let it specialize on the generator.
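Roughly, such a Python equivalent could look like this (a sketch only, not code in this PR):

def enumerate_py(iterable, start=0):
    # Pure-Python stand-in for the C-level enumerate. Because this is
    # an ordinary generator, the JIT can specialize the loop over it
    # like any other Python-level code.
    i = start
    for item in iterable:
        yield i, item
        i += 1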

@Fidget-Spinner (Member, Author)

@eendebakpt sorry, I can't reproduce the speedup on deepcopy anymore. Deepcopy benchmarks are too unstable on my machine, so I don't trust the results. From what I can see, it's roughly the same speed.
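(For anyone re-checking: a minimal stdlib-only harness like the one below works; the workload here is illustrative, not the one measured above.)

import copy
import timeit

data = {i: list(range(10)) for i in range(100)}
# Best-of-several repeats damps some noise, but deepcopy stays
# sensitive to allocator and cache state, hence the instability.
print(min(timeit.repeat(lambda: copy.deepcopy(data), number=1000, repeat=5)))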

@markshannon (Member)

I don't think we should do this, for two reasons. I think the optimization is sound, but I'm opposed to the mechanism.

  1. We are getting low on spare opcodes, and while this might be a justifiable use of one of them, we can do the same optimization in the JIT given the type information from the more generic approach in #143732 (Broader specialization in the Specializing Adaptive Interpreter for better JIT performance).
  2. The new code in optimizer_bytecodes.c is fragile, as it depends on the exact layout of uops. Such optimizations belong in a virtualization pass, where _ITER_NEXT_DICT_ITEMS would be unconditionally replaced with _ITER_NEXT_DICT_ITEMS_UNPACK, pushing a virtual pair to the (shadow) stack; a sketch of the idea follows below.
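To make the idea concrete, here is a toy model of such a pass in Python. The uop names come from the discussion above, but the pass structure, the UOp type, and the handling of _UNPACK_SEQUENCE_TWO_TUPLE are illustrative assumptions, not CPython's actual optimizer API:

from dataclasses import dataclass

@dataclass
class UOp:
    name: str  # toy stand-in for a real uop record

def virtualize(trace):
    # Toy virtualization pass: unconditionally rewrite the dict-items
    # "next" uop into its unpacked form, and model the pair as two
    # virtual stack slots instead of a materialized tuple.
    out = []
    shadow_stack = []  # symbolic model of the operand stack
    for uop in trace:
        if uop.name == "_ITER_NEXT_DICT_ITEMS":
            out.append(UOp("_ITER_NEXT_DICT_ITEMS_UNPACK"))
            shadow_stack += ["virtual_key", "virtual_value"]
        elif (uop.name == "_UNPACK_SEQUENCE_TWO_TUPLE"
                and shadow_stack[-2:] == ["virtual_key", "virtual_value"]):
            # The pair is virtual, so the unpack is dropped: key and
            # value are already individual entries on the shadow stack.
            pass
        else:
            out.append(uop)
    return out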
