After trying out `atheris` based on your example (it is awesome, I'd say!), I found an interesting bug in caching that comes from the following fact:
```python
>>> hash(-2)
-2
>>> hash(-1)
-2
```
From PEP-456:

> The internal interface code between the hash function and the `tp_hash` slots implements special cases for zero length input and a return value of -1. An input of length 0 is mapped to hash value 0. The output -1 is mapped to -2.
It leads to wrong canonicalisation results: e.g. if `{'exclusiveMaximum': 1, 'exclusiveMinimum': -1, 'type': 'number'}` was cached first, then applying canonicalisation to `{'exclusiveMaximum': 1, 'exclusiveMinimum': -2, 'type': 'number'}` will hit the cache entry for the first schema and return `{'exclusiveMaximum': 1, 'exclusiveMinimum': -1, 'type': 'number'}` :(
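For illustration, the collision survives even when the cache key is built from the whole schema rather than a single value, because a tuple's hash is derived only from its elements' hashes (this key scheme is a hypothetical sketch, not necessarily the one used in the actual cache):

```python
s1 = {'exclusiveMaximum': 1, 'exclusiveMinimum': -1, 'type': 'number'}
s2 = {'exclusiveMaximum': 1, 'exclusiveMinimum': -2, 'type': 'number'}

# CPython reserves a raw hash of -1 as an error marker, so hash(-1) is remapped to -2
assert hash(-1) == hash(-2) == -2

# A tuple's hash combines only its elements' hashes, so the two keys collide
k1 = hash(tuple(sorted(s1.items())))
k2 = hash(tuple(sorted(s2.items())))
assert k1 == k2

# ...even though the schemas themselves are not equal
assert s1 != s2
```

So any cache that stores results under the hash value alone (rather than under a key that is also compared with `==`) will conflate these two schemas.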
`-1` is quite common, and these cache collisions make me question the current implementation - I am not completely sure how to implement caching efficiently enough. However, in #69, after reducing how many schemas are inlined, performance improved dramatically, and I am not sure this caching layer is worth having (at least in its current implementation).
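If the layer is kept, one option is to key the cache on the schema's items themselves instead of their hash, so that `dict` falls back to `==` comparison after a hash collision and resolves it correctly. A minimal sketch (`canonicalise` here is a stand-in for the real function, not the actual implementation):

```python
def canonicalise(schema):
    # Stand-in for the real canonicalisation logic
    return dict(schema)

_cache = {}

def cached_canonicalise(schema):
    # Use the full item tuple as the key: hash(-1) == hash(-2) still collides,
    # but dict also compares colliding keys with ==, so no wrong hit occurs
    key = tuple(sorted(schema.items()))
    try:
        return _cache[key]
    except KeyError:
        result = _cache[key] = canonicalise(schema)
        return result

a = cached_canonicalise({'exclusiveMinimum': -1, 'type': 'number'})
b = cached_canonicalise({'exclusiveMinimum': -2, 'type': 'number'})
assert a['exclusiveMinimum'] == -1
assert b['exclusiveMinimum'] == -2
```

The caveat is that this only works for schemas whose values are hashable; nested dicts or lists would need to be converted to a hashable form first, which may eat into whatever the cache saves.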
What do you think?