Skip to content

fix(ustring): protect against idempotent rehash upon collision#5090

Open
lgritz wants to merge 1 commit intoAcademySoftwareFoundation:mainfrom
lgritz:lg-hash0
Open

fix(ustring): protect against idempotent rehash upon collision#5090
lgritz wants to merge 1 commit intoAcademySoftwareFoundation:mainfrom
lgritz:lg-hash0

Conversation

@lgritz
Copy link
Collaborator

@lgritz lgritz commented Mar 14, 2026

Some fuzzing stumbled across a very interesting edge case in the rehashing logic of ustring internals:

If the initial hash location of a newly ustring-ized string is already occupied, it rehashes. But we never considered what happens if it ends up rehashing to the same slot! And in fact, that can happen if the true hash is 0 (I'm not sure if it can/does for any other value). And we may have thought that the only string that would ever hash to 0 is the empty string (which is 0 hash by design, and doesn't need to rehash because the table is seeded with that entry already in its correct slot).

So anyway, if there was some other string that happened to hash to 0, it would get caught in an infinite loop of rehashing but never finding a free slot, because it keeps rehashing to the very same occupied slot.

And would you believe that there IS such a string, and it is this one:

"\t\n\n\t\301\350`O\217c85?\202\200\251r|}\304\351\2517\337K'\217C\240"

(unprintable chars expressed as escaped octal values)

The crux of this patch is to notice when the old and new hash bit pattern has not changed, and bump it by 7 (a prime number, so it should not land on the same location again until every slot in the bin has been exhausted).

Solution entirely from my own human brain, in case you're wondering.

Some fuzzing stumbled across a very interesting edge case in the
rehashing logic of ustring internals:

If the initial hash location of a newly ustring-ized string is already
occupied, it rehashes. But we never considered what happens if it ends
up rehashing to the same slot! And in fact, that can happen if the
true hash is 0 (I'm not sure if it can/does for any other value).  And
we may have thought that the only string that would ever hash to 0 is
the empty string (which is 0 hash by design, and doesn't need to
rehash because the table is seeded with that entry already in its
correct slot).

So anyway, if there was some other string that happened to hash to 0,
it would get caught in an infinite loop of rehashing but never finding
a free slot, because it keeps rehashing to the very same slot.

And would you believe that there IS such a string, and it is this one:

    "\t\n\n\t\301\350`O\217c85?\202\200\251r|}\304\351\2517\337K'\217C\240"

(unprintable chars expressed as escaped octal values)

The crux of this patch is to notice when the old and new hash bit pattern
has not changed, and bump it by 7 (a prime number, so it should not
land on the same location again until every slot in the bin has been
exhausted).

Solution entirely from my own human brain, in case you're wondering.

Signed-off-by: Larry Gritz <lg@larrygritz.com>
@etheory
Copy link
Contributor

etheory commented Mar 15, 2026

Didn't we have a discussion ages ago about hash collisions and that since they "don't happen" then they don't need to be handled and I insisted that they will happen and that we need to handle them? Apologies, as I could just read the code, but are hash collisions explicitly handled for ustring, or are they assumed not to happen and that's why the above happens? Thanks.

@lgritz
Copy link
Collaborator Author

lgritz commented Mar 15, 2026

Remember that the ustring holds the address of the string, which is always unique but not consistent from run to run. But its hash is stable but not unique (though collisions are very very rare). To play nicely on the GPU -- in particular to not use addresses, which might be different every run and thus hurt the PTX compilation cache -- we changed OSL to mostly use the string hashes, so there is some danger of a collision.

There are two hashing issues:

(1) What happens if two strings have the same hash, how do you put them in the ustring table?

Collisions in the table are "ok" in the sense that ordinarily they are handled just fine, by rehashing to find a free slot in the table.

Except that this PR here addresses a previously unanticipated edge case, which is a hash whose rehash is itself, so it could not find an alternative slot.

(2) If two strings have the same hash, and OSL is dealing with the hashes and not the strings, is there a danger of a shader behaving incorrectly in the case of a collision? Here we still have an outstanding problem.

So to address this second problem, we had an idea that the non-colliding hashes would use the original unique hash, and the colliding hashes would use the string address. We sacrifice one bit in the hash to distinguish the two cases, so that the purely hashed values could never be the same as the address values that we use as hashes after collisions. So they would truly be unique, but at the cost of causing a PTX cache miss for the JITed coder, in this incredibly rare case.

I wrote a PR to implement that scheme, which I have embarrassingly let sit idle (my excuse: there was some debate about whether to use the high or low bits for this purpose, and I never got around to returning to it to rewrite it the other way). I need to revive that and finish it once and for all. Will do that as soon as the present one is merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants