-
Notifications
You must be signed in to change notification settings - Fork 358
fix(java): fix codegen config hash when register serializers #3122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| private static final boolean isLittleEndian = ByteOrder.nativeOrder() == ByteOrder.LITTLE_ENDIAN; | ||
| private static final byte BITMAP = isLittleEndian ? isLittleEndianFlag : 0; | ||
| private static final short MAGIC_NUMBER = 0x62D4; | ||
| private static final AtomicInteger runtimeHashCounter = new AtomicInteger(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't need this, we could update configHash instance field direclty when invoke registerSerializer everytime. Andu compute new hash based hash of full classname of registered Serializer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @chaokunyang
I think the intent was to keep with having the generated class suffix be a relatively small integer.
If going for a full hash of the registered serializers, then perhaps we implement hashCode() for Fory to include Config.hashCode()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The suffix don't use hash, it use hash as key to store a counter for suffix.
And we should not impl hash code for Fory. Maybe we should not name this method as hash either
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. That's why we were following that pattern, but moved it to Fory to cover it being not just the initial config.
Now the incrementAndGet in the ctor and when modifying means that e.g. Fory#1 will get 1, and Fory#2 will get 2, but then Fory#2 gets a serializer added so it becomes 3.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, does this cover pooled Fory? Does that create multiple instances, so should still use the same trick, so we should use the Map in Fory but include the serializer hashes for computeIfAbsent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pooled Fory will apply register callback, which will invoke registerSerializer API on every pooled Fory instance. All fory instance will compute to same hash value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. So we cannot do what has been added on line 145. This approach is wrong.
You are agreeing with me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could update configHash instance field direclty when invoke registerSerializer everytime. And compute new hash based hash of full classname of registered Serializer
This will work, why don't you use this approach?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fory will never be invoked By multiple threads, we never need to use any concurrent data structure here. Please remove AtomicInteger here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the implementation to compute the hash based on registered serializers instead of using a counter. Now configHash is calculated from config.hashCode() plus the hash of all registered serializer class names. This way, pooled Fury instances with the same serializers will share the same hash, while instances with different serializers get different hashes.
…registration and update test
f19a744 to
23d71c7
Compare
| private final IdentityMap<Object, Object> originToCopyMap; | ||
| private int classDefEndOffset; | ||
| private int configHash; | ||
| private final Set<String> registeredSerializers = new HashSet<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please delete this, we don't need to store registered serializer
| return configHash; | ||
| } | ||
|
|
||
| private void invalidateCodegenCache() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also delete this method
| registeredSerializers.add(type.getName() + ":" + serializerClass.getName()); | ||
| invalidateCodegenCache(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| registeredSerializers.add(type.getName() + ":" + serializerClass.getName()); | |
| invalidateCodegenCache(); | |
| this.configHash = Objects.hashCode(configHash, type, serializerClass); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And org.apache.fory.Fory#registerSerializer(java.lang.Class<?>, org.apache.fory.serializer.Serializer<?>) also needs updates
Why?
What does this PR do?
Related issues
Does this PR introduce any user-facing change?
Benchmark