-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[fix][schema] Illegal character '$' in record #25193
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR addresses a regression introduced by the Avro upgrade (1.12.0) where Protobuf-derived Avro schemas can fail parsing due to $ appearing in generated record names, by introducing a more permissive Avro NameValidator and adding tests to cover the scenario.
Changes:
- Add a custom Avro
NameValidator(CompatibleNameValidator) to allow$in Avro record/field names during schema validation. - Add unit tests for the validator’s behavior and a reproduction test using a generated Protobuf schema.
- Introduce a new Protobuf message (
DataRecord.proto) used by the tests.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/validator/StructSchemaDataValidator.java | Uses a custom Avro NameValidator to accept $ during schema parsing/validation. |
| pulsar-broker/src/test/java/org/apache/pulsar/broker/service/schema/validator/SchemaDataValidatorTest.java | Adds tests for the new validator and a Protobuf-based reproduction. |
| pulsar-broker/src/main/proto/DataRecord.proto | Adds a Protobuf schema used to generate Avro with nested-type $ names for testing. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| Schema.Parser avroSchemaParser = new Schema.Parser(COMPATIBLE_NAME_VALIDATOR); | ||
| avroSchemaParser.setValidateDefaults(false); |
Copilot
AI
Jan 29, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change fixes schema validation by using CompatibleNameValidator, but other broker-side Avro parsing paths (e.g. schema compatibility checks) still construct new Schema.Parser() with the default validator. If existing Protobuf-derived Avro schemas contain $ in record names, compatibility checks may still fail even though validation now passes. Consider reusing the same validator (or NameValidator.NO_VALIDATION) everywhere the broker parses stored schema JSON for PROTOBUF/AVRO/JSON-compat paths to fully address the regression.
| final char first = name.charAt(0); | ||
| if (!(Character.isLetter(first) || first == '_' || first == '$')) { | ||
| return new Result("Illegal initial character: " + name); | ||
| } | ||
| for (int i = 1; i < length; i++) { | ||
| final char c = name.charAt(i); | ||
| // we need to allow $ for the special case | ||
| if (!(Character.isLetterOrDigit(c) || c == '_' || c == '$')) { | ||
| return new Result("Illegal character in: " + name); |
Copilot
AI
Jan 29, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CompatibleNameValidator currently uses Character.isLetter/isLetterOrDigit, which treats many Unicode characters (e.g. Chinese) as letters. That means names like "名字" will validate successfully, but the added tests expect them to be rejected and Avro’s name spec is ASCII-based. Please tighten the allowed character set to ASCII (e.g. [A-Za-z_$][A-Za-z0-9_$]*) so behavior matches the intended compatibility scope (allow $ but not arbitrary Unicode identifiers).
| syntax = "proto3"; | ||
|
|
||
| package pulsar.schema; | ||
| option java_package = "org.apache.pulsar.broker.service.schema.proto"; | ||
|
|
Copilot
AI
Jan 29, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New proto files in this module typically include the ASF license header (see e.g. pulsar-broker/src/main/proto/SchemaRegistryFormat.proto). Please add the standard license header comment block at the top of this file to match project conventions and avoid licensing/audit issues.
| import org.apache.pulsar.common.schema.SchemaType; | ||
| import org.apache.pulsar.common.util.ObjectMapperFactory; | ||
| import org.junit.jupiter.api.Assertions; | ||
| import org.testng.annotations.DataProvider; | ||
| import org.testng.annotations.Test; |
Copilot
AI
Jan 29, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test class uses TestNG annotations (org.testng.annotations.*) but the new assertions import org.junit.jupiter.api.Assertions. The broker module appears to standardize on TestNG (org.testng.Assert) and doesn’t include JUnit 5 dependencies, so this import may fail compilation. Please switch these assertions to TestNG (org.testng.Assert or static imports) and remove the JUnit Jupiter import.
| import org.apache.avro.protobuf.ProtobufData; | ||
| import org.apache.pulsar.broker.service.schema.exceptions.InvalidSchemaDataException; | ||
| import org.apache.pulsar.broker.service.schema.proto.DataRecordOuterClass; | ||
| import org.apache.pulsar.client.api.Schema; | ||
| import org.apache.pulsar.client.impl.schema.ProtobufSchema; | ||
| import org.apache.pulsar.common.protocol.schema.SchemaData; | ||
| import org.apache.pulsar.common.schema.SchemaInfo; |
Copilot
AI
Jan 29, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unused imports were added here (org.apache.avro.protobuf.ProtobufData, org.apache.pulsar.common.schema.SchemaInfo). If the build uses checkstyle/spotless, this will fail. Please remove unused imports.
| import org.apache.avro.protobuf.ProtobufData; | |
| import org.apache.pulsar.broker.service.schema.exceptions.InvalidSchemaDataException; | |
| import org.apache.pulsar.broker.service.schema.proto.DataRecordOuterClass; | |
| import org.apache.pulsar.client.api.Schema; | |
| import org.apache.pulsar.client.impl.schema.ProtobufSchema; | |
| import org.apache.pulsar.common.protocol.schema.SchemaData; | |
| import org.apache.pulsar.common.schema.SchemaInfo; | |
| import org.apache.pulsar.broker.service.schema.exceptions.InvalidSchemaDataException; | |
| import org.apache.pulsar.broker.service.schema.proto.DataRecordOuterClass; | |
| import org.apache.pulsar.client.api.Schema; | |
| import org.apache.pulsar.client.impl.schema.ProtobufSchema; | |
| import org.apache.pulsar.common.protocol.schema.SchemaData; |
|
|
||
| try { | ||
| Schema.Parser avroSchemaParser = new Schema.Parser(); | ||
| Schema.Parser avroSchemaParser = new Schema.Parser(COMPATIBLE_NAME_VALIDATOR); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Pulsar code base includes new Schema.Parser() in many locations. Wouldn't we have to use the custom name validator in all cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'll change them also.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some factory class to pulsar-common? There's no avro dependency currently, but I guess it could justified.
Motivation
After the PR #24617. We upgraded to a newer Avro version, which introduced breaking changes for our use case.
The discussion context is here: https://lists.apache.org/thread/3xx7rx8571bbxc6c5s2ldbnmr06c42x3
This PR is only working to avoid breaking existing users. For the root cause of the issue, we should fix it in the Avro protobuf.
Modifications
CompatibleNameValidatorinstead of the default UTF_VALIDATOR to allow '$' to pass our validation.Verifying this change
Documentation
doc-not-needed