Skip to content

Conversation

@mattisonchao
Copy link
Member

Motivation

After the PR #24617. We upgraded to a newer Avro version, which introduced breaking changes for our use case.

The discussion context is here: https://lists.apache.org/thread/3xx7rx8571bbxc6c5s2ldbnmr06c42x3

This PR is only working to avoid breaking existing users. For the root cause of the issue, we should fix it in the Avro protobuf.

Modifications

  • Support a CompatibleNameValidator instead of the default UTF_VALIDATOR to allow '$' to pass our validation.
  • Add a test to reproduce the issue.
  • Add a test for CompatibleNameValidator

Verifying this change

  • Make sure that the change passes the CI checks.

Documentation

  • doc-not-needed

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses a regression introduced by the Avro upgrade (1.12.0) where Protobuf-derived Avro schemas can fail parsing due to $ appearing in generated record names, by introducing a more permissive Avro NameValidator and adding tests to cover the scenario.

Changes:

  • Add a custom Avro NameValidator (CompatibleNameValidator) to allow $ in Avro record/field names during schema validation.
  • Add unit tests for the validator’s behavior and a reproduction test using a generated Protobuf schema.
  • Introduce a new Protobuf message (DataRecord.proto) used by the tests.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File Description
pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/validator/StructSchemaDataValidator.java Uses a custom Avro NameValidator to accept $ during schema parsing/validation.
pulsar-broker/src/test/java/org/apache/pulsar/broker/service/schema/validator/SchemaDataValidatorTest.java Adds tests for the new validator and a Protobuf-based reproduction.
pulsar-broker/src/main/proto/DataRecord.proto Adds a Protobuf schema used to generate Avro with nested-type $ names for testing.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +55 to 56
Schema.Parser avroSchemaParser = new Schema.Parser(COMPATIBLE_NAME_VALIDATOR);
avroSchemaParser.setValidateDefaults(false);
Copy link

Copilot AI Jan 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change fixes schema validation by using CompatibleNameValidator, but other broker-side Avro parsing paths (e.g. schema compatibility checks) still construct new Schema.Parser() with the default validator. If existing Protobuf-derived Avro schemas contain $ in record names, compatibility checks may still fail even though validation now passes. Consider reusing the same validator (or NameValidator.NO_VALIDATION) everywhere the broker parses stored schema JSON for PROTOBUF/AVRO/JSON-compat paths to fully address the regression.

Copilot uses AI. Check for mistakes.
Comment on lines +115 to +123
final char first = name.charAt(0);
if (!(Character.isLetter(first) || first == '_' || first == '$')) {
return new Result("Illegal initial character: " + name);
}
for (int i = 1; i < length; i++) {
final char c = name.charAt(i);
// we need to allow $ for the special case
if (!(Character.isLetterOrDigit(c) || c == '_' || c == '$')) {
return new Result("Illegal character in: " + name);
Copy link

Copilot AI Jan 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CompatibleNameValidator currently uses Character.isLetter/isLetterOrDigit, which treats many Unicode characters (e.g. Chinese) as letters. That means names like "名字" will validate successfully, but the added tests expect them to be rejected and Avro’s name spec is ASCII-based. Please tighten the allowed character set to ASCII (e.g. [A-Za-z_$][A-Za-z0-9_$]*) so behavior matches the intended compatibility scope (allow $ but not arbitrary Unicode identifiers).

Copilot uses AI. Check for mistakes.
Comment on lines +1 to +5
syntax = "proto3";

package pulsar.schema;
option java_package = "org.apache.pulsar.broker.service.schema.proto";

Copy link

Copilot AI Jan 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New proto files in this module typically include the ASF license header (see e.g. pulsar-broker/src/main/proto/SchemaRegistryFormat.proto). Please add the standard license header comment block at the top of this file to match project conventions and avoid licensing/audit issues.

Copilot uses AI. Check for mistakes.
Comment on lines 33 to 37
import org.apache.pulsar.common.schema.SchemaType;
import org.apache.pulsar.common.util.ObjectMapperFactory;
import org.junit.jupiter.api.Assertions;
import org.testng.annotations.DataProvider;
import org.testng.annotations.Test;
Copy link

Copilot AI Jan 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test class uses TestNG annotations (org.testng.annotations.*) but the new assertions import org.junit.jupiter.api.Assertions. The broker module appears to standardize on TestNG (org.testng.Assert) and doesn’t include JUnit 5 dependencies, so this import may fail compilation. Please switch these assertions to TestNG (org.testng.Assert or static imports) and remove the JUnit Jupiter import.

Copilot uses AI. Check for mistakes.
Comment on lines +26 to +32
import org.apache.avro.protobuf.ProtobufData;
import org.apache.pulsar.broker.service.schema.exceptions.InvalidSchemaDataException;
import org.apache.pulsar.broker.service.schema.proto.DataRecordOuterClass;
import org.apache.pulsar.client.api.Schema;
import org.apache.pulsar.client.impl.schema.ProtobufSchema;
import org.apache.pulsar.common.protocol.schema.SchemaData;
import org.apache.pulsar.common.schema.SchemaInfo;
Copy link

Copilot AI Jan 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused imports were added here (org.apache.avro.protobuf.ProtobufData, org.apache.pulsar.common.schema.SchemaInfo). If the build uses checkstyle/spotless, this will fail. Please remove unused imports.

Suggested change
import org.apache.avro.protobuf.ProtobufData;
import org.apache.pulsar.broker.service.schema.exceptions.InvalidSchemaDataException;
import org.apache.pulsar.broker.service.schema.proto.DataRecordOuterClass;
import org.apache.pulsar.client.api.Schema;
import org.apache.pulsar.client.impl.schema.ProtobufSchema;
import org.apache.pulsar.common.protocol.schema.SchemaData;
import org.apache.pulsar.common.schema.SchemaInfo;
import org.apache.pulsar.broker.service.schema.exceptions.InvalidSchemaDataException;
import org.apache.pulsar.broker.service.schema.proto.DataRecordOuterClass;
import org.apache.pulsar.client.api.Schema;
import org.apache.pulsar.client.impl.schema.ProtobufSchema;
import org.apache.pulsar.common.protocol.schema.SchemaData;

Copilot uses AI. Check for mistakes.

try {
Schema.Parser avroSchemaParser = new Schema.Parser();
Schema.Parser avroSchemaParser = new Schema.Parser(COMPATIBLE_NAME_VALIDATOR);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Pulsar code base includes new Schema.Parser() in many locations. Wouldn't we have to use the custom name validator in all cases?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'll change them also.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some factory class to pulsar-common? There's no avro dependency currently, but I guess it could justified.

@mattisonchao mattisonchao marked this pull request as draft January 30, 2026 02:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants