Skip to content

Conversation

@zooo-code
Copy link

@zooo-code zooo-code commented Jan 17, 2026

Fixes #25128

Motivation

The logic for setting policies.bundles is different between getNamespacePolicies (sync) and getNamespacePoliciesAsync:

  • getNamespacePolicies (sync):

    • bundles source: NamespaceBundleFactory.getBundles().getBundlesData()
    • fallback: bundleData != null ? bundleData : policies.bundles
    • migrated fallback: false when LocalPolicies is absent
  • getNamespacePoliciesAsync:

    • bundles source: LocalPolicies.bundles
    • fallback: none (only sets if localPolicies.isPresent())
    • migrated fallback: none

This inconsistency can cause different results when calling sync vs async methods for the same namespace.

Modifications

  • Refactored getNamespacePolicies (sync) to call getNamespacePoliciesAsync and wait for the result, eliminating code duplication
  • getNamespacePoliciesAsync now uses NamespaceBundleFactory.getBundlesAsync() instead of LocalPolicies.bundles
  • Applied the same fallback logic: bundleData != null ? bundleData : policies.bundles
  • Ensured migrated field defaults to false when LocalPolicies is absent

Verifying this change

  • Make sure that the change passes the CI checks.

Added testGetNamespacePoliciesSyncAsyncBundlesConsistency test in AdminApiTest to verify that sync (getPolicies) and async (getPoliciesAsync) methods return consistent bundles data.

Existing tests only verify sync getPolicies().bundles behavior (e.g., AdminApiTest:816,822,3223, NamespacesTest:1266). There was no test that verifies async getPoliciesAsync returns the same bundles data as sync method. This new test explicitly validates the consistency between sync and async methods, which is the core fix of this PR.

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository: https://github.com/zooo-code/pulsar

}

@Test
public void testGetNamespacePoliciesSyncAsyncBundlesConsistency() throws Exception {
Copy link
Member

@lhotari lhotari Jan 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did this test fail before making the change to production code? If not, I think that it would be more useful to verify that the behavior is correct for the async method regarding the modified detail.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lhotari Thanks for the feedback!

You're right—the current test would likely pass even before the change because LocalPolicies.bundles and NamespaceBundleFactory.getBundles() often return the same value in normal scenarios.

To address this, I'll update the test to explicitly verify that the bundles are retrieved via NamespaceBundleFactory. Would mocking NamespaceBundleFactory and verifying the call to getBundlesAsync() be the preferred approach here?

Copy link
Author

@zooo-code zooo-code Jan 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lhotari

I've updated the test to testGetNamespacePoliciesAsyncReflectsBundleSplit.
This test explicitly verifies that getPoliciesAsync returns the latest bundles from NamespaceBundleFactory after a bundle split:

  1. Create namespace with 1 bundle
  2. Split the bundle → 2 bundles
  3. Verify getPoliciesAsync() returns 2 bundles

Before the fix: LocalPolicies.bundles would still show 1 bundle (stale data) → test fails
After the fix: NamespaceBundleFactory.getBundlesAsync() provides the latest 2 bundles → test passes

This approach verifies the actual behavior change rather than just comparing sync/async consistency, which would pass even before the fix.

@lhotari
Copy link
Member

lhotari commented Jan 19, 2026

Matching PR in forked repository

PR in forked repository: https://github.com/zooo-code/pulsar

@zooo-code the intention is that contributors would setup "Personal CI" so that they could run the Pulsar CI without dependency on apache/pulsar maintainers approving CI runs.

@zooo-code
Copy link
Author

Matching PR in forked repository

PR in forked repository: https://github.com/zooo-code/pulsar

@zooo-code the intention is that contributors would setup "Personal CI" so that they could run the Pulsar CI without dependency on apache/pulsar maintainers approving CI runs.

Thanks for the pointer! I'll enable GitHub Actions on my forked repository right away to run the CI independently.

@zooo-code zooo-code requested a review from lhotari January 19, 2026 14:21
@zooo-code zooo-code force-pushed the fix-admin-namespace-policies-async branch from 0592a10 to 7164373 Compare January 19, 2026 22:35
@zooo-code
Copy link
Author

@lhotari Could you please approve the CI run? I fixed the checkstyle violations.

@lhotari
Copy link
Member

lhotari commented Jan 20, 2026

@lhotari Could you please approve the CI run? I fixed the checkstyle violations.

@zooo-code As the first step, please enable Personal CI and complete a run in your fork by opening a PR for this same branch to your own forked repository. You will just need to enable GitHub Actions in your own repository before this. It's free. This will be helpful if you plan to make more contributions.

There might be more CI failures and it's better that you can run CI on your own. Just be aware of many flaky tests in Pulsar. You can rerun failed build steps to get pass them.

@zooo-code zooo-code force-pushed the fix-admin-namespace-policies-async branch from a46183a to 7164373 Compare January 20, 2026 09:34
@zooo-code
Copy link
Author

Hi @lhotari,

I've refactored the code so that sync calls async as you suggested. However, I'm encountering a test failure in CI.

Test failure:

Root cause:
When LocalPolicies has corrupted/empty data in the metadata store, deserialization fails. The .exceptionally() handler doesn't catch this because the exception occurs inside MetadataCacheImpl's CompletableFuture chain.

Possible solutions:

  1. Add exception handling to all getLocalPoliciesAsync() call sites (8 places)
  2. Modify LocalPoliciesResources.getLocalPoliciesAsync() to handle deserialization failures gracefully
  3. Different approach?

Could you advise on the preferred solution?

Thanks!

@zooo-code zooo-code force-pushed the fix-admin-namespace-policies-async branch 2 times, most recently from 3d6f805 to 4673a80 Compare January 24, 2026 08:54
@zooo-code
Copy link
Author

zooo-code commented Jan 24, 2026

Hi @lhotari

I've refactored the code as suggested so that the sync method now calls the async one.

However, I'm hitting a failure in AdminApiSchemaTest.createKeyValueSchema.

It's caused by LocalPolicies having empty content (0 bytes) in the metadata store, which triggers a deserialization error in the new async path:

ContentDeserializationException: Failed to deserialize payload for key '/admin/local-policies/schematest/test'
Caused by: MismatchedInputException: No content to map due to end-of-input

The refactoring exposed a pre-existing edge case. The original code likely handled data access differently, while the new async approach triggers full object deserialization, which fails on empty content.

I tried catching ContentDeserializationException in
LocalPoliciesResources.getLocalPoliciesAsync() to return Optional.empty(), but the error still propagates.

Should I handle this at the MetadataCacheImpl level (treating empty content as non-existent globally), or is there a better approach?

zooo-code added a commit to zooo-code/pulsar that referenced this pull request Jan 25, 2026
@zooo-code zooo-code force-pushed the fix-admin-namespace-policies-async branch from 4673a80 to 9977b76 Compare January 25, 2026 02:29
@zooo-code zooo-code force-pushed the fix-admin-namespace-policies-async branch from 9977b76 to 123c9c7 Compare January 25, 2026 03:00
@zooo-code
Copy link
Author

Hi @lhotari,

I've resolved the ContentDeserializationException issue triggered by empty LocalPolicies content.

Summary of Changes

  1. MetadataCacheImpl: Added a check to treat empty content as Optional.empty() to prevent deserialization errors.
  2. AdminResource & NamespaceBundleFactory: Added exception handling to gracefully fallback (using empty/global policies) if LocalPolicies are corrupted or fail to load.

Verification

Ready for review!
Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

doc-not-needed Your PR changes do not impact docs ready-to-test

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] Logic for setting policies.bundle is different in AdminResource#getNamespacePolicies compared to AdminResource#getNamespacePoliciesAsync

2 participants