From 5dfde69fa995e3b1e2a0b235dc432a3b77caa723 Mon Sep 17 00:00:00 2001 From: Lari Hotari Date: Tue, 26 May 2026 11:41:59 +0300 Subject: [PATCH 01/39] [improve][misc] Add AGENTS.md and update copilot-instructions.md --- .github/copilot-instructions.md | 21 ++++- AGENTS.md | 148 ++++++++++++++++++++++++++++++++ CLAUDE.md | 1 + README.md | 4 + 4 files changed, 172 insertions(+), 2 deletions(-) create mode 100644 AGENTS.md create mode 100644 CLAUDE.md diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md index 54d832eefd0f7..79e538f21d836 100644 --- a/.github/copilot-instructions.md +++ b/.github/copilot-instructions.md @@ -79,11 +79,17 @@ When introducing a new dependency: # Logging Guidelines -* Use **SLF4J** for logging. +* Prefer the **[slog](https://github.com/merlimat/slog)** library (`io.github.merlimat.slog`) for logging. +* Instantiate the logger with Lombok's **`@CustomLog`** annotation, which is configured in `lombok.config` to create an `io.github.merlimat.slog.Logger` (`io.github.merlimat.slog.Logger.get(TYPE)`). +* **SLF4J** is still available but is **considered deprecated** for Pulsar logging. Do not introduce new SLF4J loggers; flag new SLF4J usage in review and suggest `@CustomLog` (slog) instead. * Do **not use** `System.out` or `System.err`. * Assume production commonly runs at **INFO** log level. +* **Default new log statements to `TRACE` or `DEBUG`, not `INFO`.** Existing Pulsar code overuses `INFO`, which produces excessively noisy production logs. Use `TRACE` for fine-grained/low-level detail, `DEBUG` for diagnostics that could reasonably be enabled in production without flooding the logs, and reserve `INFO` for low-frequency, significant lifecycle/state-change events. Flag new `INFO`-level logging that would be better at `DEBUG`/`TRACE`. * Avoid excessive logging in hot paths. -* Guard expensive `DEBUG` and `TRACE` logging with `log.isDebugEnabled()` or `log.isTraceEnabled()`. +* With slog, attach data as **structured attributes** (`log.info().attr("topic", topic).log("Published")`) instead of interpolating it into the message string. +* For expensive `DEBUG`/`TRACE` values, do not guard with `log.isDebugEnabled()`/`log.isTraceEnabled()`. Instead use slog's lazy evaluation, which only runs when the level is enabled: + * pass a supplier lambda as the attribute value — `log.debug().attr("dump", () -> expensiveDump()).log("...")`, or + * use the deferred event form — `log.debug(e -> e.attr("dump", expensiveDump()).log("..."))`. * Avoid logging stack traces at `INFO` and lower levels. Log messages should be: @@ -116,6 +122,17 @@ Avoid leaks for: For internal networking/messaging paths, prefer **Netty `ByteBuf`** over `ByteBuffer` unless an external API requires `ByteBuffer`. +## Resource Cleanup in Tests + +Tests must also close/release everything they allocate — shut down executors/clients/services and release +Netty `ByteBuf`s (and other reference-counted buffers). The test harness runs leak detectors +(`ThreadLeakDetectorListener`, Netty pooled-allocator/`ByteBuf` leak detection enabled via +`-Dpulsar.allocator.pooled=true`) that fail the build on thread or buffer leaks. + +When a leak is reported by a test, treat it as a **likely bug in the production code** (a missing close or +`release()` in the code under test), not merely test noise. Fix the root cause rather than working around it +in the test or disabling the detector. + --- # Configuration Guidelines diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 0000000000000..26d2da9e11642 --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,148 @@ +# AGENTS.md + +This file provides guidance to AI coding agents working with code in this repository. + +Apache Pulsar is a distributed pub-sub messaging and streaming platform. The codebase is performance-critical, heavily asynchronous, and concurrency-sensitive (brokers, storage, networking). Prioritize correctness, thread safety, performance, maintainability, and backward compatibility. + +## Build system + +The build is **Gradle** (migrated from Maven via PIP-463; much existing tooling and docs elsewhere still reference Maven). Use the wrapper `./gradlew` — no separate Gradle install needed. **JDK 21 or 25 is required to run the build** (`-PskipJavaVersionCheck` bypasses the check). Bytecode targets Java 17 (`options.release = 17`). + +Key build infrastructure: +- `settings.gradle.kts` — all modules, organized in dependency tiers (Tier 0 has no internal deps, higher tiers build on lower ones). +- `build-logic/conventions/` — convention plugins (`pulsar.java-conventions`, `pulsar.code-quality-conventions`, `pulsar.shadow-conventions`, etc.) applied by modules. This is where shared compile/test/dependency config lives — edit conventions here rather than duplicating config across modules. +- `gradle/libs.versions.toml` — version catalog (single source of truth for dependency versions; `libs.*` references in build scripts). +- `pulsar-dependencies` — enforced platform (BOM) pinning all dependency versions; applied to every module. + +When changing the build, **follow the [Gradle best practices](https://docs.gradle.org/current/userguide/best_practices_index.html)**. The build enables both the **configuration cache** (`org.gradle.configuration-cache=true`) and **configure-on-demand** (`org.gradle.configureondemand=true`), so any task used by a commonly-run task (`assemble`, `test`, `integrationTest`, `rat`/`spotlessCheck`/`checkstyle*`, `checkBinaryLicense`, `docker*`, etc.) **must be configuration-cache and configure-on-demand compatible** — no reading of mutable state at execution time, no `Project` access in task actions, use `Provider`/value sources instead. Verify with `--configuration-cache`. Tooling/one-off tasks that are not part of these common flows (e.g. `verifyTestGroups`, dependency-report and ad-hoc maintenance tasks) may be exempt. + +### Module name vs. directory name gotcha + +Several Gradle project paths do **not** match their directory because the Maven artifactId is preserved. Most importantly: +- Directory `pulsar-client/` → project **`:pulsar-client-original`** +- Directory `pulsar-client-admin/` → project **`:pulsar-client-admin-original`** +- Directory `pulsar-functions/localrun/` → project `:pulsar-functions:pulsar-functions-local-runner-original` + +Always use the Gradle project path (left of any `--tests`), e.g. `./gradlew :pulsar-client-original:test`. Check `settings.gradle.kts` when a path is ambiguous. + +## Common commands + +```bash +# Compile and assemble everything (or a single module) +./gradlew assemble +./gradlew :pulsar-broker:assemble + +# Lint / verify (license headers, formatting, checkstyle) — run before pushing +./gradlew rat spotlessCheck checkstyleMain checkstyleTest +./gradlew spotlessApply # auto-fix license headers/formatting + +# Verify bundled-dependency LICENSE/NOTICE coverage (run after changing a runtime dependency) +./gradlew checkBinaryLicense + +# Always scope test runs with --tests — running a whole module's test task is slow. +# Run a single test class +./gradlew :pulsar-client-original:test --tests "ConsumerBuilderImplTest" +# Run a single test method +./gradlew :pulsar-client-original:test --tests "ConsumerBuilderImplTest." +# Run all tests in a specific package +./gradlew :pulsar-broker:test --tests "org.apache.pulsar.broker.admin.*" +``` + +### Test groups (TestNG) + +Tests use **TestNG** and are tagged with `@Test(groups = "...")`. By default the build **excludes the `quarantine` and `flaky` groups** (`excludedTestGroups` default = `quarantine,flaky`), so to run a single test that lives in one of those groups you must clear the exclusion: + +```bash +# Run a specific test that is in the flaky/quarantine group (otherwise excluded by default) +./gradlew :pulsar-broker:test -PexcludedTestGroups='' --tests "" +``` + +CI selects whole groups with `-PtestGroups=` and `-PexcludedTestGroups=` (e.g. `broker,broker-admin`); locally prefer `--tests` to scope to specific classes instead of running an entire group. CI splits `pulsar-broker` tests into groups (see `pulsar-build/run_unit_group_gradle.sh` and `gradle/verify-test-groups.gradle.kts`). Tests with no group are treated as `other` at runtime. `./gradlew verifyTestGroups` reports group assignments and flags tests not covered by any CI group. + +Other test-related properties: `-PtestJavaVersion=17` (run tests on a different JDK toolchain), `-PtestRetryCount=N`, `-PtestFailFast=true|false`, `-PprotobufVersion=4.31.1` (protobuf v4 compatibility tests). + +Failed tests are retried once by default (`testRetryCount=1`; `0` when running inside the IDE). When running tests locally, prefer **`-PtestRetryCount=0`** to catch failures (including flakiness) early instead of having retries mask them. + +### Integration tests + +Integration tests live in `tests/` (see `tests/README.md`). They use [Testcontainers](https://www.testcontainers.org/) to bring up Pulsar services in Docker, so **Docker must be installed and running**. Build the test image first, then run the tests. + +The full integration suite is heavy and slow. **In local development, always run individual integration tests** rather than the whole suite — pass `--tests` to select a class (TestNG then discovers it directly from the classpath): + +```bash +./gradlew :tests:latest-version-image:dockerBuild # build the docker test image +./gradlew :tests:integration:integrationTest --tests "org.apache.pulsar.tests.integration." +``` + +To run the **entire** integration test set, use **Personal CI** (see below) rather than running it locally. (`integrationTest` also accepts `-PtestGroups`/`-PexcludedTestGroups` and `-PintegrationTestSuiteFile=.xml` to pick a specific TestNG suite.) + +### Running the full CI pipeline (Personal CI) + +The full test suite is large and slow to run locally. While iterating on a change, run only the narrowly-scoped tests relevant to the change (a single test class or package, see above) rather than a module's entire test task. To validate a larger change against the **full** CI pipeline, do not run everything locally — use **Personal CI**, which runs Pulsar's CI workflows in the contributor's own GitHub fork. + +If Personal CI is not yet set up, **ask the user to follow** the [Personal CI documentation](https://pulsar.apache.org/contribute/personal-ci/) to enable it on their fork. Once it is set up, the agent can drive the loop: + +1. Keep the local `master` up-to-date with `apache/pulsar` and rebase the feature branch on it. +2. Push the feature branch to the **fork** to trigger CI runs there. CI runs against the PR opened in the user's own fork (it is normal to have a PR open in the fork *and* a PR for the same branch open in `apache/pulsar` at the same time). +3. Monitor CI status on the fork and fix failures. +4. Open the PR to `apache/pulsar` only after the fork's CI is green. + +Once the PR to `apache/pulsar` has been opened, stop rebasing as part of this loop: step 1's rebase no longer applies — bring in upstream changes by merging `apache/pulsar`'s `master` instead (see the Pull requests section below). + +## Architecture (big picture) + +Pulsar separates a stateless serving layer (brokers) from durable storage (Apache BookKeeper) and a metadata store (ZooKeeper/etcd/others). The modules layer accordingly: + +- **`pulsar-client-api`, `pulsar-client-admin-api`** — public, backward-compatible interfaces only. `pulsar-client-api-v5`/`pulsar-client-v5` are the newer V5 client API (PIP-466/468). +- **`pulsar-client` (`:pulsar-client-original`)** — the Java client implementation (producer/consumer/reader, connection pooling). `pulsar-client-admin` implements the admin REST client. +- **`pulsar-common`** — wire protocol and shared types. Protobuf / lightproto messages are **generated** into `generated-lightproto/` / `generated-sources/` (excluded from checkstyle and spotless). +- **`pulsar-metadata`** — pluggable metadata store abstraction (ZooKeeper, etcd, RocksDB, memory) used by broker and bookkeeper. +- **`managed-ledger`** — the storage abstraction over **Apache BookKeeper**: append-only ledgers + cursors that track consumer/subscription positions. This is the durability layer the broker reads/writes through. +- **`pulsar-broker`** — the server. `PulsarService` is the composition root wiring everything together; `BrokerService` manages topics, subscriptions, and client connections. Entry points: `PulsarBrokerStarter` (broker), `PulsarStandalone`/`PulsarStandaloneStarter` (all-in-one), `PulsarClusterMetadataSetup` (cluster init). +- **`pulsar-proxy`** — optional proxy/gateway in front of brokers. +- **`pulsar-functions/*`** — serverless compute (Functions): `proto`, `api-java`, `instance`, `runtime`, `worker`, `localrun`. +- **`pulsar-io/*`** — connector framework core only; most built-in connectors were moved to the separate `pulsar-connectors` repo (PIP-465). +- **`pulsar-transaction/*`** — transaction coordinator and common types. +- **`tiered-storage/*`, `offloaders/`** — offload ledger data to cloud/filesystem storage. +- **`pulsar-websocket`** — WebSocket-to-Pulsar bridge. **`pulsar-client-tools`** — the `pulsar-admin`/`pulsar-client` CLIs. +- **Shaded/distribution** — `pulsar-client-shaded`, `pulsar-client-all`, `pulsar-client-admin-shaded` produce relocated fat jars; `distribution/*` assembles server/shell/offloader tarballs. + +The **`pip/`** directory holds **Pulsar Improvement Proposals** (`pip-.md`) — the design documents for significant changes, referenced as `PIP-` throughout commit messages and code (e.g. PIP-463 = Maven→Gradle migration, PIP-466/468 = V5 client). `pip/README.md` describes the process and `pip/TEMPLATE.md` is the proposal template. Consult the relevant PIP for the rationale behind a non-trivial feature or architectural decision. + +## Coding conventions + +Full conventions for code review are in `.github/copilot-instructions.md`. Highlights to apply when writing code: + +- **Style**: 4-space indent, never tabs; always use curly braces (even single-line `if`); no `@author` Javadoc tags. Every `TODO` must reference a GitHub issue (`// TODO: https://github.com/apache/pulsar/issues/XXXX`). Checkstyle config: `buildtools/src/main/resources/pulsar/checkstyle.xml`. Lombok is enabled. +- **Async**: heavy use of `CompletableFuture`. Methods returning `CompletableFuture` must **not** throw synchronously — return `CompletableFuture.failedFuture(e)` (including for argument validation). Never block (`Thread.sleep`, `Future.get()`, `.join()`, blocking IO) on event-loop / async-execution threads. Avoid nested futures; flatten with `thenCompose`. Prefer `OrderedExecutor` for ordered async work. +- **Concurrency**: public classes should be thread-safe; annotate non-thread-safe ones with `@NotThreadSafe`. Give threads meaningful names. +- **Logging**: prefer the [slog](https://github.com/merlimat/slog) library (`io.github.merlimat.slog`); instantiate the logger with Lombok's **`@CustomLog`** annotation (wired in `lombok.config` to `io.github.merlimat.slog.Logger.get(TYPE)`). SLF4J is still available but is **considered deprecated** for Pulsar logging — don't add new SLF4J loggers. Never use `System.out`/`System.err`. Assume INFO in production. **Default new logs to TRACE or DEBUG, not INFO** — existing Pulsar code overuses INFO, which makes production logs excessively noisy. Use **TRACE** for fine-grained/low-level detail, **DEBUG** for diagnostics that could reasonably be enabled in production without flooding the logs, and reserve **INFO** for significant lifecycle/state-change events that are low-frequency. With slog, attach data as **structured attributes** (`log.info().attr("topic", topic).log("Published")`) rather than interpolating it into the message string. For expensive values, don't guard with `isDebugEnabled()`/`isTraceEnabled()` — instead pass a supplier lambda for the attribute value (`.attr("dump", () -> expensiveDump())`) or use the deferred form (`log.debug(e -> e.attr(...).log("msg"))`), both of which run only when the level is enabled. Avoid logging in hot paths and stack traces at INFO or lower. +- **Networking/memory**: prefer Netty `ByteBuf` over `ByteBuffer` on internal paths; always close streams/connections/executors/buffers. +- **Dependencies**: prefer existing libraries (Guava/Commons, FastUtil, JCTools, RoaringBitmap, Caffeine, Jackson, Netty, OpenTelemetry). New dependencies must be justified and have LICENSE/NOTICE updated (verify with `checkBinaryLicense`). +- **Backward compatibility**: do not break public APIs, client compatibility, wire protocol, or serialized/metadata formats. Servers must work with both older and newer clients. + +## Testing conventions + +- TestNG + Mockito. Prefer **AssertJ** assertions with descriptions over TestNG asserts; use **Awaitility** (with AssertJ) for async conditions instead of `sleep`-based timing. Add timeouts to prevent hangs. +- **No reflection to access private state in tests** (`WhiteboxImpl.getInternalState/setInternalState`, `setAccessible(true)`, etc.). Instead expose a **package-private** `@VisibleForTesting` accessor and place the test in the same package. New reflection-based test access should be flagged in review. +- Every feature/bugfix should include tests covering edge cases and failure scenarios; they must be deterministic. +- **Close/release resources in tests** — shut down executors/clients/services and release Netty `ByteBuf`s (and other ref-counted buffers) that the test allocates. The test harness has leak detectors (`ThreadLeakDetectorListener`, Netty pooled-allocator/`ByteBuf` leak detection via `-Dpulsar.allocator.pooled=true`) that fail tests on leaks. When a leak surfaces, **treat it as a likely production bug** and fix the root cause (the missing close/release in the code under test) rather than working around it in the test or suppressing the detector. + +## Pull requests + +PRs must follow `.github/PULL_REQUEST_TEMPLATE.md`. PR titles follow the `[][] ` convention (e.g. `[fix][broker] ...`, `[improve][build] ...`) — refer to `.github/workflows/ci-semantic-pull-request.yml` for the valid `[type]` and `[scope]` prefixes, which are enforced by CI. The `` should be in imperative form, like a good commit message's subject line. + +**Never rebase a PR branch once the PR is opened in `apache/pulsar`.** Rebasing rewrites history and disrupts reviewers (it invalidates review comments and makes incremental diffs unreadable). To bring in upstream changes, instead fetch from the `apache/pulsar` remote and **merge** its `master` into the PR branch: + +```bash +git fetch # e.g. `upstream` or `apache` +git merge /master +``` + +(Rebasing onto an updated `master` is fine *before* the PR is opened — see the Personal CI loop above — but not after.) + +## Reporting security vulnerabilities + +Please refer to https://pulsar.apache.org/security/. See https://www.apache.org/security/ for the general ASF policy. + +For already-public CVEs where you want to check Pulsar's exposure, the right venue is a GitHub issue on apache/pulsar or a question on the dev@pulsar.apache.org mailing list. Before asking about an already-public CVE and whether it's already fixed in Pulsar, search PRs and issues with the CVE id at https://github.com/apache/pulsar/pulls (also check issues and closed PRs/issues). Pulsar project's supported versions are available at https://pulsar.apache.org/contribute/release-policy/. Users should upgrade to supported versions to receive security updates. diff --git a/CLAUDE.md b/CLAUDE.md new file mode 100644 index 0000000000000..1d80b15b084a3 --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1 @@ +See @AGENTS.md for guidance on working with this repository. diff --git a/README.md b/README.md index b729bf8135a46..11ea7d870f3fb 100644 --- a/README.md +++ b/README.md @@ -309,6 +309,10 @@ To report a vulnerability for Pulsar, contact the [Apache Security Team](https:/ https://github.com/apache/pulsar/security/policy contains more details. +### Checking exposure to an already-public CVE + +For already-public CVEs where you want to check Pulsar's exposure, the right venue is a GitHub issue on apache/pulsar or a question on the dev@pulsar.apache.org mailing list. Before asking about an already-public CVE and whether it's already fixed in Pulsar, search PRs and issues with the CVE id at https://github.com/apache/pulsar/pulls (also check issues and closed PRs/issues). Pulsar project's supported versions are available at https://pulsar.apache.org/contribute/release-policy/. Users should upgrade to supported versions to receive security updates. + ## License Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0 From d6e072656f16d2926cfd6eb179733577b7016348 Mon Sep 17 00:00:00 2001 From: Lari Hotari Date: Tue, 26 May 2026 11:57:39 +0300 Subject: [PATCH 02/39] [improve][misc] Clarify thread leak detector false positives in test resource cleanup guidance --- .github/copilot-instructions.md | 19 ++++++++++++------- AGENTS.md | 4 +++- 2 files changed, 15 insertions(+), 8 deletions(-) diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md index 79e538f21d836..cd575a15cae38 100644 --- a/.github/copilot-instructions.md +++ b/.github/copilot-instructions.md @@ -125,13 +125,18 @@ For internal networking/messaging paths, prefer **Netty `ByteBuf`** over `ByteBu ## Resource Cleanup in Tests Tests must also close/release everything they allocate — shut down executors/clients/services and release -Netty `ByteBuf`s (and other reference-counted buffers). The test harness runs leak detectors -(`ThreadLeakDetectorListener`, Netty pooled-allocator/`ByteBuf` leak detection enabled via -`-Dpulsar.allocator.pooled=true`) that fail the build on thread or buffer leaks. - -When a leak is reported by a test, treat it as a **likely bug in the production code** (a missing close or -`release()` in the code under test), not merely test noise. Fix the root cause rather than working around it -in the test or disabling the detector. +Netty `ByteBuf`s (and other reference-counted buffers). + +* A reported **`ByteBuf`/buffer leak** (Netty pooled-allocator leak detection, enabled via + `-Dpulsar.allocator.pooled=true`) is a **real bug**. Fix the root cause — the missing `release()` in the + code under test — rather than working around it in the test or suppressing the detector. +* **Thread leaks** reported by `ThreadLeakDetectorListener` are currently **not a reliable signal**. The + detector produces a high rate of false positives — notably with `SharedPulsarBaseTest`, and whenever the + `THREAD_LEAK_DETECTOR_WAIT_MILLIS` environment value is not set high enough (≈`10000` is recommended) to + let asynchronous shutdown complete before the check runs. That wait setting only takes effect when the + **Gradle daemon is disabled** (`--no-daemon`). Because of these false positives, do not rely on + `ThreadLeakDetectorListener` alone to conclude that a change is thread-leak-free; corroborate before + treating a reported thread leak as a real bug. --- diff --git a/AGENTS.md b/AGENTS.md index 26d2da9e11642..8bb48f546cef0 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -126,7 +126,9 @@ Full conventions for code review are in `.github/copilot-instructions.md`. Highl - TestNG + Mockito. Prefer **AssertJ** assertions with descriptions over TestNG asserts; use **Awaitility** (with AssertJ) for async conditions instead of `sleep`-based timing. Add timeouts to prevent hangs. - **No reflection to access private state in tests** (`WhiteboxImpl.getInternalState/setInternalState`, `setAccessible(true)`, etc.). Instead expose a **package-private** `@VisibleForTesting` accessor and place the test in the same package. New reflection-based test access should be flagged in review. - Every feature/bugfix should include tests covering edge cases and failure scenarios; they must be deterministic. -- **Close/release resources in tests** — shut down executors/clients/services and release Netty `ByteBuf`s (and other ref-counted buffers) that the test allocates. The test harness has leak detectors (`ThreadLeakDetectorListener`, Netty pooled-allocator/`ByteBuf` leak detection via `-Dpulsar.allocator.pooled=true`) that fail tests on leaks. When a leak surfaces, **treat it as a likely production bug** and fix the root cause (the missing close/release in the code under test) rather than working around it in the test or suppressing the detector. +- **Close/release resources in tests** — shut down executors/clients/services and release Netty `ByteBuf`s (and other ref-counted buffers) that the test allocates. + - A reported **`ByteBuf`/buffer leak** (Netty pooled-allocator leak detection, enabled via `-Dpulsar.allocator.pooled=true`) is a **real bug**: fix the root cause — the missing `release()` in the code under test — rather than working around it in the test or suppressing the detector. + - **Thread leaks** reported by `ThreadLeakDetectorListener` are currently **not a reliable signal**: it produces a high rate of false positives — notably with `SharedPulsarBaseTest`, and whenever `THREAD_LEAK_DETECTOR_WAIT_MILLIS` is not set high enough (≈`10000` is recommended) for asynchronous shutdown to complete before the check runs. That wait setting only takes effect with the **Gradle daemon disabled** (`--no-daemon`). Because of the false positives, do not rely on `ThreadLeakDetectorListener` alone to conclude a change is thread-leak-free — corroborate before treating a reported thread leak as a real bug. ## Pull requests From 84d12bdd9fd513ce406403b396e99f370b6bdca6 Mon Sep 17 00:00:00 2001 From: Lari Hotari Date: Tue, 26 May 2026 12:39:59 +0300 Subject: [PATCH 03/39] [improve][misc] Require PR description to cover motivation and modifications --- .github/copilot-instructions.md | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md index cd575a15cae38..4b0296e24391f 100644 --- a/.github/copilot-instructions.md +++ b/.github/copilot-instructions.md @@ -332,6 +332,19 @@ Integration tests may be required for distributed components. --- +# Pull Request Description + +Every pull request must describe the change so reviewers have the context they need. At a minimum, the +description should cover: + +* **Motivation (why?)** — the context and the problem being solved. +* **Modifications (what / how?)** — what was changed and how it addresses the problem. + +These map to the **Motivation** and **Modifications** sections of `.github/PULL_REQUEST_TEMPLATE.md`. A title +alone, or a description that only restates the title, is not sufficient. + +--- + # Pull Request Review Guidance When reviewing a pull request, Copilot should: @@ -346,5 +359,7 @@ When reviewing a pull request, Copilot should: * suggest missing tests when appropriate * flag reflection-based access to private fields or methods in tests (e.g. `WhiteboxImpl`, `setAccessible(true)`) and recommend a package-private `@VisibleForTesting` accessor instead +* check that the PR description adequately explains the change — add a comment asking for a proper + description (at least the motivation and the modifications) unless this is already covered Focus feedback on correctness, reliability, and maintainability. \ No newline at end of file From 8056c25b0a79668948102273f596245047c7ae22 Mon Sep 17 00:00:00 2001 From: Lari Hotari Date: Tue, 26 May 2026 13:29:17 +0300 Subject: [PATCH 04/39] [improve][misc] Restructure agent docs: split CONTRIBUTING/ARCHITECTURE/CODING/SECURITY, add .agents/skills Follow apache/groovy's layout per PR review feedback: AGENTS.md becomes a concise router/index; the detail moves to human-facing CONTRIBUTING.md (dev/build/test/PR/CI), ARCHITECTURE.md (modules + build), CODING.md (conventions + review checklist), and SECURITY.md (reporting, disclosure hygiene, public-CVE checks). Task-specific guardrails live under .agents/skills/ (pulsar-build, pulsar-tests, pulsar-pr-workflow, pulsar-security) and are loaded on demand to keep agent context small. CLAUDE.md and .github/copilot-instructions.md are now symlinks to AGENTS.md. --- .agents/skills/README.md | 19 ++ .agents/skills/pulsar-build/SKILL.md | 65 ++++ .agents/skills/pulsar-pr-workflow/SKILL.md | 65 ++++ .agents/skills/pulsar-security/SKILL.md | 51 +++ .agents/skills/pulsar-tests/SKILL.md | 59 ++++ .github/copilot-instructions.md | 366 +-------------------- AGENTS.md | 208 ++++-------- ARCHITECTURE.md | 91 +++++ CLAUDE.md | 2 +- CODING.md | 149 +++++++++ CONTRIBUTING.md | 159 ++++++++- SECURITY.md | 43 ++- 12 files changed, 758 insertions(+), 519 deletions(-) create mode 100644 .agents/skills/README.md create mode 100644 .agents/skills/pulsar-build/SKILL.md create mode 100644 .agents/skills/pulsar-pr-workflow/SKILL.md create mode 100644 .agents/skills/pulsar-security/SKILL.md create mode 100644 .agents/skills/pulsar-tests/SKILL.md mode change 100644 => 120000 .github/copilot-instructions.md create mode 100644 ARCHITECTURE.md mode change 100644 => 120000 CLAUDE.md create mode 100644 CODING.md diff --git a/.agents/skills/README.md b/.agents/skills/README.md new file mode 100644 index 0000000000000..2b1daca25e968 --- /dev/null +++ b/.agents/skills/README.md @@ -0,0 +1,19 @@ +# Agent skills + +Task-specific guidance for AI coding assistants working in apache/pulsar. Each skill is a directory +with a `SKILL.md` that adds AI-tooling guardrails on top of the canonical docs +([`CONTRIBUTING.md`](../../CONTRIBUTING.md), [`ARCHITECTURE.md`](../../ARCHITECTURE.md), +[`CODING.md`](../../CODING.md)). Load the relevant skill **on demand** — before working in its area — +instead of pulling everything into context. The index of skills is in the repository-root +[`AGENTS.md`](../../AGENTS.md#skills). + +| Skill | Use for | +|-------|---------| +| [`pulsar-build`](pulsar-build/SKILL.md) | Gradle build changes — convention plugins, `settings.gradle.kts`, version catalog, dependencies. | +| [`pulsar-tests`](pulsar-tests/SKILL.md) | Writing and running tests — `--tests` scoping, TestNG groups, leak detectors. | +| [`pulsar-pr-workflow`](pulsar-pr-workflow/SKILL.md) | Branching, semantic PR titles, descriptions, Personal CI, rebase-vs-merge. | +| [`pulsar-security`](pulsar-security/SKILL.md) | Reporting a vulnerability or checking exposure to a public CVE. | + +To add a skill, create `/SKILL.md` with YAML frontmatter (`name`, `description` — phrased +so an agent knows *when* to use it), keep it focused on one task area, cite the canonical docs rather +than restating them, and add a row here and in [`AGENTS.md`](../../AGENTS.md#skills). diff --git a/.agents/skills/pulsar-build/SKILL.md b/.agents/skills/pulsar-build/SKILL.md new file mode 100644 index 0000000000000..51f4b682192a4 --- /dev/null +++ b/.agents/skills/pulsar-build/SKILL.md @@ -0,0 +1,65 @@ +--- +name: pulsar-build +description: AI-tooling guardrails for changes to the Apache Pulsar Gradle build. Points at the build conventions in ARCHITECTURE.md (tiered modules, convention plugins under build-logic/, the gradle/libs.versions.toml version catalog, the pulsar-dependencies enforced platform) and adds the AI-specific constraints on top — configuration-cache / configure-on-demand compatibility, no hard-coded versions, update LICENSE/NOTICE after dependency changes. Use when editing build.gradle.kts, settings.gradle.kts, build-logic/, gradle.properties, or gradle/libs.versions.toml. +license: Apache-2.0 +compatibility: claude, codex, copilot, cursor, gemini, aider +metadata: + audience: contributors to apache/pulsar + scope: ai-tooling-build-guardrails +--- + +# Pulsar build + +AI-tooling layer over the Gradle build. The conventions themselves live in +[`ARCHITECTURE.md`'s "Build infrastructure" section](../../../ARCHITECTURE.md#build-infrastructure); +this skill cites them and adds the guardrails below. + +## When to use this skill + +**Use it for:** + +- Convention-plugin changes under `build-logic/conventions/` (`pulsar.java-conventions`, + `pulsar.code-quality-conventions`, `pulsar.shadow-conventions`, …). +- `settings.gradle.kts` (module wiring/tiers) and root or module `build.gradle.kts` edits. +- `gradle.properties`, `gradle/libs.versions.toml` (version catalog), or the `pulsar-dependencies` + enforced platform. +- Adding, removing, or upgrading a runtime or test dependency. +- Gradle wrapper bumps (`gradle/wrapper/gradle-wrapper.properties`). + +**Don't use it for:** production/test source changes — those are +[`pulsar-tests`](../pulsar-tests/SKILL.md) or the relevant module. + +## Read first + +- [`ARCHITECTURE.md` → Build infrastructure](../../../ARCHITECTURE.md#build-infrastructure) — tiers, + convention plugins, version catalog, enforced platform, the module-name gotcha. +- [`CODING.md` → Dependencies](../../../CODING.md#dependencies) — which libraries are preferred and + the LICENSE/NOTICE obligation. + +## Guardrails + +- **Edit shared config in the convention plugins, not per-module.** Compile/test/dependency defaults + belong in `build-logic/conventions/`; don't duplicate them across module build scripts. +- **Versions come from the catalog.** Reference `libs.*` / the `pulsar-dependencies` platform; never + hard-code a version string in a build script. Add or change versions in + `gradle/libs.versions.toml`. +- **Stay configuration-cache and configure-on-demand compatible.** Both are enabled + (`org.gradle.configuration-cache=true`, `org.gradle.configureondemand=true`). Any task on a + commonly-run path (`assemble`, `test`, `integrationTest`, `rat`/`spotlessCheck`/`checkstyle*`, + `checkBinaryLicense`, `docker*`) must not read mutable state at execution time or access `Project` + in task actions — use `Provider` / value sources. **Verify with `--configuration-cache`.** + Tooling/one-off tasks (e.g. `verifyTestGroups`, dependency reports) may be exempt. +- **Follow [Gradle best practices](https://docs.gradle.org/current/userguide/best_practices_index.html).** +- **Use the real Gradle project path** (mind the module-name-vs-directory gotcha — directory + `pulsar-client/` is project `:pulsar-client-original`). +- **After any dependency change, run `./gradlew checkBinaryLicense`** and update the binary + distribution `LICENSE`/`NOTICE` accordingly. Justify a genuinely new dependency. +- **Don't fabricate plugin/DSL/task names** — verify they exist in the build. + +## Validation checklist + +- [ ] `./gradlew help --configuration-cache` (and the affected task with `--configuration-cache`) + succeeds with no config-cache problems. +- [ ] `./gradlew assemble` (or the affected module's `assemble`) passes. +- [ ] `./gradlew rat spotlessCheck checkstyleMain checkstyleTest` passes. +- [ ] If dependencies changed: `./gradlew checkBinaryLicense` passes and LICENSE/NOTICE are updated. diff --git a/.agents/skills/pulsar-pr-workflow/SKILL.md b/.agents/skills/pulsar-pr-workflow/SKILL.md new file mode 100644 index 0000000000000..4021914db0d5a --- /dev/null +++ b/.agents/skills/pulsar-pr-workflow/SKILL.md @@ -0,0 +1,65 @@ +--- +name: pulsar-pr-workflow +description: AI-tooling guardrails for branching, commits, and pull requests against apache/pulsar. Points at the PR conventions and Personal CI loop in CONTRIBUTING.md, then adds the AI-specific constraints — semantic [type][scope] titles validated against ci-semantic-pull-request.yml, PR descriptions that cover motivation and modifications, never rebase after the PR is opened (merge upstream master instead), and per-action user confirmation before pushing or opening a PR. Use when creating a branch, committing, opening or updating a PR, or running the Personal CI loop. +license: Apache-2.0 +compatibility: claude, codex, copilot, cursor, gemini, aider +metadata: + audience: contributors to apache/pulsar + scope: ai-tooling-pr-guardrails +--- + +# Pulsar PR workflow + +AI-tooling layer over the contribution workflow in +[`CONTRIBUTING.md` → Pull requests](../../../CONTRIBUTING.md#pull-requests) and +[Personal CI](../../../CONTRIBUTING.md#running-the-full-ci-pipeline-personal-ci). + +## When to use this skill + +Use it when creating a branch, committing, opening or updating a pull request, or driving the +Personal CI loop on a fork. + +## Read first + +- [`CONTRIBUTING.md` → Pull requests](../../../CONTRIBUTING.md#pull-requests) — title convention, + description requirement, rebase-vs-merge rule. +- [`CONTRIBUTING.md` → Personal CI](../../../CONTRIBUTING.md#running-the-full-ci-pipeline-personal-ci) + — the fork-based full-pipeline loop. +- `.github/PULL_REQUEST_TEMPLATE.md` and `.github/workflows/ci-semantic-pull-request.yml`. + +## Guardrails + +- **Each state-changing action needs its own explicit user confirmation.** Pushing, opening a PR, and + posting comments are not authorised just because the task was started. Confirm before each. +- **Before pushing, ensure lint passes:** `./gradlew rat spotlessCheck checkstyleMain checkstyleTest` + (`spotlessApply` to auto-fix). Scope test runs with `--tests` (see [`pulsar-tests`](../pulsar-tests/SKILL.md)). +- **Semantic PR title** `[][] `: valid `type`/`scope` values are + enforced by `.github/workflows/ci-semantic-pull-request.yml` — read it rather than guessing. + `` is imperative, like a commit subject. +- **PR description must cover Motivation (why?) and Modifications (what/how?)** per the template — a + title alone, or a description restating the title, is not enough. Add `Fixes #N` / `Main Issue: #N` + / `PIP: #N` when applicable. +- **Never rebase once the PR is open in `apache/pulsar`** — it invalidates review comments and + incremental diffs. Bring in upstream changes by merging: + + ```bash + git fetch # e.g. upstream / apache + git merge /master + ``` + + Rebasing is fine *before* the PR is opened (e.g. during the Personal CI loop). +- **Personal CI loop (pre-PR):** keep local `master` current, rebase the feature branch, push to the + **fork** to trigger CI, fix failures, and open the PR to `apache/pulsar` only when the fork's CI is + green. A PR can be open in both the fork and `apache/pulsar` at once. After the upstream PR is open, + switch from rebase to merge (above). +- **Keep commits focused** — a fix, a refactor, and a formatting pass are separate commits/PRs. +- **Security:** never describe the security nature of a change in a commit/PR — see + [`pulsar-security`](../pulsar-security/SKILL.md). + +## Validation checklist + +- [ ] Lint (`rat spotlessCheck checkstyleMain checkstyleTest`) passes locally. +- [ ] Title matches the semantic `[type][scope]` rules in `ci-semantic-pull-request.yml`. +- [ ] Description has Motivation + Modifications and links any issue/PIP. +- [ ] No rebase after the upstream PR was opened; upstream changes pulled in via merge. +- [ ] Each push / PR-open / comment was explicitly confirmed by the user. diff --git a/.agents/skills/pulsar-security/SKILL.md b/.agents/skills/pulsar-security/SKILL.md new file mode 100644 index 0000000000000..a3b18dd23a2c7 --- /dev/null +++ b/.agents/skills/pulsar-security/SKILL.md @@ -0,0 +1,51 @@ +--- +name: pulsar-security +description: Guardrails for handling security topics in Apache Pulsar — reporting a vulnerability through the private ASF channels (never a public issue, PR, or commit), and checking whether Pulsar is exposed to an already-public CVE. Use when a task involves a suspected vulnerability, a CVE, security advisories, or questions about supported/patched versions. +license: Apache-2.0 +compatibility: claude, codex, copilot, cursor, gemini, aider +metadata: + audience: contributors to apache/pulsar + scope: ai-tooling-security-guardrails +--- + +# Pulsar security + +Guardrails for security-related work. Canonical policy: [`SECURITY.md`](../../../SECURITY.md), +, and . + +## When to use this skill + +Use it when a task touches a suspected or reported vulnerability, a CVE, a security advisory, or +questions about which Pulsar versions are supported/patched. + +## Reporting a (non-public) vulnerability + +- **Never disclose a vulnerability in a public channel** — not in a GitHub issue, PR title/body, + commit message, or mailing list. Public disclosure before a fix is released puts users at risk. +- Report privately to the ASF Security Team (`security@apache.org`), optionally copying + `private@pulsar.apache.org`. Follow and + . +- When a change fixes a non-public security issue, **do not state its security nature** in the commit + message, PR title, or PR body. + +## Checking exposure to an already-public CVE + +For a CVE that is **already public** (e.g. in a dependency) and you want to know Pulsar's exposure or +whether it is already fixed: + +1. **Search existing PRs and issues by the CVE id** at — + also check issues and **closed** PRs/issues, since the fix may already be merged or the question + already answered. +2. If not found, ask via a **GitHub issue on apache/pulsar** or on **dev@pulsar.apache.org** (this is + the right venue for already-public CVEs — it is not a private disclosure). +3. Check the dependency version against `gradle/libs.versions.toml` / the `pulsar-dependencies` + platform to determine whether the affected version is actually shipped. +4. Supported versions are listed at ; users + should upgrade to a supported version to receive security updates. + +## Guardrails + +- Treat the public-vs-private distinction as the deciding factor for *where* to communicate: public + CVE → public issue/list; suspected new vulnerability → private ASF channels only. +- Don't fabricate CVE status — verify against merged PRs and the shipped dependency versions before + concluding "fixed" or "not affected". diff --git a/.agents/skills/pulsar-tests/SKILL.md b/.agents/skills/pulsar-tests/SKILL.md new file mode 100644 index 0000000000000..3e0b6cc4c706a --- /dev/null +++ b/.agents/skills/pulsar-tests/SKILL.md @@ -0,0 +1,59 @@ +--- +name: pulsar-tests +description: AI-tooling guardrails for writing and running Apache Pulsar tests. Points at the testing conventions in CODING.md and the run instructions in CONTRIBUTING.md, then adds the AI-specific constraints — scope runs with --tests (never a whole module), no reflection into private state, treat ByteBuf leaks as real bugs but ThreadLeakDetectorListener reports as unreliable, prefer -PtestRetryCount=0. Use when adding, modifying, or running unit or integration tests (TestNG/Mockito/AssertJ/Awaitility). +license: Apache-2.0 +compatibility: claude, codex, copilot, cursor, gemini, aider +metadata: + audience: contributors to apache/pulsar + scope: ai-tooling-test-guardrails +--- + +# Pulsar tests + +AI-tooling layer over Pulsar's test conventions +([`CODING.md` → Testing conventions](../../../CODING.md#testing-conventions)) and how to run them +([`CONTRIBUTING.md` → Running tests](../../../CONTRIBUTING.md#running-tests)). + +## When to use this skill + +Use it whenever adding, modifying, or running unit or integration tests, or when investigating a test +failure / flake. + +## Read first + +- [`CODING.md` → Testing conventions](../../../CODING.md#testing-conventions) — frameworks, + no-reflection rule, resource/leak cleanup. +- [`CONTRIBUTING.md` → Running tests](../../../CONTRIBUTING.md#running-tests) — `--tests` scoping, + test groups, integration tests, Personal CI. + +## Guardrails + +- **Always scope runs with `--tests`.** Running a module's whole `test` task is slow; pass a class, + method, or package pattern. Mind the module-name gotcha (`pulsar-client/` → `:pulsar-client-original`). +- **Prefer `-PtestRetryCount=0` locally** so failures and flakiness surface immediately instead of + being masked by the default single retry. +- **A test in the `flaky`/`quarantine` group is excluded by default** — add `-PexcludedTestGroups=''` + to run it. +- **Use AssertJ (with descriptions) + Awaitility** for async conditions; never `Thread.sleep`-based + timing. Add timeouts to prevent hangs. Tests must be deterministic. +- **No reflection into private state** (`WhiteboxImpl.getInternalState`/`setInternalState`, + `setAccessible(true)`). Add a package-private `@VisibleForTesting` accessor and put the test in the + same package instead. Flag any new reflection-based access. +- **Close/release what the test allocates** — executors, clients, services, and Netty `ByteBuf`s. +- **Leak signals are not equal:** + - A **`ByteBuf`/buffer leak** (pooled-allocator detection, `-Dpulsar.allocator.pooled=true`) is a + **real bug** — fix the missing `release()` in the code under test; don't suppress it. + - A **thread leak** from `ThreadLeakDetectorListener` is **currently unreliable** (high + false-positive rate, notably with `SharedPulsarBaseTest` and when `THREAD_LEAK_DETECTOR_WAIT_MILLIS` + isn't high enough — ≈`10000`, and only effective with the Gradle daemon disabled, `--no-daemon`). + Don't treat it as a real bug on its own; corroborate first. +- **Don't fabricate assertions or test names** — verify symbols exist. +- For a **bugfix**, add a regression test and confirm it fails before the fix and passes after. + +## Validation checklist + +- [ ] The new/changed test runs green via a scoped `--tests` invocation with `-PtestRetryCount=0`. +- [ ] No reflection into private state; resources and buffers are released. +- [ ] For a bugfix: the test fails on the unpatched code and passes with the fix. +- [ ] For larger changes, the full suite is validated via **Personal CI**, not locally + (see [`CONTRIBUTING.md`](../../../CONTRIBUTING.md#running-the-full-ci-pipeline-personal-ci)). diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md deleted file mode 100644 index 4b0296e24391f..0000000000000 --- a/.github/copilot-instructions.md +++ /dev/null @@ -1,365 +0,0 @@ -The following instructions are only to be applied when performing a code review. - -# Copilot Instructions for Apache Pulsar - -## Project Context - -Apache Pulsar is a distributed messaging and streaming platform designed for high throughput, low latency, and -horizontal scalability. - -The codebase contains performance-critical, asynchronous, and concurrency-sensitive components such as brokers, storage -clients, and networking layers. - -Code reviews should prioritize: - -* correctness -* thread safety -* performance -* maintainability -* backward compatibility - ---- - -# Java Coding Conventions - -Apache Pulsar follows the Sun Java Coding Conventions with additional project-specific rules. - -## Formatting - -* Use **4 spaces** for indentation. -* **Tabs must never be used**. -* Always use **curly braces**, even for single-line `if` statements. - -Example: - -```java -if (condition) { - doSomething(); -} -``` - -## Javadoc - -* Do **not include `@author` tags** in Javadoc. - -## TODO Comments - -All TODO comments must reference a GitHub issue. - -Example: - -```java -// TODO: https://github.com/apache/pulsar/issues/XXXX -``` - ---- - -# Dependencies - -Prefer existing dependencies instead of introducing new libraries. - -The Pulsar codebase commonly uses: - -* **Apache Commons libraries or Guava** for utilities -* **FastUtil** for optimized type specific collections -* **JCTools** for high performance concurrent data structures -* **RoaringBitmap** for compressed bitmaps (bitsets) -* **Caffeine** for caching -* **Jackson** for JSON handling -* **Prometheus Java simpleclient** (or newer **Prometheus Java Metrics Library**) for metrics -* **OpenTelemetry API** for metrics -* **Netty** for networking and buffers - -When introducing a new dependency: - -* justify why existing dependencies are insufficient -* ensure required license files and notices are updated - ---- - -# Logging Guidelines - -* Prefer the **[slog](https://github.com/merlimat/slog)** library (`io.github.merlimat.slog`) for logging. -* Instantiate the logger with Lombok's **`@CustomLog`** annotation, which is configured in `lombok.config` to create an `io.github.merlimat.slog.Logger` (`io.github.merlimat.slog.Logger.get(TYPE)`). -* **SLF4J** is still available but is **considered deprecated** for Pulsar logging. Do not introduce new SLF4J loggers; flag new SLF4J usage in review and suggest `@CustomLog` (slog) instead. -* Do **not use** `System.out` or `System.err`. -* Assume production commonly runs at **INFO** log level. -* **Default new log statements to `TRACE` or `DEBUG`, not `INFO`.** Existing Pulsar code overuses `INFO`, which produces excessively noisy production logs. Use `TRACE` for fine-grained/low-level detail, `DEBUG` for diagnostics that could reasonably be enabled in production without flooding the logs, and reserve `INFO` for low-frequency, significant lifecycle/state-change events. Flag new `INFO`-level logging that would be better at `DEBUG`/`TRACE`. -* Avoid excessive logging in hot paths. -* With slog, attach data as **structured attributes** (`log.info().attr("topic", topic).log("Published")`) instead of interpolating it into the message string. -* For expensive `DEBUG`/`TRACE` values, do not guard with `log.isDebugEnabled()`/`log.isTraceEnabled()`. Instead use slog's lazy evaluation, which only runs when the level is enabled: - * pass a supplier lambda as the attribute value — `log.debug().attr("dump", () -> expensiveDump()).log("...")`, or - * use the deferred event form — `log.debug(e -> e.attr("dump", expensiveDump()).log("..."))`. -* Avoid logging stack traces at `INFO` and lower levels. - -Log messages should be: - -* clear and descriptive -* capitalized -* understandable without reading the code - ---- - -# Resource and Memory Management - -Ensure resources are always closed correctly. - -Prefer: - -```java -try (InputStream in = ...) { - // use resource -} -``` - -Avoid leaks for: - -* streams -* network connections -* executors -* buffers - -For internal networking/messaging paths, prefer **Netty `ByteBuf`** over `ByteBuffer` unless an external API requires -`ByteBuffer`. - -## Resource Cleanup in Tests - -Tests must also close/release everything they allocate — shut down executors/clients/services and release -Netty `ByteBuf`s (and other reference-counted buffers). - -* A reported **`ByteBuf`/buffer leak** (Netty pooled-allocator leak detection, enabled via - `-Dpulsar.allocator.pooled=true`) is a **real bug**. Fix the root cause — the missing `release()` in the - code under test — rather than working around it in the test or suppressing the detector. -* **Thread leaks** reported by `ThreadLeakDetectorListener` are currently **not a reliable signal**. The - detector produces a high rate of false positives — notably with `SharedPulsarBaseTest`, and whenever the - `THREAD_LEAK_DETECTOR_WAIT_MILLIS` environment value is not set high enough (≈`10000` is recommended) to - let asynchronous shutdown complete before the check runs. That wait setting only takes effect when the - **Gradle daemon is disabled** (`--no-daemon`). Because of these false positives, do not rely on - `ThreadLeakDetectorListener` alone to conclude that a change is thread-leak-free; corroborate before - treating a reported thread leak as a real bug. - ---- - -# Configuration Guidelines - -When adding configuration options: - -* use clear and descriptive names -* provide sensible default values -* update default configuration files -* document the configuration option - ---- - -# Concurrency Guidelines - -Pulsar is designed as a low-latency asynchronous system. - -Verify: - -* public classes are **thread-safe** -* shared mutable state is protected -* mutations occur on the intended thread -* fine-grained synchronization is preferred -* threads have meaningful names for diagnostics - -If a class is not thread-safe, annotate it with: - -```java -@NotThreadSafe -``` - -Prefer **OrderedExecutor** for ordered asynchronous execution. - ---- - -# Asynchronous Programming Guidelines - -Pulsar relies heavily on `CompletableFuture` and asynchronous pipelines. - -Prefer `CompletableFuture` APIs over `ListenableFuture` for new code. - -## Avoid Blocking in Async Paths - -Do not introduce blocking operations in asynchronous execution paths. - -Examples of blocking operations: - -* `Thread.sleep` -* `Future.get()` -* `CompletableFuture.join()` -* blocking IO operations - -Blocking calls must not run on event loop or async execution threads. - -## Avoid Nested Futures - -Avoid returning nested futures such as: - -```java -CompletableFuture> -``` - -Prefer flattening with `thenCompose`. - -## Asynchronous Exception Handling - -Methods returning `CompletableFuture` must not throw synchronous exceptions directly. - -Incorrect: - -```java -public CompletableFuture operation() { - if (error) { - throw new IllegalStateException("unexpected state"); - } - return CompletableFuture.completedFuture(null); -} -``` - -Use returned futures to propagate failures: - -```java -return CompletableFuture.failedFuture(exception); -``` - -This also applies to argument validation in async methods: - -```java -if (arg == null) { - return CompletableFuture.failedFuture(new IllegalArgumentException("arg")); -} -``` - -Throwing exceptions inside async stages such as `thenApply`, `thenCompose`, `thenRun`, `handle`, or `whenComplete` -is acceptable: - -```java -return future.thenApply(v -> { - if (error) { - throw new IllegalStateException("unexpected state"); - } - return result; -}); -``` - ---- - -# Backward Compatibility - -Apache Pulsar maintains strong compatibility guarantees. - -Changes must not break: - -* public APIs -* client compatibility -* wire protocol compatibility -* metadata or serialized formats - -Servers must be compatible with both older and newer clients. - -Flag any change that may break compatibility. - ---- - -# Testing Guidelines - -## Unit testing - -* TestNG is used as the testing framework -* Mocking uses Mockito -* Assertions should prefer using AssertJ library with descriptions over using TestNG assertions -* Awaitility should be used to handle assertions with asynchronous conditions together with AssertJ - -## Avoid Reflection in Tests - -Do **not** use reflection to access private fields or methods from tests. This includes -`WhiteboxImpl.getInternalState`, `WhiteboxImpl.setInternalState`, `Field.setAccessible(true)`, -`Method.setAccessible(true)`, and similar reflection helpers. - -Reflection-based test access is discouraged because it: - -* breaks during refactoring without any compile-time signal — renaming or retyping a field - silently invalidates the test -* produces verbose, brittle, and hard-to-read test code -* couples tests to implementation details that should be free to change - -Instead, expose what tests legitimately need through **package-private** methods (or fields -where appropriate) and annotate them with `@VisibleForTesting`: - -```java -@VisibleForTesting -Map getSubscriptions() { - return subscriptions; -} -``` - -The test then accesses state through a normal, statically-typed call: - -```java -var subscriptions = persistentTopic.getSubscriptions(); -``` - -instead of: - -```java -ConcurrentOpenHashMap subscriptions = - WhiteboxImpl.getInternalState(persistentTopic, "subscriptions"); -``` - -Place the test in the same package as the class under test so package-private visibility is -sufficient — no need to widen visibility to `public` for testing. - -When reviewing PRs, flag any new use of reflection in tests and suggest a `@VisibleForTesting` -package-private accessor instead. See the dev@ thread -[Stop using reflection to access private fields in tests](https://lists.apache.org/thread/7gr04sqmzyttx4ln6ydtp3qv0xgo1o6m) -for the full rationale. - -# Testing Expectations - -Every feature or bug fix should include tests. - -Verify: - -* unit tests exist -* edge cases are covered -* failure scenarios are tested -* tests are deterministic and stable -* tests avoid `sleep`-based timing assumptions -* tests include timeouts to prevent hangs - -Integration tests may be required for distributed components. - ---- - -# Pull Request Description - -Every pull request must describe the change so reviewers have the context they need. At a minimum, the -description should cover: - -* **Motivation (why?)** — the context and the problem being solved. -* **Modifications (what / how?)** — what was changed and how it addresses the problem. - -These map to the **Motivation** and **Modifications** sections of `.github/PULL_REQUEST_TEMPLATE.md`. A title -alone, or a description that only restates the title, is not sufficient. - ---- - -# Pull Request Review Guidance - -When reviewing a pull request, Copilot should: - -* verify Java coding conventions are followed -* detect thread safety risks -* flag blocking operations in async paths -* detect improper `CompletableFuture` usage -* detect unnecessary dependencies and missing license updates -* ensure logging follows project guidelines -* verify backward compatibility -* suggest missing tests when appropriate -* flag reflection-based access to private fields or methods in tests (e.g. `WhiteboxImpl`, - `setAccessible(true)`) and recommend a package-private `@VisibleForTesting` accessor instead -* check that the PR description adequately explains the change — add a comment asking for a proper - description (at least the motivation and the modifications) unless this is already covered - -Focus feedback on correctness, reliability, and maintainability. \ No newline at end of file diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md new file mode 120000 index 0000000000000..be77ac83a1895 --- /dev/null +++ b/.github/copilot-instructions.md @@ -0,0 +1 @@ +../AGENTS.md \ No newline at end of file diff --git a/AGENTS.md b/AGENTS.md index 8bb48f546cef0..ef29c4d147de5 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -1,150 +1,58 @@ -# AGENTS.md - -This file provides guidance to AI coding agents working with code in this repository. - -Apache Pulsar is a distributed pub-sub messaging and streaming platform. The codebase is performance-critical, heavily asynchronous, and concurrency-sensitive (brokers, storage, networking). Prioritize correctness, thread safety, performance, maintainability, and backward compatibility. - -## Build system - -The build is **Gradle** (migrated from Maven via PIP-463; much existing tooling and docs elsewhere still reference Maven). Use the wrapper `./gradlew` — no separate Gradle install needed. **JDK 21 or 25 is required to run the build** (`-PskipJavaVersionCheck` bypasses the check). Bytecode targets Java 17 (`options.release = 17`). - -Key build infrastructure: -- `settings.gradle.kts` — all modules, organized in dependency tiers (Tier 0 has no internal deps, higher tiers build on lower ones). -- `build-logic/conventions/` — convention plugins (`pulsar.java-conventions`, `pulsar.code-quality-conventions`, `pulsar.shadow-conventions`, etc.) applied by modules. This is where shared compile/test/dependency config lives — edit conventions here rather than duplicating config across modules. -- `gradle/libs.versions.toml` — version catalog (single source of truth for dependency versions; `libs.*` references in build scripts). -- `pulsar-dependencies` — enforced platform (BOM) pinning all dependency versions; applied to every module. - -When changing the build, **follow the [Gradle best practices](https://docs.gradle.org/current/userguide/best_practices_index.html)**. The build enables both the **configuration cache** (`org.gradle.configuration-cache=true`) and **configure-on-demand** (`org.gradle.configureondemand=true`), so any task used by a commonly-run task (`assemble`, `test`, `integrationTest`, `rat`/`spotlessCheck`/`checkstyle*`, `checkBinaryLicense`, `docker*`, etc.) **must be configuration-cache and configure-on-demand compatible** — no reading of mutable state at execution time, no `Project` access in task actions, use `Provider`/value sources instead. Verify with `--configuration-cache`. Tooling/one-off tasks that are not part of these common flows (e.g. `verifyTestGroups`, dependency-report and ad-hoc maintenance tasks) may be exempt. - -### Module name vs. directory name gotcha - -Several Gradle project paths do **not** match their directory because the Maven artifactId is preserved. Most importantly: -- Directory `pulsar-client/` → project **`:pulsar-client-original`** -- Directory `pulsar-client-admin/` → project **`:pulsar-client-admin-original`** -- Directory `pulsar-functions/localrun/` → project `:pulsar-functions:pulsar-functions-local-runner-original` - -Always use the Gradle project path (left of any `--tests`), e.g. `./gradlew :pulsar-client-original:test`. Check `settings.gradle.kts` when a path is ambiguous. - -## Common commands - -```bash -# Compile and assemble everything (or a single module) -./gradlew assemble -./gradlew :pulsar-broker:assemble - -# Lint / verify (license headers, formatting, checkstyle) — run before pushing -./gradlew rat spotlessCheck checkstyleMain checkstyleTest -./gradlew spotlessApply # auto-fix license headers/formatting - -# Verify bundled-dependency LICENSE/NOTICE coverage (run after changing a runtime dependency) -./gradlew checkBinaryLicense - -# Always scope test runs with --tests — running a whole module's test task is slow. -# Run a single test class -./gradlew :pulsar-client-original:test --tests "ConsumerBuilderImplTest" -# Run a single test method -./gradlew :pulsar-client-original:test --tests "ConsumerBuilderImplTest." -# Run all tests in a specific package -./gradlew :pulsar-broker:test --tests "org.apache.pulsar.broker.admin.*" -``` - -### Test groups (TestNG) - -Tests use **TestNG** and are tagged with `@Test(groups = "...")`. By default the build **excludes the `quarantine` and `flaky` groups** (`excludedTestGroups` default = `quarantine,flaky`), so to run a single test that lives in one of those groups you must clear the exclusion: - -```bash -# Run a specific test that is in the flaky/quarantine group (otherwise excluded by default) -./gradlew :pulsar-broker:test -PexcludedTestGroups='' --tests "" -``` - -CI selects whole groups with `-PtestGroups=` and `-PexcludedTestGroups=` (e.g. `broker,broker-admin`); locally prefer `--tests` to scope to specific classes instead of running an entire group. CI splits `pulsar-broker` tests into groups (see `pulsar-build/run_unit_group_gradle.sh` and `gradle/verify-test-groups.gradle.kts`). Tests with no group are treated as `other` at runtime. `./gradlew verifyTestGroups` reports group assignments and flags tests not covered by any CI group. - -Other test-related properties: `-PtestJavaVersion=17` (run tests on a different JDK toolchain), `-PtestRetryCount=N`, `-PtestFailFast=true|false`, `-PprotobufVersion=4.31.1` (protobuf v4 compatibility tests). - -Failed tests are retried once by default (`testRetryCount=1`; `0` when running inside the IDE). When running tests locally, prefer **`-PtestRetryCount=0`** to catch failures (including flakiness) early instead of having retries mask them. - -### Integration tests - -Integration tests live in `tests/` (see `tests/README.md`). They use [Testcontainers](https://www.testcontainers.org/) to bring up Pulsar services in Docker, so **Docker must be installed and running**. Build the test image first, then run the tests. - -The full integration suite is heavy and slow. **In local development, always run individual integration tests** rather than the whole suite — pass `--tests` to select a class (TestNG then discovers it directly from the classpath): - -```bash -./gradlew :tests:latest-version-image:dockerBuild # build the docker test image -./gradlew :tests:integration:integrationTest --tests "org.apache.pulsar.tests.integration." -``` - -To run the **entire** integration test set, use **Personal CI** (see below) rather than running it locally. (`integrationTest` also accepts `-PtestGroups`/`-PexcludedTestGroups` and `-PintegrationTestSuiteFile=.xml` to pick a specific TestNG suite.) - -### Running the full CI pipeline (Personal CI) - -The full test suite is large and slow to run locally. While iterating on a change, run only the narrowly-scoped tests relevant to the change (a single test class or package, see above) rather than a module's entire test task. To validate a larger change against the **full** CI pipeline, do not run everything locally — use **Personal CI**, which runs Pulsar's CI workflows in the contributor's own GitHub fork. - -If Personal CI is not yet set up, **ask the user to follow** the [Personal CI documentation](https://pulsar.apache.org/contribute/personal-ci/) to enable it on their fork. Once it is set up, the agent can drive the loop: - -1. Keep the local `master` up-to-date with `apache/pulsar` and rebase the feature branch on it. -2. Push the feature branch to the **fork** to trigger CI runs there. CI runs against the PR opened in the user's own fork (it is normal to have a PR open in the fork *and* a PR for the same branch open in `apache/pulsar` at the same time). -3. Monitor CI status on the fork and fix failures. -4. Open the PR to `apache/pulsar` only after the fork's CI is green. - -Once the PR to `apache/pulsar` has been opened, stop rebasing as part of this loop: step 1's rebase no longer applies — bring in upstream changes by merging `apache/pulsar`'s `master` instead (see the Pull requests section below). - -## Architecture (big picture) - -Pulsar separates a stateless serving layer (brokers) from durable storage (Apache BookKeeper) and a metadata store (ZooKeeper/etcd/others). The modules layer accordingly: - -- **`pulsar-client-api`, `pulsar-client-admin-api`** — public, backward-compatible interfaces only. `pulsar-client-api-v5`/`pulsar-client-v5` are the newer V5 client API (PIP-466/468). -- **`pulsar-client` (`:pulsar-client-original`)** — the Java client implementation (producer/consumer/reader, connection pooling). `pulsar-client-admin` implements the admin REST client. -- **`pulsar-common`** — wire protocol and shared types. Protobuf / lightproto messages are **generated** into `generated-lightproto/` / `generated-sources/` (excluded from checkstyle and spotless). -- **`pulsar-metadata`** — pluggable metadata store abstraction (ZooKeeper, etcd, RocksDB, memory) used by broker and bookkeeper. -- **`managed-ledger`** — the storage abstraction over **Apache BookKeeper**: append-only ledgers + cursors that track consumer/subscription positions. This is the durability layer the broker reads/writes through. -- **`pulsar-broker`** — the server. `PulsarService` is the composition root wiring everything together; `BrokerService` manages topics, subscriptions, and client connections. Entry points: `PulsarBrokerStarter` (broker), `PulsarStandalone`/`PulsarStandaloneStarter` (all-in-one), `PulsarClusterMetadataSetup` (cluster init). -- **`pulsar-proxy`** — optional proxy/gateway in front of brokers. -- **`pulsar-functions/*`** — serverless compute (Functions): `proto`, `api-java`, `instance`, `runtime`, `worker`, `localrun`. -- **`pulsar-io/*`** — connector framework core only; most built-in connectors were moved to the separate `pulsar-connectors` repo (PIP-465). -- **`pulsar-transaction/*`** — transaction coordinator and common types. -- **`tiered-storage/*`, `offloaders/`** — offload ledger data to cloud/filesystem storage. -- **`pulsar-websocket`** — WebSocket-to-Pulsar bridge. **`pulsar-client-tools`** — the `pulsar-admin`/`pulsar-client` CLIs. -- **Shaded/distribution** — `pulsar-client-shaded`, `pulsar-client-all`, `pulsar-client-admin-shaded` produce relocated fat jars; `distribution/*` assembles server/shell/offloader tarballs. - -The **`pip/`** directory holds **Pulsar Improvement Proposals** (`pip-.md`) — the design documents for significant changes, referenced as `PIP-` throughout commit messages and code (e.g. PIP-463 = Maven→Gradle migration, PIP-466/468 = V5 client). `pip/README.md` describes the process and `pip/TEMPLATE.md` is the proposal template. Consult the relevant PIP for the rationale behind a non-trivial feature or architectural decision. - -## Coding conventions - -Full conventions for code review are in `.github/copilot-instructions.md`. Highlights to apply when writing code: - -- **Style**: 4-space indent, never tabs; always use curly braces (even single-line `if`); no `@author` Javadoc tags. Every `TODO` must reference a GitHub issue (`// TODO: https://github.com/apache/pulsar/issues/XXXX`). Checkstyle config: `buildtools/src/main/resources/pulsar/checkstyle.xml`. Lombok is enabled. -- **Async**: heavy use of `CompletableFuture`. Methods returning `CompletableFuture` must **not** throw synchronously — return `CompletableFuture.failedFuture(e)` (including for argument validation). Never block (`Thread.sleep`, `Future.get()`, `.join()`, blocking IO) on event-loop / async-execution threads. Avoid nested futures; flatten with `thenCompose`. Prefer `OrderedExecutor` for ordered async work. -- **Concurrency**: public classes should be thread-safe; annotate non-thread-safe ones with `@NotThreadSafe`. Give threads meaningful names. -- **Logging**: prefer the [slog](https://github.com/merlimat/slog) library (`io.github.merlimat.slog`); instantiate the logger with Lombok's **`@CustomLog`** annotation (wired in `lombok.config` to `io.github.merlimat.slog.Logger.get(TYPE)`). SLF4J is still available but is **considered deprecated** for Pulsar logging — don't add new SLF4J loggers. Never use `System.out`/`System.err`. Assume INFO in production. **Default new logs to TRACE or DEBUG, not INFO** — existing Pulsar code overuses INFO, which makes production logs excessively noisy. Use **TRACE** for fine-grained/low-level detail, **DEBUG** for diagnostics that could reasonably be enabled in production without flooding the logs, and reserve **INFO** for significant lifecycle/state-change events that are low-frequency. With slog, attach data as **structured attributes** (`log.info().attr("topic", topic).log("Published")`) rather than interpolating it into the message string. For expensive values, don't guard with `isDebugEnabled()`/`isTraceEnabled()` — instead pass a supplier lambda for the attribute value (`.attr("dump", () -> expensiveDump())`) or use the deferred form (`log.debug(e -> e.attr(...).log("msg"))`), both of which run only when the level is enabled. Avoid logging in hot paths and stack traces at INFO or lower. -- **Networking/memory**: prefer Netty `ByteBuf` over `ByteBuffer` on internal paths; always close streams/connections/executors/buffers. -- **Dependencies**: prefer existing libraries (Guava/Commons, FastUtil, JCTools, RoaringBitmap, Caffeine, Jackson, Netty, OpenTelemetry). New dependencies must be justified and have LICENSE/NOTICE updated (verify with `checkBinaryLicense`). -- **Backward compatibility**: do not break public APIs, client compatibility, wire protocol, or serialized/metadata formats. Servers must work with both older and newer clients. - -## Testing conventions - -- TestNG + Mockito. Prefer **AssertJ** assertions with descriptions over TestNG asserts; use **Awaitility** (with AssertJ) for async conditions instead of `sleep`-based timing. Add timeouts to prevent hangs. -- **No reflection to access private state in tests** (`WhiteboxImpl.getInternalState/setInternalState`, `setAccessible(true)`, etc.). Instead expose a **package-private** `@VisibleForTesting` accessor and place the test in the same package. New reflection-based test access should be flagged in review. -- Every feature/bugfix should include tests covering edge cases and failure scenarios; they must be deterministic. -- **Close/release resources in tests** — shut down executors/clients/services and release Netty `ByteBuf`s (and other ref-counted buffers) that the test allocates. - - A reported **`ByteBuf`/buffer leak** (Netty pooled-allocator leak detection, enabled via `-Dpulsar.allocator.pooled=true`) is a **real bug**: fix the root cause — the missing `release()` in the code under test — rather than working around it in the test or suppressing the detector. - - **Thread leaks** reported by `ThreadLeakDetectorListener` are currently **not a reliable signal**: it produces a high rate of false positives — notably with `SharedPulsarBaseTest`, and whenever `THREAD_LEAK_DETECTOR_WAIT_MILLIS` is not set high enough (≈`10000` is recommended) for asynchronous shutdown to complete before the check runs. That wait setting only takes effect with the **Gradle daemon disabled** (`--no-daemon`). Because of the false positives, do not rely on `ThreadLeakDetectorListener` alone to conclude a change is thread-leak-free — corroborate before treating a reported thread leak as a real bug. - -## Pull requests - -PRs must follow `.github/PULL_REQUEST_TEMPLATE.md`. PR titles follow the `[][] ` convention (e.g. `[fix][broker] ...`, `[improve][build] ...`) — refer to `.github/workflows/ci-semantic-pull-request.yml` for the valid `[type]` and `[scope]` prefixes, which are enforced by CI. The `` should be in imperative form, like a good commit message's subject line. - -**Never rebase a PR branch once the PR is opened in `apache/pulsar`.** Rebasing rewrites history and disrupts reviewers (it invalidates review comments and makes incremental diffs unreadable). To bring in upstream changes, instead fetch from the `apache/pulsar` remote and **merge** its `master` into the PR branch: - -```bash -git fetch # e.g. `upstream` or `apache` -git merge /master -``` - -(Rebasing onto an updated `master` is fine *before* the PR is opened — see the Personal CI loop above — but not after.) - -## Reporting security vulnerabilities - -Please refer to https://pulsar.apache.org/security/. See https://www.apache.org/security/ for the general ASF policy. - -For already-public CVEs where you want to check Pulsar's exposure, the right venue is a GitHub issue on apache/pulsar or a question on the dev@pulsar.apache.org mailing list. Before asking about an already-public CVE and whether it's already fixed in Pulsar, search PRs and issues with the CVE id at https://github.com/apache/pulsar/pulls (also check issues and closed PRs/issues). Pulsar project's supported versions are available at https://pulsar.apache.org/contribute/release-policy/. Users should upgrade to supported versions to receive security updates. +# Agent guide for Apache Pulsar + +Supplemental guidance for AI coding assistants (Claude Code, Copilot, Cursor, Gemini, Codex, Aider, +and similar tools) working in this repository. Keep this file short — it is a **router**. The detail +lives in the human-facing docs and the task skills, which you should load **on demand** rather than +pulling everything into context up front. + +Apache Pulsar is a distributed pub-sub messaging and streaming platform. The codebase is +performance-critical, heavily asynchronous, and concurrency-sensitive. Prioritize **correctness, +thread safety, performance, maintainability, and backward compatibility**. + +## Canonical docs (read the one that fits the task) + +| Doc | Use for | +|-----|---------| +| [`CONTRIBUTING.md`](CONTRIBUTING.md) | Local dev workflow: build, lint, running tests & test groups, integration tests, Personal CI, PR conventions, security reporting. | +| [`ARCHITECTURE.md`](ARCHITECTURE.md) | Big-picture module map, the Gradle build infrastructure, the module-name-vs-directory gotcha, and the `pip/` proposals. | +| [`CODING.md`](CODING.md) | Coding conventions: style, async/`CompletableFuture`, concurrency, logging (slog), dependencies, backward compatibility, testing, and the review checklist. | +| [`SECURITY.md`](SECURITY.md) | Reporting a vulnerability, disclosure hygiene, and checking exposure to an already-public CVE. | + +The authoritative project documentation is at ; the files above and the +website remain the source of truth — this guide just layers AI-specific pointers on top. + +## Skills + +> **Before writing or modifying code in an area below, read the matching skill file first.** Each +> skill is more focused than the canonical docs and adds AI-specific guardrails; it tells you what to +> verify and points back into the docs above. Load it on demand to keep context small. + +Task-specific guidance lives under [`.agents/skills/`](.agents/skills/), each in its own directory +with a `SKILL.md`: + +| Skill | Use for | +|-------|---------| +| [`pulsar-build`](.agents/skills/pulsar-build/SKILL.md) | Editing the Gradle build — convention plugins under `build-logic/`, `settings.gradle.kts`, `gradle/libs.versions.toml`, dependency changes. Enforces configuration-cache / configure-on-demand compatibility and LICENSE/NOTICE upkeep. | +| [`pulsar-tests`](.agents/skills/pulsar-tests/SKILL.md) | Writing or running tests — TestNG groups, `--tests` scoping, AssertJ/Awaitility, no-reflection rule, buffer/thread leak detectors, retry count. | +| [`pulsar-pr-workflow`](.agents/skills/pulsar-pr-workflow/SKILL.md) | Branching, commits, semantic PR titles, PR descriptions, the Personal CI loop, and the never-rebase-after-open / merge-instead rule. | +| [`pulsar-security`](.agents/skills/pulsar-security/SKILL.md) | Reporting a vulnerability or checking Pulsar's exposure to an already-public CVE. | + +## Critical rules + +1. **Don't break backward compatibility** — public APIs, client compatibility, wire protocol, and + serialized/metadata formats. Servers must interoperate with older and newer clients. +2. **Never block on async/event-loop threads;** methods returning `CompletableFuture` must not throw + synchronously. See [`CODING.md`](CODING.md#asynchronous-programming). +3. **Logging:** prefer slog via `@CustomLog`; default new logs to `TRACE`/`DEBUG`, not `INFO`. +4. **Tests:** scope runs with `--tests`; no reflection into private state (use a `@VisibleForTesting` + package-private accessor); release buffers and resources. +5. **PRs:** semantic `[type][scope]` title; describe **motivation** and **modifications**; do not + rebase once the PR is open in `apache/pulsar` — merge upstream `master` instead. +6. **Security:** never disclose a vulnerability — or the security nature of a change — in a public + issue, PR, or commit. See [`SECURITY.md`](SECURITY.md) / + [`pulsar-security`](.agents/skills/pulsar-security/SKILL.md). + +## Where to ask + +- Dev mailing list: · Slack: +- Issues: diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md new file mode 100644 index 0000000000000..cf83c8a45521f --- /dev/null +++ b/ARCHITECTURE.md @@ -0,0 +1,91 @@ +# Architecture + +Apache Pulsar is a distributed pub-sub messaging and streaming platform. The codebase is +performance-critical, heavily asynchronous, and concurrency-sensitive (brokers, storage, +networking). The authoritative documentation lives at ; this file is a +map of the repository for contributors (and AI coding agents) who need to find their way around the +modules quickly. + +## Big picture + +Pulsar separates a **stateless serving layer** (brokers) from **durable storage** (Apache +BookKeeper) and a **metadata store** (ZooKeeper / etcd / others). The Gradle modules layer +accordingly: + +- **`pulsar-client-api`, `pulsar-client-admin-api`** — public, backward-compatible interfaces only. + `pulsar-client-api-v5` / `pulsar-client-v5` are the newer V5 client API (PIP-466/468). +- **`pulsar-client` (`:pulsar-client-original`)** — the Java client implementation + (producer/consumer/reader, connection pooling). `pulsar-client-admin` implements the admin REST + client. +- **`pulsar-common`** — wire protocol and shared types. Protobuf / lightproto messages are + **generated** into `generated-lightproto/` / `generated-sources/` (excluded from checkstyle and + spotless). +- **`pulsar-metadata`** — pluggable metadata store abstraction (ZooKeeper, etcd, RocksDB, memory) + used by broker and bookkeeper. +- **`managed-ledger`** — the storage abstraction over **Apache BookKeeper**: append-only ledgers + + cursors that track consumer/subscription positions. This is the durability layer the broker reads + and writes through. +- **`pulsar-broker`** — the server. `PulsarService` is the composition root wiring everything + together; `BrokerService` manages topics, subscriptions, and client connections. Entry points: + `PulsarBrokerStarter` (broker), `PulsarStandalone` / `PulsarStandaloneStarter` (all-in-one), + `PulsarClusterMetadataSetup` (cluster init). +- **`pulsar-proxy`** — optional proxy/gateway in front of brokers. +- **`pulsar-functions/*`** — serverless compute (Functions): `proto`, `api-java`, `instance`, + `runtime`, `worker`, `localrun`. +- **`pulsar-io/*`** — connector framework core only; most built-in connectors were moved to the + separate `pulsar-connectors` repo (PIP-465). +- **`pulsar-transaction/*`** — transaction coordinator and common types. +- **`tiered-storage/*`, `offloaders/`** — offload ledger data to cloud/filesystem storage. +- **`pulsar-websocket`** — WebSocket-to-Pulsar bridge. **`pulsar-client-tools`** — the + `pulsar-admin` / `pulsar-client` CLIs. +- **Shaded / distribution** — `pulsar-client-shaded`, `pulsar-client-all`, + `pulsar-client-admin-shaded` produce relocated fat jars; `distribution/*` assembles + server/shell/offloader tarballs. + +## Build infrastructure + +The build is **Gradle** (migrated from Maven via PIP-463; much existing tooling and docs elsewhere +still reference Maven). Use the wrapper `./gradlew` — no separate Gradle install needed. **JDK 21 or +25 is required to run the build** (`-PskipJavaVersionCheck` bypasses the check). Bytecode targets +Java 17 (`options.release = 17`). + +- `settings.gradle.kts` — all modules, organized in dependency tiers (Tier 0 has no internal deps, + higher tiers build on lower ones). +- `build-logic/conventions/` — convention plugins (`pulsar.java-conventions`, + `pulsar.code-quality-conventions`, `pulsar.shadow-conventions`, etc.) applied by modules. This is + where shared compile/test/dependency config lives — edit conventions here rather than duplicating + config across modules. +- `gradle/libs.versions.toml` — version catalog (single source of truth for dependency versions; + `libs.*` references in build scripts). +- `pulsar-dependencies` — enforced platform (BOM) pinning all dependency versions; applied to every + module. + +When changing the build, **follow the [Gradle best practices](https://docs.gradle.org/current/userguide/best_practices_index.html)**. +The build enables both the **configuration cache** (`org.gradle.configuration-cache=true`) and +**configure-on-demand** (`org.gradle.configureondemand=true`), so any task used by a commonly-run +task (`assemble`, `test`, `integrationTest`, `rat` / `spotlessCheck` / `checkstyle*`, +`checkBinaryLicense`, `docker*`, etc.) **must be configuration-cache and configure-on-demand +compatible** — no reading of mutable state at execution time, no `Project` access in task actions, +use `Provider` / value sources instead. Verify with `--configuration-cache`. Tooling/one-off tasks +that are not part of these common flows (e.g. `verifyTestGroups`, dependency-report and ad-hoc +maintenance tasks) may be exempt. + +### Module name vs. directory name gotcha + +Several Gradle project paths do **not** match their directory because the Maven artifactId is +preserved. Most importantly: + +- Directory `pulsar-client/` → project **`:pulsar-client-original`** +- Directory `pulsar-client-admin/` → project **`:pulsar-client-admin-original`** +- Directory `pulsar-functions/localrun/` → project `:pulsar-functions:pulsar-functions-local-runner-original` + +Always use the Gradle project path (left of any `--tests`), e.g. `./gradlew :pulsar-client-original:test`. +Check `settings.gradle.kts` when a path is ambiguous. + +## Pulsar Improvement Proposals (`pip/`) + +The **`pip/`** directory holds **Pulsar Improvement Proposals** (`pip-.md`) — the design +documents for significant changes, referenced as `PIP-` throughout commit messages and code (e.g. +PIP-463 = Maven→Gradle migration, PIP-465 = IO connectors moved out, PIP-466/468 = V5 client). +`pip/README.md` describes the process and `pip/TEMPLATE.md` is the proposal template. Consult the +relevant PIP for the rationale behind a non-trivial feature or architectural decision. diff --git a/CLAUDE.md b/CLAUDE.md deleted file mode 100644 index 1d80b15b084a3..0000000000000 --- a/CLAUDE.md +++ /dev/null @@ -1 +0,0 @@ -See @AGENTS.md for guidance on working with this repository. diff --git a/CLAUDE.md b/CLAUDE.md new file mode 120000 index 0000000000000..47dc3e3d863cf --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1 @@ +AGENTS.md \ No newline at end of file diff --git a/CODING.md b/CODING.md new file mode 100644 index 0000000000000..83c94f2e28410 --- /dev/null +++ b/CODING.md @@ -0,0 +1,149 @@ +# Coding guidelines + +Apache Pulsar follows the Sun Java Coding Conventions with additional project-specific rules. The +codebase is performance-critical, asynchronous, and concurrency-sensitive, so code review +prioritizes **correctness, thread safety, performance, maintainability, and backward +compatibility**. This file is the canonical coding reference for both human contributors and AI +coding agents; the task skills under [`.agents/skills/`](.agents/skills/) layer AI-specific +guardrails on top of it. + +## Style + +- **4 spaces** for indentation; **tabs must never be used**. +- Always use **curly braces**, even for single-line `if` statements. +- Do **not** include `@author` tags in Javadoc. +- Every `TODO` must reference a GitHub issue, e.g. `// TODO: https://github.com/apache/pulsar/issues/XXXX`. +- Checkstyle config: `buildtools/src/main/resources/pulsar/checkstyle.xml`. Lombok is enabled. + +## Asynchronous programming + +Pulsar relies heavily on `CompletableFuture` and asynchronous pipelines. Prefer `CompletableFuture` +APIs over `ListenableFuture` for new code. + +- **Methods returning `CompletableFuture` must not throw synchronously.** Propagate failures through + the returned future — `return CompletableFuture.failedFuture(e);` — including for argument + validation: + + ```java + if (arg == null) { + return CompletableFuture.failedFuture(new IllegalArgumentException("arg")); + } + ``` + + Throwing inside an async stage (`thenApply`, `thenCompose`, `thenRun`, `handle`, `whenComplete`) + is fine, because the framework routes it into the resulting future. +- **Never block on event-loop / async-execution threads** — no `Thread.sleep`, `Future.get()`, + `CompletableFuture.join()`, or blocking IO. +- **Avoid nested futures** such as `CompletableFuture>`; flatten with + `thenCompose`. +- Prefer **`OrderedExecutor`** for ordered asynchronous work. + +## Concurrency + +- Public classes should be **thread-safe**; annotate non-thread-safe ones with `@NotThreadSafe`. +- Protect shared mutable state; prefer fine-grained synchronization; ensure mutations occur on the + intended thread. +- Give threads **meaningful names** for diagnostics. + +## Logging + +- Prefer the **[slog](https://github.com/merlimat/slog)** library (`io.github.merlimat.slog`). + Instantiate the logger with Lombok's **`@CustomLog`** annotation, which is wired in `lombok.config` + to create an `io.github.merlimat.slog.Logger` (`io.github.merlimat.slog.Logger.get(TYPE)`). +- **SLF4J** is still available but is **considered deprecated** for Pulsar logging — do not introduce + new SLF4J loggers. +- Never use `System.out` / `System.err`. +- **Default new log statements to `TRACE` or `DEBUG`, not `INFO`.** Existing Pulsar code overuses + `INFO`, which makes production logs excessively noisy. Use `TRACE` for fine-grained/low-level + detail, `DEBUG` for diagnostics that could reasonably be enabled in production without flooding the + logs, and reserve `INFO` for low-frequency, significant lifecycle/state-change events. +- Attach data as **structured attributes** — `log.info().attr("topic", topic).log("Published")` — + rather than interpolating it into the message string. +- For expensive `DEBUG`/`TRACE` values, do **not** guard with `isDebugEnabled()` / `isTraceEnabled()`. + Use slog's lazy evaluation, which only runs when the level is enabled: + - pass a supplier lambda as the attribute value — `log.debug().attr("dump", () -> expensiveDump()).log("...")`, or + - use the deferred event form — `log.debug(e -> e.attr("dump", expensiveDump()).log("..."))`. +- Avoid logging in hot paths and stack traces at `INFO` or lower. + +## Resource and memory management + +- Always close resources: streams, network connections, executors, buffers. Prefer + try-with-resources. +- For internal networking/messaging paths, prefer **Netty `ByteBuf`** over `ByteBuffer` unless an + external API requires `ByteBuffer`. Release ref-counted buffers you allocate. + +## Configuration + +When adding configuration options: use clear, descriptive names; provide sensible defaults; update +the default configuration files; and document the option. + +## Dependencies + +Prefer existing dependencies over introducing new libraries. Pulsar commonly uses Apache +Commons / Guava (utilities), **FastUtil** (type-specific collections), **JCTools** (concurrent data +structures), **RoaringBitmap** (compressed bitsets), **Caffeine** (caching), **Jackson** (JSON), +Prometheus / **OpenTelemetry** (metrics), and **Netty** (networking and buffers). + +A new dependency must be justified (explain why existing ones are insufficient) and must update the +bundled-dependency `LICENSE`/`NOTICE` — verify with `./gradlew checkBinaryLicense`. + +## Backward compatibility + +Pulsar maintains strong compatibility guarantees. Changes must not break public APIs, client +compatibility, wire-protocol compatibility, or serialized/metadata formats. Servers must work with +both older and newer clients. Flag any change that may break compatibility. + +## Testing conventions + +- **TestNG + Mockito.** Prefer **AssertJ** assertions with descriptions over TestNG asserts; use + **Awaitility** (with AssertJ) for asynchronous conditions instead of `sleep`-based timing. Add + timeouts to prevent hangs. +- Every feature or bug fix should include tests covering edge cases and failure scenarios; tests must + be deterministic and stable. Integration tests may be required for distributed components. +- **No reflection to access private state in tests** (`WhiteboxImpl.getInternalState` / + `setInternalState`, `Field.setAccessible(true)`, `Method.setAccessible(true)`, and similar). + Instead expose what the test needs through a **package-private** `@VisibleForTesting` accessor and + place the test in the same package: + + ```java + @VisibleForTesting + Map getSubscriptions() { + return subscriptions; + } + ``` + + New reflection-based test access should be flagged in review. See the dev@ thread + [Stop using reflection to access private fields in tests](https://lists.apache.org/thread/7gr04sqmzyttx4ln6ydtp3qv0xgo1o6m). +- **Close/release resources in tests** — shut down executors/clients/services and release Netty + `ByteBuf`s (and other reference-counted buffers) the test allocates. + - A reported **`ByteBuf`/buffer leak** (Netty pooled-allocator leak detection, enabled via + `-Dpulsar.allocator.pooled=true`) is a **real bug**: fix the root cause — the missing `release()` + in the code under test — rather than working around it in the test or suppressing the detector. + - **Thread leaks** reported by `ThreadLeakDetectorListener` are currently **not a reliable signal**. + The detector produces a high rate of false positives — notably with `SharedPulsarBaseTest`, and + whenever the `THREAD_LEAK_DETECTOR_WAIT_MILLIS` environment value is not set high enough + (≈`10000` is recommended) to let asynchronous shutdown complete before the check runs. That wait + setting only takes effect when the **Gradle daemon is disabled** (`--no-daemon`). Because of these + false positives, do not rely on `ThreadLeakDetectorListener` alone to conclude that a change is + thread-leak-free; corroborate before treating a reported thread leak as a real bug. + +See [`CONTRIBUTING.md`](CONTRIBUTING.md) for how to *run* tests (test groups, `--tests` scoping, +integration tests, retry count). + +## Code review checklist + +When reviewing a pull request, verify: + +- Java coding conventions are followed. +- Thread-safety risks; no blocking operations in async paths; correct `CompletableFuture` usage. +- No unnecessary dependencies; LICENSE/NOTICE updated when dependencies change. +- Logging follows the guidelines above (slog, levels, structured attributes). +- Backward compatibility is preserved. +- Tests exist and are appropriate; reflection-based access to private state is flagged with a + `@VisibleForTesting` package-private accessor suggested instead. +- The **PR description adequately explains the change** — at minimum the **Motivation (why?)** and + **Modifications (what / how?)**, matching `.github/PULL_REQUEST_TEMPLATE.md`. A title alone, or a + description that only restates the title, is not sufficient; ask for a proper description unless it + is already covered. + +Focus feedback on correctness, reliability, and maintainability. diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index db83924c47c8a..47045a3c9f128 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -19,6 +19,161 @@ --> -## Contributing to Apache Pulsar +# Contributing to Apache Pulsar -We would love for you to contribute to Apache Pulsar and make it even better! Please check the [Apache Pulsar Contributing Guide](https://pulsar.apache.org/contribute/) before starting to work on the project. +We would love for you to contribute to Apache Pulsar and make it even better! The **authoritative** +contributor guide is the [Apache Pulsar Contributing Guide](https://pulsar.apache.org/contribute/) — +please read it before starting. This file is a quick, in-repo reference for the local development +workflow (build, test, PR, CI); for the big-picture module map see [`ARCHITECTURE.md`](ARCHITECTURE.md) +and for code style see [`CODING.md`](CODING.md). + +## Prerequisites + +- **JDK 21 or 25** is required to build (`master`). Bytecode targets Java 17. `zip` is also needed. + (`-PskipJavaVersionCheck` bypasses the JDK version check.) +- Use the bundled Gradle wrapper `./gradlew` (Linux/macOS) or `gradlew.bat` (Windows) — no separate + Gradle install is needed. See the + [build-tooling setup guide](https://pulsar.apache.org/contribute/setup-buildtools/) and + [IDE setup guide](https://pulsar.apache.org/contribute/setup-ide/). + +## Build + +```bash +# Compile and assemble everything (or a single module) +./gradlew assemble +./gradlew :pulsar-broker:assemble + +# Lint / verify (license headers, formatting, checkstyle) — run before pushing +./gradlew rat spotlessCheck checkstyleMain checkstyleTest +./gradlew spotlessApply # auto-fix license headers/formatting + +# Verify bundled-dependency LICENSE/NOTICE coverage (run after changing a runtime dependency) +./gradlew checkBinaryLicense + +# Start a standalone Pulsar service (broker + bookie + metadata in one JVM) +bin/pulsar standalone + +# Build docker images apachepulsar/pulsar(-all):latest +./gradlew docker # or docker-all +``` + +## Running tests + +```bash +# Always scope test runs with --tests — running a whole module's test task is slow. +# Run a single test class +./gradlew :pulsar-client-original:test --tests "ConsumerBuilderImplTest" +# Run a single test method +./gradlew :pulsar-client-original:test --tests "ConsumerBuilderImplTest." +# Run all tests in a specific package +./gradlew :pulsar-broker:test --tests "org.apache.pulsar.broker.admin.*" +``` + +> Note the [module-name-vs-directory gotcha](ARCHITECTURE.md#module-name-vs-directory-name-gotcha): +> directory `pulsar-client/` is the Gradle project `:pulsar-client-original`. + +### Test groups (TestNG) + +Tests use **TestNG** and are tagged with `@Test(groups = "...")`. By default the build **excludes the +`quarantine` and `flaky` groups** (`excludedTestGroups` default = `quarantine,flaky`), so to run a +single test that lives in one of those groups you must clear the exclusion: + +```bash +# Run a specific test that is in the flaky/quarantine group (otherwise excluded by default) +./gradlew :pulsar-broker:test -PexcludedTestGroups='' --tests "" +``` + +CI selects whole groups with `-PtestGroups=` and `-PexcludedTestGroups=` (e.g. +`broker,broker-admin`); locally prefer `--tests` to scope to specific classes instead of running an +entire group. CI splits `pulsar-broker` tests into groups (see +`pulsar-build/run_unit_group_gradle.sh` and `gradle/verify-test-groups.gradle.kts`). Tests with no +group are treated as `other` at runtime. `./gradlew verifyTestGroups` reports group assignments and +flags tests not covered by any CI group. + +Other test-related properties: `-PtestJavaVersion=17` (run tests on a different JDK toolchain), +`-PtestRetryCount=N`, `-PtestFailFast=true|false`, `-PprotobufVersion=4.31.1` (protobuf v4 +compatibility tests). + +Failed tests are retried once by default (`testRetryCount=1`; `0` when running inside the IDE). When +running tests locally, prefer **`-PtestRetryCount=0`** to catch failures (including flakiness) early +instead of having retries mask them. + +### Integration tests + +Integration tests live in `tests/` (see `tests/README.md`). They use +[Testcontainers](https://www.testcontainers.org/) to bring up Pulsar services in Docker, so **Docker +must be installed and running**. Build the test image first, then run the tests. + +The full integration suite is heavy and slow. **In local development, always run individual +integration tests** rather than the whole suite — pass `--tests` to select a class (TestNG then +discovers it directly from the classpath): + +```bash +./gradlew :tests:latest-version-image:dockerBuild # build the docker test image +./gradlew :tests:integration:integrationTest --tests "org.apache.pulsar.tests.integration." +``` + +To run the **entire** integration test set, use **Personal CI** (below) rather than running it +locally. (`integrationTest` also accepts `-PtestGroups` / `-PexcludedTestGroups` and +`-PintegrationTestSuiteFile=.xml` to pick a specific TestNG suite.) + +### Running the full CI pipeline (Personal CI) + +The full test suite is large and slow to run locally. While iterating on a change, run only the +narrowly-scoped tests relevant to the change (a single test class or package, see above) rather than +a module's entire test task. To validate a larger change against the **full** CI pipeline, do not run +everything locally — use **Personal CI**, which runs Pulsar's CI workflows in the contributor's own +GitHub fork. + +If Personal CI is not yet set up, follow the +[Personal CI documentation](https://pulsar.apache.org/contribute/personal-ci/) to enable it on your +fork. Once it is set up, the loop is: + +1. Keep the local `master` up-to-date with `apache/pulsar` and rebase the feature branch on it. +2. Push the feature branch to the **fork** to trigger CI runs there. CI runs against the PR opened in + your own fork (it is normal to have a PR open in the fork *and* a PR for the same branch open in + `apache/pulsar` at the same time). +3. Monitor CI status on the fork and fix failures. +4. Open the PR to `apache/pulsar` only after the fork's CI is green. + +Once the PR to `apache/pulsar` has been opened, stop rebasing as part of this loop: step 1's rebase +no longer applies — bring in upstream changes by merging instead (see [Pull requests](#pull-requests)). + +## Pull requests + +PRs must follow `.github/PULL_REQUEST_TEMPLATE.md`. PR titles follow the +`[][] ` convention (e.g. `[fix][broker] ...`, +`[improve][build] ...`) — refer to `.github/workflows/ci-semantic-pull-request.yml` for the valid +`[type]` and `[scope]` prefixes, which are enforced by CI. The `` should be in +imperative form, like a good commit message's subject line. + +**Describe the change.** The PR description must cover, at minimum, the **Motivation (why?)** and the +**Modifications (what / how?)** — these map to the corresponding sections of the PR template. A title +alone, or a description that only restates the title, is not sufficient. + +**Never rebase a PR branch once the PR is opened in `apache/pulsar`.** Rebasing rewrites history and +disrupts reviewers (it invalidates review comments and makes incremental diffs unreadable). To bring +in upstream changes, instead fetch from the `apache/pulsar` remote and **merge** its `master` into the +PR branch: + +```bash +git fetch # e.g. `upstream` or `apache` +git merge /master +``` + +(Rebasing onto an updated `master` is fine *before* the PR is opened — see the Personal CI loop above +— but not after.) + +## Reporting security vulnerabilities + +See [`SECURITY.md`](SECURITY.md) and . In short: report a +suspected vulnerability **privately** (never in a public issue, PR, or commit), and never reveal the +security nature of a change in public until it is announced. An **already-public** CVE that you only +want to check Pulsar's exposure to is *not* a private disclosure — search the CVE id in apache/pulsar +PRs/issues first, then ask via a GitHub issue or dev@pulsar.apache.org. + +## AI coding agents + +If you use an AI coding assistant (Claude Code, Copilot, Cursor, Gemini, Codex, Aider, …), see +[`AGENTS.md`](AGENTS.md) for an index of the agent-facing guidance and the task-specific skills under +[`.agents/skills/`](.agents/skills/). diff --git a/SECURITY.md b/SECURITY.md index acd8c0140b014..4c87d8c31a968 100644 --- a/SECURITY.md +++ b/SECURITY.md @@ -1,3 +1,44 @@ # Security Policy -See also https://pulsar.apache.org/security/. +The authoritative policy is at ; the Apache Software Foundation's +general process is at . The summary below is what contributors (and +any AI tooling acting on their behalf) must follow. + +## Reporting a vulnerability + +Do **not** open a public GitHub issue, pull request, or discussion for a suspected vulnerability — +that defeats coordinated disclosure. + +Report it privately by email to the Apache Security Team at **security@apache.org** (the ASF's central +security address). You may also, or instead, write to the Apache Pulsar PMC's private list, +**private@pulsar.apache.org**. + +See and + for more detail. + +## Disclosure hygiene for contributors + +Until a fix has been publicly announced, do **not** reveal the security nature of a change anywhere +public — commit messages, pull request titles or bodies, GitHub issues, or review comments — even +when the fix touches security-adjacent code (authentication/authorization, deserialization, TLS, +networking). Describe the behaviour change neutrally. A public commit or PR that advertises "fixes the +CVE", "security fix", or "patches the vulnerability" discloses the issue before it is announced and +defeats the coordinated-disclosure process above. + +This applies to every contributor, and identically to any AI tooling acting on a contributor's behalf. + +## Checking exposure to an already-public CVE + +For a CVE that is **already public** (for example, in a dependency) and you want to check Pulsar's +exposure or whether it is already fixed, this is **not** a private disclosure — the right venue is a +GitHub issue on apache/pulsar or a question on the **dev@pulsar.apache.org** mailing list. + +Before asking, search PRs and issues with the CVE id at +(also check issues and **closed** PRs/issues) — the fix may already be merged or the question already +answered. + +## Supported versions + +Pulsar's supported versions are listed at . +Security fixes are made to supported versions; users should upgrade to a supported version to receive +security updates. From 149014c92bf0d056a22f0f0a478845536066e70f Mon Sep 17 00:00:00 2001 From: Lari Hotari Date: Tue, 26 May 2026 13:33:17 +0300 Subject: [PATCH 05/39] [improve][misc] Add Closes alternative for Fixes and align PR template with the agent docs --- .agents/skills/pulsar-pr-workflow/SKILL.md | 5 +++-- .github/PULL_REQUEST_TEMPLATE.md | 18 +++++++++++++----- CONTRIBUTING.md | 4 +++- 3 files changed, 19 insertions(+), 8 deletions(-) diff --git a/.agents/skills/pulsar-pr-workflow/SKILL.md b/.agents/skills/pulsar-pr-workflow/SKILL.md index 4021914db0d5a..5a6ce8c4fb27a 100644 --- a/.agents/skills/pulsar-pr-workflow/SKILL.md +++ b/.agents/skills/pulsar-pr-workflow/SKILL.md @@ -37,8 +37,9 @@ Personal CI loop on a fork. enforced by `.github/workflows/ci-semantic-pull-request.yml` — read it rather than guessing. `` is imperative, like a commit subject. - **PR description must cover Motivation (why?) and Modifications (what/how?)** per the template — a - title alone, or a description restating the title, is not enough. Add `Fixes #N` / `Main Issue: #N` - / `PIP: #N` when applicable. + title alone, or a description restating the title, is not enough. Link the issue when applicable: + `Fixes #N` (or equivalently `Closes #N`) for an issue the PR resolves, `Main Issue: #N` for one task + of a larger issue, and `PIP: #N` for a proposal. - **Never rebase once the PR is open in `apache/pulsar`** — it invalidates review comments and incremental diffs. Bring in upstream changes by merging: diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md index 69281b4a03bc2..9772d92017363 100644 --- a/.github/PULL_REQUEST_TEMPLATE.md +++ b/.github/PULL_REQUEST_TEMPLATE.md @@ -1,18 +1,26 @@ - + Fixes #xyz diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 47045a3c9f128..74140ebd2126d 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -149,7 +149,9 @@ imperative form, like a good commit message's subject line. **Describe the change.** The PR description must cover, at minimum, the **Motivation (why?)** and the **Modifications (what / how?)** — these map to the corresponding sections of the PR template. A title -alone, or a description that only restates the title, is not sufficient. +alone, or a description that only restates the title, is not sufficient. Link the related issue with +`Fixes #N` (or equivalently `Closes #N`) for an issue the PR resolves, `Main Issue: #N` for one task +of a larger issue, or `PIP: #N` for a proposal. **Never rebase a PR branch once the PR is opened in `apache/pulsar`.** Rebasing rewrites history and disrupts reviewers (it invalidates review comments and makes incremental diffs unreadable). To bring From d8294ad00d95bd36d713abb8a6d9a27f116088c0 Mon Sep 17 00:00:00 2001 From: Lari Hotari Date: Tue, 26 May 2026 13:42:22 +0300 Subject: [PATCH 06/39] [improve][misc] Document Pulsar security model scope (trusted functions, perimeter security, no malicious-DoS protection) --- .agents/skills/pulsar-security/SKILL.md | 25 ++++++++++++++++++--- SECURITY.md | 29 +++++++++++++++++++++++++ 2 files changed, 51 insertions(+), 3 deletions(-) diff --git a/.agents/skills/pulsar-security/SKILL.md b/.agents/skills/pulsar-security/SKILL.md index a3b18dd23a2c7..0a0f5160a5991 100644 --- a/.agents/skills/pulsar-security/SKILL.md +++ b/.agents/skills/pulsar-security/SKILL.md @@ -1,6 +1,6 @@ --- name: pulsar-security -description: Guardrails for handling security topics in Apache Pulsar — reporting a vulnerability through the private ASF channels (never a public issue, PR, or commit), and checking whether Pulsar is exposed to an already-public CVE. Use when a task involves a suspected vulnerability, a CVE, security advisories, or questions about supported/patched versions. +description: Guardrails for handling security topics in Apache Pulsar — Pulsar's (informal) security model and threat scope, reporting a vulnerability through the private ASF channels (never a public issue, PR, or commit), and checking whether Pulsar is exposed to an already-public CVE. Use when a task involves a suspected vulnerability, a CVE, security advisories, deciding whether some behaviour (e.g. functions running code, or DoS) is in scope, or questions about supported/patched versions. license: Apache-2.0 compatibility: claude, codex, copilot, cursor, gemini, aider metadata: @@ -15,8 +15,27 @@ Guardrails for security-related work. Canonical policy: [`SECURITY.md`](../../.. ## When to use this skill -Use it when a task touches a suspected or reported vulnerability, a CVE, a security advisory, or -questions about which Pulsar versions are supported/patched. +Use it when a task touches a suspected or reported vulnerability, a CVE, a security advisory, +deciding whether some behaviour is in scope, or questions about which Pulsar versions are +supported/patched. + +## Security model (scope) + +Pulsar's security model is not formally defined. Two assumptions decide whether a report is in scope +(full text in [`SECURITY.md` → Security model and scope](../../../SECURITY.md#security-model-and-scope)): + +- **Functions and connectors run fully trusted code.** The function instance runtime's purpose is to + execute user-provided code — **RCE is the feature, not a bug** (the PMC repeatedly receives reports + to the contrary). The thread/process runtimes can touch any files/state they can access; the + Kubernetes runtime alone does not restrict cluster resource access; hardening hooks exist but the + hardening itself is not shipped. +- **Clusters assume network-perimeter security.** Only trusted users should reach the cluster. There + is **no protection against malicious DoS**; rate limits address only *unintentional* DoS + (misconfiguration, thundering herd). + +When triaging, do **not** classify "a trusted function executes/modifies its environment" or "a +perimeter-trusted client can overload the cluster" as a vulnerability by default — these are outside +the model. Surface uncertainty to a human (or the channels above) rather than asserting a CVE. ## Reporting a (non-public) vulnerability diff --git a/SECURITY.md b/SECURITY.md index 4c87d8c31a968..3a04ce3fb768a 100644 --- a/SECURITY.md +++ b/SECURITY.md @@ -4,6 +4,35 @@ The authoritative policy is at ; the Apache general process is at . The summary below is what contributors (and any AI tooling acting on their behalf) must follow. +## Security model and scope + +Pulsar's security model is not formally/explicitly defined. Two long-standing design assumptions +matter when deciding whether something is actually a vulnerability: + +**Pulsar Functions and connectors execute fully trusted code.** The function instance runtime exists +to run user-provided code — *remote code execution is its core purpose, not a flaw.* (The Pulsar PMC +has repeatedly received reports claiming that "the function instance runtime allows running +user-provided code and results in an RCE"; this is expected, by-design behaviour, not a security +issue.) The available execution models also let the code modify its environment: + +- The **thread** and **process** runtimes can read or modify any files and state accessible to the + process they run in. +- The **Kubernetes** runtime, on its own, does not restrict access to resources in the Kubernetes + cluster. The project provides hooks for custom Kubernetes-runtime *hardening*, but such hardening is + **not** part of the project. + +Therefore, Pulsar Functions and connectors must only run code that is **fully trusted**. + +**Clusters rely on network-perimeter security.** Pulsar is designed to be deployed behind a trusted +network perimeter where only trusted users can reach the cluster. The project does **not** implement +controls against malicious **denial-of-service (DoS)** attacks. Rate limiting exists to mitigate +*unintentional* DoS — e.g. improper configuration or thundering-herd effects — but it is **not** a +defense against a deliberate DoS by an attacker. + +Reports that amount to "a trusted function can run code / modify its environment" or "a +perimeter-trusted client can overload the cluster" generally fall **outside** Pulsar's threat model. +When in doubt, ask through the channels below rather than assuming. + ## Reporting a vulnerability Do **not** open a public GitHub issue, pull request, or discussion for a suspected vulnerability — From c78df015ea1414b288d9c45dc067f7d51fcd9c69 Mon Sep 17 00:00:00 2001 From: Lari Hotari Date: Tue, 26 May 2026 13:43:50 +0300 Subject: [PATCH 07/39] [improve][misc] Reword DoS phrasing in security model scope --- .agents/skills/pulsar-security/SKILL.md | 4 ++-- SECURITY.md | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/.agents/skills/pulsar-security/SKILL.md b/.agents/skills/pulsar-security/SKILL.md index 0a0f5160a5991..5086c05c035ba 100644 --- a/.agents/skills/pulsar-security/SKILL.md +++ b/.agents/skills/pulsar-security/SKILL.md @@ -34,8 +34,8 @@ Pulsar's security model is not formally defined. Two assumptions decide whether (misconfiguration, thundering herd). When triaging, do **not** classify "a trusted function executes/modifies its environment" or "a -perimeter-trusted client can overload the cluster" as a vulnerability by default — these are outside -the model. Surface uncertainty to a human (or the channels above) rather than asserting a CVE. +perimeter-trusted client can cause a denial-of-service" as a vulnerability by default — these are +outside the model. Surface uncertainty to a human (or the channels above) rather than asserting a CVE. ## Reporting a (non-public) vulnerability diff --git a/SECURITY.md b/SECURITY.md index 3a04ce3fb768a..48f2a90a5360c 100644 --- a/SECURITY.md +++ b/SECURITY.md @@ -30,8 +30,8 @@ controls against malicious **denial-of-service (DoS)** attacks. Rate limiting ex defense against a deliberate DoS by an attacker. Reports that amount to "a trusted function can run code / modify its environment" or "a -perimeter-trusted client can overload the cluster" generally fall **outside** Pulsar's threat model. -When in doubt, ask through the channels below rather than assuming. +perimeter-trusted client can cause a denial-of-service" generally fall **outside** Pulsar's threat +model. When in doubt, ask through the channels below rather than assuming. ## Reporting a vulnerability From d410011805eb0f8fb2fd4948cba4a2c9c8d3ef31 Mon Sep 17 00:00:00 2001 From: Lari Hotari Date: Tue, 26 May 2026 13:47:40 +0300 Subject: [PATCH 08/39] [improve][misc] Link slog to its repository on mentions in agent docs --- AGENTS.md | 5 +++-- CODING.md | 2 +- 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index ef29c4d147de5..10723d8ade719 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -15,7 +15,7 @@ thread safety, performance, maintainability, and backward compatibility**. |-----|---------| | [`CONTRIBUTING.md`](CONTRIBUTING.md) | Local dev workflow: build, lint, running tests & test groups, integration tests, Personal CI, PR conventions, security reporting. | | [`ARCHITECTURE.md`](ARCHITECTURE.md) | Big-picture module map, the Gradle build infrastructure, the module-name-vs-directory gotcha, and the `pip/` proposals. | -| [`CODING.md`](CODING.md) | Coding conventions: style, async/`CompletableFuture`, concurrency, logging (slog), dependencies, backward compatibility, testing, and the review checklist. | +| [`CODING.md`](CODING.md) | Coding conventions: style, async/`CompletableFuture`, concurrency, logging ([slog](https://github.com/merlimat/slog)), dependencies, backward compatibility, testing, and the review checklist. | | [`SECURITY.md`](SECURITY.md) | Reporting a vulnerability, disclosure hygiene, and checking exposure to an already-public CVE. | The authoritative project documentation is at ; the files above and the @@ -43,7 +43,8 @@ with a `SKILL.md`: serialized/metadata formats. Servers must interoperate with older and newer clients. 2. **Never block on async/event-loop threads;** methods returning `CompletableFuture` must not throw synchronously. See [`CODING.md`](CODING.md#asynchronous-programming). -3. **Logging:** prefer slog via `@CustomLog`; default new logs to `TRACE`/`DEBUG`, not `INFO`. +3. **Logging:** prefer [slog](https://github.com/merlimat/slog) via `@CustomLog`; default new logs to + `TRACE`/`DEBUG`, not `INFO`. 4. **Tests:** scope runs with `--tests`; no reflection into private state (use a `@VisibleForTesting` package-private accessor); release buffers and resources. 5. **PRs:** semantic `[type][scope]` title; describe **motivation** and **modifications**; do not diff --git a/CODING.md b/CODING.md index 83c94f2e28410..1f5740193fc62 100644 --- a/CODING.md +++ b/CODING.md @@ -137,7 +137,7 @@ When reviewing a pull request, verify: - Java coding conventions are followed. - Thread-safety risks; no blocking operations in async paths; correct `CompletableFuture` usage. - No unnecessary dependencies; LICENSE/NOTICE updated when dependencies change. -- Logging follows the guidelines above (slog, levels, structured attributes). +- Logging follows the guidelines above ([slog](https://github.com/merlimat/slog), levels, structured attributes). - Backward compatibility is preserved. - Tests exist and are appropriate; reflection-based access to private state is flagged with a `@VisibleForTesting` package-private accessor suggested instead. From 728c326ab84637b95bf1924a0e98ce37e7a33480 Mon Sep 17 00:00:00 2001 From: Lari Hotari Date: Tue, 26 May 2026 13:50:20 +0300 Subject: [PATCH 09/39] [improve][misc] Require PMC go-ahead before pushing non-public security fixes --- .agents/skills/pulsar-security/SKILL.md | 6 +++++- SECURITY.md | 6 ++++++ 2 files changed, 11 insertions(+), 1 deletion(-) diff --git a/.agents/skills/pulsar-security/SKILL.md b/.agents/skills/pulsar-security/SKILL.md index 5086c05c035ba..0fdd1ac3a9986 100644 --- a/.agents/skills/pulsar-security/SKILL.md +++ b/.agents/skills/pulsar-security/SKILL.md @@ -45,7 +45,11 @@ outside the model. Surface uncertainty to a human (or the channels above) rather `private@pulsar.apache.org`. Follow and . - When a change fixes a non-public security issue, **do not state its security nature** in the commit - message, PR title, or PR body. + message, PR title, or PR body — and **do not push, commit publicly, or open a PR for the fix until + the Apache Pulsar PMC has responded with a go-ahead.** The fix must be coordinated through the ASF + security vulnerability handling process + ([apache.org/security/committers.html#possible](https://apache.org/security/committers.html#possible)), + not pushed ahead of the coordinated disclosure timeline. ## Checking exposure to an already-public CVE diff --git a/SECURITY.md b/SECURITY.md index 48f2a90a5360c..1ad45916597d6 100644 --- a/SECURITY.md +++ b/SECURITY.md @@ -54,6 +54,12 @@ networking). Describe the behaviour change neutrally. A public commit or PR that CVE", "security fix", or "patches the vulnerability" discloses the issue before it is announced and defeats the coordinated-disclosure process above. +Moreover, a fix for a non-public security issue must **not** be pushed, committed publicly, or opened +as a PR until the Apache Pulsar PMC has responded with a go-ahead. The fix is coordinated through the +ASF security vulnerability handling process +([apache.org/security/committers.html#possible](https://apache.org/security/committers.html#possible)) +— it must not run ahead of the coordinated-disclosure timeline. + This applies to every contributor, and identically to any AI tooling acting on a contributor's behalf. ## Checking exposure to an already-public CVE From 0d514a095b08af79f7bfbfb8e49d2afe24a753a3 Mon Sep 17 00:00:00 2001 From: Lari Hotari Date: Tue, 26 May 2026 13:51:26 +0300 Subject: [PATCH 10/39] [improve][misc] Clarify project team commits security fixes; reporter shares patch privately --- .agents/skills/pulsar-security/SKILL.md | 11 ++++++----- SECURITY.md | 10 ++++++---- 2 files changed, 12 insertions(+), 9 deletions(-) diff --git a/.agents/skills/pulsar-security/SKILL.md b/.agents/skills/pulsar-security/SKILL.md index 0fdd1ac3a9986..a81d711b5efff 100644 --- a/.agents/skills/pulsar-security/SKILL.md +++ b/.agents/skills/pulsar-security/SKILL.md @@ -45,11 +45,12 @@ outside the model. Surface uncertainty to a human (or the channels above) rather `private@pulsar.apache.org`. Follow and . - When a change fixes a non-public security issue, **do not state its security nature** in the commit - message, PR title, or PR body — and **do not push, commit publicly, or open a PR for the fix until - the Apache Pulsar PMC has responded with a go-ahead.** The fix must be coordinated through the ASF - security vulnerability handling process - ([apache.org/security/committers.html#possible](https://apache.org/security/committers.html#possible)), - not pushed ahead of the coordinated disclosure timeline. + message, PR title, or PR body, and **do not push, commit publicly, or open a PR for the fix.** + **The project team commits the fix**, coordinated through the ASF security vulnerability handling + process + ([apache.org/security/committers.html#possible](https://apache.org/security/committers.html#possible)). + A reporter may share a suggested fix patch privately by including it in the report to + `private@pulsar.apache.org` — never via a public commit or PR. ## Checking exposure to an already-public CVE diff --git a/SECURITY.md b/SECURITY.md index 1ad45916597d6..5099821fd2e55 100644 --- a/SECURITY.md +++ b/SECURITY.md @@ -55,10 +55,12 @@ CVE", "security fix", or "patches the vulnerability" discloses the issue before defeats the coordinated-disclosure process above. Moreover, a fix for a non-public security issue must **not** be pushed, committed publicly, or opened -as a PR until the Apache Pulsar PMC has responded with a go-ahead. The fix is coordinated through the -ASF security vulnerability handling process -([apache.org/security/committers.html#possible](https://apache.org/security/committers.html#possible)) -— it must not run ahead of the coordinated-disclosure timeline. +as a PR. **The project team commits the fix**, coordinated through the ASF security vulnerability +handling process +([apache.org/security/committers.html#possible](https://apache.org/security/committers.html#possible)), +so it does not run ahead of the coordinated-disclosure timeline. When reporting, you may include a +suggested fix patch privately in your report to `private@pulsar.apache.org` — never in a public +commit or PR. This applies to every contributor, and identically to any AI tooling acting on a contributor's behalf. From 1426a1994472196898002e400a899239b92448ad Mon Sep 17 00:00:00 2001 From: Lari Hotari Date: Tue, 26 May 2026 13:57:08 +0300 Subject: [PATCH 11/39] [improve][misc] Clarify how the project team commits security fixes (public pre-release or private repo) --- .agents/skills/pulsar-security/SKILL.md | 17 ++++++++++------- SECURITY.md | 17 ++++++++++------- 2 files changed, 20 insertions(+), 14 deletions(-) diff --git a/.agents/skills/pulsar-security/SKILL.md b/.agents/skills/pulsar-security/SKILL.md index a81d711b5efff..3313f1b0f2705 100644 --- a/.agents/skills/pulsar-security/SKILL.md +++ b/.agents/skills/pulsar-security/SKILL.md @@ -44,13 +44,16 @@ outside the model. Surface uncertainty to a human (or the channels above) rather - Report privately to the ASF Security Team (`security@apache.org`), optionally copying `private@pulsar.apache.org`. Follow and . -- When a change fixes a non-public security issue, **do not state its security nature** in the commit - message, PR title, or PR body, and **do not push, commit publicly, or open a PR for the fix.** - **The project team commits the fix**, coordinated through the ASF security vulnerability handling - process - ([apache.org/security/committers.html#possible](https://apache.org/security/committers.html#possible)). - A reporter may share a suggested fix patch privately by including it in the report to - `private@pulsar.apache.org` — never via a public commit or PR. +- **Do not push, commit publicly, or open a PR for the fix yourself.** You may share a suggested fix + patch privately by including it in the report to `private@pulsar.apache.org` — never via a public + commit or PR. + +The **project team** commits the fix, coordinated through the ASF security vulnerability handling +process +([apache.org/security/committers.html#possible](https://apache.org/security/committers.html#possible)). +The team may commit the fix to the public repository **before** the release, using a neutral commit +message that does not state its security nature. In severe cases, the commit and release are made in a +private repository, and the fix is made public only at the time of the release. ## Checking exposure to an already-public CVE diff --git a/SECURITY.md b/SECURITY.md index 5099821fd2e55..afa920f1d1ae9 100644 --- a/SECURITY.md +++ b/SECURITY.md @@ -54,16 +54,19 @@ networking). Describe the behaviour change neutrally. A public commit or PR that CVE", "security fix", or "patches the vulnerability" discloses the issue before it is announced and defeats the coordinated-disclosure process above. -Moreover, a fix for a non-public security issue must **not** be pushed, committed publicly, or opened -as a PR. **The project team commits the fix**, coordinated through the ASF security vulnerability -handling process -([apache.org/security/committers.html#possible](https://apache.org/security/committers.html#possible)), -so it does not run ahead of the coordinated-disclosure timeline. When reporting, you may include a -suggested fix patch privately in your report to `private@pulsar.apache.org` — never in a public -commit or PR. +Do **not** push, commit publicly, or open a PR for a fix to a non-public security issue. When +reporting, you may include a suggested fix patch privately in your report to +`private@pulsar.apache.org` — never in a public commit or PR. This applies to every contributor, and identically to any AI tooling acting on a contributor's behalf. +The **project team** commits the fix, coordinated through the ASF security vulnerability handling +process +([apache.org/security/committers.html#possible](https://apache.org/security/committers.html#possible)). +The team may commit the fix to the public repository **before** the release, using a neutral commit +message that does not state its security nature. In severe cases, the commit and release are made in a +private repository, and the fix is made public only at the time of the release. + ## Checking exposure to an already-public CVE For a CVE that is **already public** (for example, in a dependency) and you want to check Pulsar's From d6c1316099cc19086ae82d5897ce8164d69f2b94 Mon Sep 17 00:00:00 2001 From: Lari Hotari Date: Tue, 26 May 2026 13:57:32 +0300 Subject: [PATCH 12/39] [improve][misc] Add GitHub Discussions to Where to ask --- AGENTS.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/AGENTS.md b/AGENTS.md index 10723d8ade719..a8b38d657f962 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -56,4 +56,4 @@ with a `SKILL.md`: ## Where to ask - Dev mailing list: · Slack: -- Issues: +- Issues: · Discussions: From 1fb8d309f28af3816ad9c879858aa9f79e5c08e2 Mon Sep 17 00:00:00 2001 From: Lari Hotari Date: Tue, 26 May 2026 14:43:16 +0300 Subject: [PATCH 13/39] [improve][misc] Document Pulsar concurrency model, JMM rules, and performance/GC guidance Expand CODING.md's Concurrency section with the Java Memory Model rules that have historically tripped up Pulsar code: synchronization needs the same lock for reads and writes, fields shared across threads need volatile, immutable vs. effectively-immutable objects and safe publication/initialization, preferring DefaultThreadFactory/FastThreadLocalThread, and how to reproduce timing/platform-dependent bugs. Add a ZGC + Netty Recycler note (PIP-443) and a JMH-benchmark guideline (microbench/). Add a Concurrency-model gap and Backpressure (PIP-442) section to ARCHITECTURE.md, and point the pulsar-tests skill at the reproduction guidance. --- .agents/skills/pulsar-tests/SKILL.md | 4 ++ ARCHITECTURE.md | 31 +++++++++++ CODING.md | 77 +++++++++++++++++++++++++++- 3 files changed, 111 insertions(+), 1 deletion(-) diff --git a/.agents/skills/pulsar-tests/SKILL.md b/.agents/skills/pulsar-tests/SKILL.md index 3e0b6cc4c706a..f22e3863eaf23 100644 --- a/.agents/skills/pulsar-tests/SKILL.md +++ b/.agents/skills/pulsar-tests/SKILL.md @@ -49,6 +49,10 @@ failure / flake. Don't treat it as a real bug on its own; corroborate first. - **Don't fabricate assertions or test names** — verify symbols exist. - For a **bugfix**, add a regression test and confirm it fails before the fix and passes after. +- **Concurrency / memory-visibility bugs** are timing- and platform-dependent and easily masked — a + clean run is weak evidence a fix is correct. See + [`CODING.md` → Reproducing concurrency / memory-visibility bugs](../../../CODING.md#reproducing-concurrency--memory-visibility-bugs) + (JIT warm-up rounds, interpreted-vs-compiled differences, multi-socket/NUMA hardware). ## Validation checklist diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md index cf83c8a45521f..9eabf50057e7d 100644 --- a/ARCHITECTURE.md +++ b/ARCHITECTURE.md @@ -42,6 +42,37 @@ accordingly: `pulsar-client-admin-shaded` produce relocated fat jars; `distribution/*` assembles server/shell/offloader tarballs. +## Concurrency model (a known gap) + +Pulsar does **not** have a clearly established, documented concurrency model, which makes it hard to +evaluate whether a given piece of code is correct by construction. (Contrast Netty, which has a clear +rule: all handling on the IO thread is non-blocking, which by extension means avoiding synchronization +and locks on that path.) Pulsar does not strictly follow such a rule; modern JVMs and hardware +optimize `synchronized` code well enough that this has not blocked high performance, but it does make +reasoning about correctness harder than it needs to be. + +Conventions that **should** be documented (and largely are not yet): + +- which work belongs on the network-connection **event loop** vs. other threads; +- how the various **thread pools** are intended to be used, and what kind of work belongs on each; +- how threads are expected to **hand off state** to each other; +- when a `CompletableFuture`'s **completion thread should be switched** to another thread, and which one; +- **concurrency limits** for asynchronous tasks; +- preferring the **single-writer principle** to avoid concurrent state mutation. + +Until such a model is written down, follow the surrounding code's conventions and the Java-Memory-Model +rules in [`CODING.md`](CODING.md#concurrency). Once a model is defined, it becomes far more tractable to +"lift and shift" existing code toward it and enforce the rules consistently rather than having each +contributor rediscover the conventions case by case. + +## Backpressure + +Closely tied to the concurrency model is **backpressure** — how the system avoids accepting more work +than it can handle, particularly with respect to memory. The memory side is described in +[PIP-442 "Existing Broker Memory Management"](pip/pip-442.md#existing-broker-memory-management). Broader +backpressure (beyond memory) is not yet documented and would benefit from being defined alongside the +concurrency model. + ## Build infrastructure The build is **Gradle** (migrated from Maven via PIP-463; much existing tooling and docs elsewhere diff --git a/CODING.md b/CODING.md index 1f5740193fc62..ea35490944fee 100644 --- a/CODING.md +++ b/CODING.md @@ -42,8 +42,72 @@ APIs over `ListenableFuture` for new code. - Public classes should be **thread-safe**; annotate non-thread-safe ones with `@NotThreadSafe`. - Protect shared mutable state; prefer fine-grained synchronization; ensure mutations occur on the - intended thread. + intended thread. Prefer the **single-writer principle** — design so a given piece of state is + mutated by only one thread — to avoid concurrent mutation entirely. - Give threads **meaningful names** for diagnostics. +- When creating thread pools, prefer Netty's **`io.netty.util.concurrent.DefaultThreadFactory`**. It + produces **`FastThreadLocalThread`** instances, on which `FastThreadLocal` lookups are much faster + than on a plain `Thread` — this matters on Netty-heavy paths (e.g. the pooled `ByteBuf` allocator and + other Netty internals that rely on `FastThreadLocal`) — and it also assigns meaningful, prefixed + thread names. + +Note that Pulsar does not yet have a documented, project-wide concurrency model; see +[`ARCHITECTURE.md` → Concurrency model](ARCHITECTURE.md#concurrency-model-a-known-gap) for the +conventions that *should* govern threads, thread pools, and event loops. + +### The Java Memory Model is what makes concurrent code correct + +Historically, several hard-to-investigate Pulsar bugs have come from misconceptions about Java +synchronization: + +- **A `synchronized` method or block is not, on its own, thread-safe.** `synchronized` provides its + visibility and ordering guarantees only when the **same monitor/lock guards both the reads and the + writes** of the shared state. Synchronizing on an arbitrary monitor while another thread accesses + the same field without that monitor guarantees nothing. +- On 64-bit JVMs a field's value is **never corrupted** — a read always returns some value that was + actually written. What goes wrong is **visibility**: without a happens-before relationship, + different threads can observe different values, or a thread may never see an updated value at all. + Establish happens-before with `synchronized`, `volatile`, `final`, or `java.util.concurrent` + constructs. +- **A field accessed by more than one thread needs explicit visibility.** If multiple threads read + and write a field, make it `volatile` (or guard every access — both reads and writes — with the + same lock); otherwise a writing thread's update may never become visible to a reading thread. + `volatile` gives visibility for that single field but does **not** make compound updates + (read-modify-write, check-then-act) atomic — use `java.util.concurrent` atomics/locks for those. +- Visibility is per-field, so a mutable object can be observed **partially updated** — some field + writes visible to a reader, others not. +- The only way to make concurrent code reliably correct is to **conform to the Java Memory Model**. + **Benign data races** are sometimes acceptable, and Pulsar has code that relies on this by design — + but that must be a deliberate, documented choice, not an accident. +- **Prefer immutable objects** to sidestep these visibility and mutation hazards. Two distinct notions: + - An object is **immutable** when all its fields are `final` *and* every nested instance it + references is itself immutable (a Java `record` is the common case). Immutability must hold for the + whole reachable graph — synchronization around a holder buys nothing if a referenced object can + still be mutated independently without its own synchronization. + - An object is **effectively immutable** when its state is never modified after construction but its + fields are not all `final`. +- **How they must be published differs:** + - An **immutable** object benefits from the JMM's final-field **safe initialization** guarantee — a + thread that observes the reference sees the fully-constructed state even when the reference was + published through a data race — so it needs **no** safe publication. + - An **effectively immutable** object does **not** get that guarantee and must be shared via **safe + publication**: a `final` or `volatile` field, or a `java.util.concurrent` construct (e.g. putting + it into a `ConcurrentHashMap`). + + See [Safe initialization](https://shipilev.net/blog/2014/safe-public-construction/#_safe_initialization). + +### Reproducing concurrency / memory-visibility bugs + +These bugs are timing- and platform-dependent and easily masked, so a clean run is weak evidence that +a concurrency fix is correct: + +- Interpreted and JIT-compiled code can behave very differently. Reproductions often need several + **warm-up rounds with a short pause** between them so the JIT compiler kicks in; the JIT is tiered, + asynchronous, and multi-threshold, so a short-running test may never trigger compilation. JVM flags + can force earlier compilation, and which execution paths are exercised affects what gets compiled. +- Some races surface only on specific **hardware/OS** combinations — classically **multi-socket / + multi-NUMA** machines, whose weaker cross-socket memory ordering exposes races that never appear on + a single-socket machine. ## Logging @@ -71,6 +135,13 @@ APIs over `ListenableFuture` for new code. try-with-resources. - For internal networking/messaging paths, prefer **Netty `ByteBuf`** over `ByteBuffer` unless an external API requires `ByteBuffer`. Release ref-counted buffers you allocate. +- **Don't hand-optimize object allocation away.** Pulsar is intended to run on **ZGC**, whose + collection overhead is very low, so the extra short-lived allocations from favouring immutable + objects (see *Concurrency* above) are cheap. Much older Pulsar code minimizes allocation by pooling + objects with Netty's `Recycler`; this is **no longer recommended for new code** — under ZGC the + `Recycler` often *costs* more CPU than it saves versus simply letting the GC reclaim the objects. + Do not introduce new `Recycler` usage. See + [PIP-443: Stop using Netty Recycler in new code](pip/pip-443.md). ## Configuration @@ -100,6 +171,10 @@ both older and newer clients. Flag any change that may break compatibility. timeouts to prevent hangs. - Every feature or bug fix should include tests covering edge cases and failure scenarios; tests must be deterministic and stable. Integration tests may be required for distributed components. +- **Validate performance optimizations with a benchmark.** Prefer adding a **JMH benchmark** under the + `microbench/` subproject that simulates a usage pattern realistic for Pulsar production, so the + optimization is backed by evidence rather than intuition. See `microbench/README.md` for how to run + JMH benchmarks. - **No reflection to access private state in tests** (`WhiteboxImpl.getInternalState` / `setInternalState`, `Field.setAccessible(true)`, `Method.setAccessible(true)`, and similar). Instead expose what the test needs through a **package-private** `@VisibleForTesting` accessor and From a33c57c420080f56d1466c34c736b5253326cd88 Mon Sep 17 00:00:00 2001 From: Lari Hotari Date: Tue, 26 May 2026 14:49:40 +0300 Subject: [PATCH 14/39] [improve][misc] Clarify no explicit protection against malicious DoS --- .agents/skills/pulsar-security/SKILL.md | 2 +- SECURITY.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/.agents/skills/pulsar-security/SKILL.md b/.agents/skills/pulsar-security/SKILL.md index 3313f1b0f2705..c03b9d659fd64 100644 --- a/.agents/skills/pulsar-security/SKILL.md +++ b/.agents/skills/pulsar-security/SKILL.md @@ -30,7 +30,7 @@ Pulsar's security model is not formally defined. Two assumptions decide whether Kubernetes runtime alone does not restrict cluster resource access; hardening hooks exist but the hardening itself is not shipped. - **Clusters assume network-perimeter security.** Only trusted users should reach the cluster. There - is **no protection against malicious DoS**; rate limits address only *unintentional* DoS + is **no explicit protection against malicious DoS**; rate limits address only *unintentional* DoS (misconfiguration, thundering herd). When triaging, do **not** classify "a trusted function executes/modifies its environment" or "a diff --git a/SECURITY.md b/SECURITY.md index afa920f1d1ae9..b44f7193adb05 100644 --- a/SECURITY.md +++ b/SECURITY.md @@ -25,7 +25,7 @@ Therefore, Pulsar Functions and connectors must only run code that is **fully tr **Clusters rely on network-perimeter security.** Pulsar is designed to be deployed behind a trusted network perimeter where only trusted users can reach the cluster. The project does **not** implement -controls against malicious **denial-of-service (DoS)** attacks. Rate limiting exists to mitigate +explicit controls against malicious **denial-of-service (DoS)** attacks. Rate limiting exists to mitigate *unintentional* DoS — e.g. improper configuration or thundering-herd effects — but it is **not** a defense against a deliberate DoS by an attacker. From af132585f15ebc926b65d55a9289440a29803266 Mon Sep 17 00:00:00 2001 From: Lari Hotari Date: Tue, 26 May 2026 14:57:53 +0300 Subject: [PATCH 15/39] [improve][misc] Prefer named Java records over Pair returns and concatenated Map keys --- CODING.md | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/CODING.md b/CODING.md index ea35490944fee..78c47dca2e376 100644 --- a/CODING.md +++ b/CODING.md @@ -15,6 +15,18 @@ guardrails on top of it. - Every `TODO` must reference a GitHub issue, e.g. `// TODO: https://github.com/apache/pulsar/issues/XXXX`. - Checkstyle config: `buildtools/src/main/resources/pulsar/checkstyle.xml`. Lombok is enabled. +## Data types + +- **Don't return generic tuples.** Instead of returning `org.apache.commons.lang3.tuple.Pair` + (or a similar generic tuple type), define a small, purpose-named **Java `record`** inline in the + class that declares the method, with field names that document what the values mean. Give the record + the **same visibility as the method** that returns it — `public`, package-private (default), or + `private`. +- **Prefer record keys over concatenated strings.** For a composite `Map` key, use a small `record` + holding the key components instead of concatenating a `java.lang.String` (e.g. `a + ":" + b`). + A record key gives a correct `equals`/`hashCode`, keeps the key type-safe, and avoids + delimiter/escaping bugs. + ## Asynchronous programming Pulsar relies heavily on `CompletableFuture` and asynchronous pipelines. Prefer `CompletableFuture` From 8182e0398979ee7f0368c202b1c05a38a4f389ca Mon Sep 17 00:00:00 2001 From: Lari Hotari Date: Tue, 26 May 2026 15:14:16 +0300 Subject: [PATCH 16/39] [improve][misc] Document /pulsarbot rerun for flaky CI and the fork-PR approval gate --- .agents/skills/pulsar-pr-workflow/SKILL.md | 5 +++++ CONTRIBUTING.md | 12 ++++++++++++ 2 files changed, 17 insertions(+) diff --git a/.agents/skills/pulsar-pr-workflow/SKILL.md b/.agents/skills/pulsar-pr-workflow/SKILL.md index 5a6ce8c4fb27a..f5dd86090a1ad 100644 --- a/.agents/skills/pulsar-pr-workflow/SKILL.md +++ b/.agents/skills/pulsar-pr-workflow/SKILL.md @@ -53,6 +53,11 @@ Personal CI loop on a fork. **fork** to trigger CI, fix failures, and open the PR to `apache/pulsar` only when the fork's CI is green. A PR can be open in both the fork and `apache/pulsar` at once. After the upstream PR is open, switch from rebase to merge (above). +- **Flaky CI:** Pulsar has many flaky tests, so GitHub Actions jobs on a PR can fail unrelated to the + change. When a failure looks like flakiness, comment **`/pulsarbot rerun`** to re-run the failed + workflows. Note that for fork PRs a maintainer must approve each run (again after new pushes), which + is why Personal CI is the better loop for iterating on genuine failures. Don't dismiss a failure as + "flaky" without checking it isn't caused by the change. - **Keep commits focused** — a fix, a refactor, and a formatting pass are separate commits/PRs. - **Security:** never describe the security nature of a change in a commit/PR — see [`pulsar-security`](../pulsar-security/SKILL.md). diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 74140ebd2126d..e8051a564f8eb 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -139,6 +139,18 @@ fork. Once it is set up, the loop is: Once the PR to `apache/pulsar` has been opened, stop rebasing as part of this loop: step 1's rebase no longer applies — bring in upstream changes by merging instead (see [Pull requests](#pull-requests)). +### Retrying CI after flaky-test failures + +Pulsar has a large number of flaky tests, so GitHub Actions jobs on a PR sometimes fail for reasons +unrelated to the change. When a failure appears to be flakiness rather than a regression caused by the +PR, comment **`/pulsarbot rerun`** on the PR to re-run the failed workflows. + +For a PR from a fork, a project maintainer must **approve** the workflow runs before they execute, and +approval is required again whenever new changes are pushed. This adds latency to each run and makes it +slow to tell flaky failures apart from genuine ones. Setting up **Personal CI** (above) sidesteps this +— the full pipeline runs in your own fork without maintainer approval — so it is especially useful when +a PR has legitimate test failures that you need to iterate on. + ## Pull requests PRs must follow `.github/PULL_REQUEST_TEMPLATE.md`. PR titles follow the From 9934370e30f19aa2c0219dedcd5a5bf3ba9d3828 Mon Sep 17 00:00:00 2001 From: Lari Hotari Date: Tue, 26 May 2026 15:14:45 +0300 Subject: [PATCH 17/39] [improve][misc] Note /pulsarbot rerun requires the previous run's jobs to finish first --- .agents/skills/pulsar-pr-workflow/SKILL.md | 7 ++++--- CONTRIBUTING.md | 4 +++- 2 files changed, 7 insertions(+), 4 deletions(-) diff --git a/.agents/skills/pulsar-pr-workflow/SKILL.md b/.agents/skills/pulsar-pr-workflow/SKILL.md index f5dd86090a1ad..696a827667382 100644 --- a/.agents/skills/pulsar-pr-workflow/SKILL.md +++ b/.agents/skills/pulsar-pr-workflow/SKILL.md @@ -55,9 +55,10 @@ Personal CI loop on a fork. switch from rebase to merge (above). - **Flaky CI:** Pulsar has many flaky tests, so GitHub Actions jobs on a PR can fail unrelated to the change. When a failure looks like flakiness, comment **`/pulsarbot rerun`** to re-run the failed - workflows. Note that for fork PRs a maintainer must approve each run (again after new pushes), which - is why Personal CI is the better loop for iterating on genuine failures. Don't dismiss a failure as - "flaky" without checking it isn't caused by the change. + workflows — only **after all jobs from the previous run have completed** (it has no effect while + jobs are still running or queued). Note that for fork PRs a maintainer must approve each run (again + after new pushes), which is why Personal CI is the better loop for iterating on genuine failures. + Don't dismiss a failure as "flaky" without checking it isn't caused by the change. - **Keep commits focused** — a fix, a refactor, and a formatting pass are separate commits/PRs. - **Security:** never describe the security nature of a change in a commit/PR — see [`pulsar-security`](../pulsar-security/SKILL.md). diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index e8051a564f8eb..de475f0222690 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -143,7 +143,9 @@ no longer applies — bring in upstream changes by merging instead (see [Pull re Pulsar has a large number of flaky tests, so GitHub Actions jobs on a PR sometimes fail for reasons unrelated to the change. When a failure appears to be flakiness rather than a regression caused by the -PR, comment **`/pulsarbot rerun`** on the PR to re-run the failed workflows. +PR, comment **`/pulsarbot rerun`** on the PR to re-run the failed workflows. `/pulsarbot rerun` only +takes effect once **all** jobs from the previous run have completed, so wait for the run to finish +(including any still-running or queued jobs) before commenting. For a PR from a fork, a project maintainer must **approve** the workflow runs before they execute, and approval is required again whenever new changes are pushed. This adds latency to each run and makes it From 75f2d8777b1217ac6164f1cea8821d3840e1d894 Mon Sep 17 00:00:00 2001 From: Lari Hotari Date: Tue, 26 May 2026 17:01:52 +0300 Subject: [PATCH 18/39] [improve][misc] Expand agent/contributor docs with conventions mined from PR review feedback Add recurring, generalizable guidance distilled from past apache/pulsar PR reviews: - CODING.md: data-type conventions (records, narrowest interface type, factory methods, minimize method/constructor params, builders incl. records, naming); async per-call-site evaluation + checkArgumentAsync; concurrency lock-scope; backward-compat for plugin/SPI interfaces (default methods, no third-party types in public APIs, opt-in behavior changes); Performance section (hot-path costs, no overhead under load, bounded caches/StringInterner); testing terminology (unit vs container integration tests), SharedPulsarBaseTest usage, integration-style vs unit-test design, JMH benchmarks, Awaitility. - CONTRIBUTING.md: focused PRs / no drive-by refactor or reformatting, large-refactor discussion on dev@, branches & backports, /pulsarbot rerun-failure-checks and flaky-test handling. - ARCHITECTURE.md: PIP-number reservation via dev@ thread. - AGENTS.md: "stay in scope" critical rule. - Skills (pulsar-build/tests/pr-workflow): matching guardrails. --- .agents/skills/pulsar-build/SKILL.md | 4 + .agents/skills/pulsar-pr-workflow/SKILL.md | 6 +- .agents/skills/pulsar-tests/SKILL.md | 15 +++ AGENTS.md | 3 + ARCHITECTURE.md | 4 +- CODING.md | 111 ++++++++++++++++++++- CONTRIBUTING.md | 40 +++++++- 7 files changed, 176 insertions(+), 7 deletions(-) diff --git a/.agents/skills/pulsar-build/SKILL.md b/.agents/skills/pulsar-build/SKILL.md index 51f4b682192a4..9712c38a6502e 100644 --- a/.agents/skills/pulsar-build/SKILL.md +++ b/.agents/skills/pulsar-build/SKILL.md @@ -54,6 +54,10 @@ this skill cites them and adds the guardrails below. `pulsar-client/` is project `:pulsar-client-original`). - **After any dependency change, run `./gradlew checkBinaryLicense`** and update the binary distribution `LICENSE`/`NOTICE` accordingly. Justify a genuinely new dependency. +- **Published modules must not depend on internal modules.** A public/published module cannot reference + internal-only projects in `api`/`implementation` (compile/runtime) scope — that artifact would be + unresolvable from Maven Central. Modules are **not** published by default; opt a public library in + via the `pulsar.public-java-library-conventions` plugin. - **Don't fabricate plugin/DSL/task names** — verify they exist in the build. ## Validation checklist diff --git a/.agents/skills/pulsar-pr-workflow/SKILL.md b/.agents/skills/pulsar-pr-workflow/SKILL.md index 696a827667382..1d33d59701112 100644 --- a/.agents/skills/pulsar-pr-workflow/SKILL.md +++ b/.agents/skills/pulsar-pr-workflow/SKILL.md @@ -59,7 +59,11 @@ Personal CI loop on a fork. jobs are still running or queued). Note that for fork PRs a maintainer must approve each run (again after new pushes), which is why Personal CI is the better loop for iterating on genuine failures. Don't dismiss a failure as "flaky" without checking it isn't caused by the change. -- **Keep commits focused** — a fix, a refactor, and a formatting pass are separate commits/PRs. +- **Keep commits and PRs focused** — a fix, a refactor, and a formatting pass are separate + commits/PRs. Don't bundle unrelated drive-by refactoring into a feature/bug-fix PR, and **don't + reformat unrelated files or lines** (whitespace, import reordering, re-wrapping) that your change + doesn't touch — it hides the real diff and pollutes `git blame`. Raise large refactorings on + `dev@pulsar.apache.org` first rather than opening (or auto-generating) sweeping refactor PRs. - **Security:** never describe the security nature of a change in a commit/PR — see [`pulsar-security`](../pulsar-security/SKILL.md). diff --git a/.agents/skills/pulsar-tests/SKILL.md b/.agents/skills/pulsar-tests/SKILL.md index f22e3863eaf23..52120599045d2 100644 --- a/.agents/skills/pulsar-tests/SKILL.md +++ b/.agents/skills/pulsar-tests/SKILL.md @@ -49,6 +49,21 @@ failure / flake. Don't treat it as a real bug on its own; corroborate first. - **Don't fabricate assertions or test names** — verify symbols exist. - For a **bugfix**, add a regression test and confirm it fails before the fix and passes after. +- **Most Pulsar "unit tests" are integration-style** (real in-JVM broker via + `MockedPulsarServiceBaseTest` / `pulsarTestContext`); the container-based integration tests are + separate (under `tests/`). When code isn't factored for isolation, prefer an integration-style test + over mocking a web of internal collaborators; inject faults via the test infrastructure (e.g. + `pulsarTestContext.getMockBookKeeper().setReadHandleInterceptor(...)`) and assert on logs with + `TestLogAppender`. But when you write or refactor code, prefer a design that allows a true isolated + unit test (collaborators behind narrow interfaces, light mocking) — excessive mocking is a design + smell, not the goal. The regression test must fail on the unpatched code for the *actual* reason — + not because it mutates internal state. Adding a clean new test class is fine when an existing one's + style/base class gets in the way. See + [`CODING.md` → Testing conventions](../../../CODING.md#testing-conventions). +- **Prefer `SharedPulsarBaseTest` for new integration-style tests.** It shares one + `SharedPulsarCluster` for the test-JVM lifecycle and gives each test method its own namespace. Use + `getNamespace()` for the namespace and `newTopicName()` for topic names; never hardcode namespace or + topic names (the runtime is shared across tests). - **Concurrency / memory-visibility bugs** are timing- and platform-dependent and easily masked — a clean run is weak evidence a fix is correct. See [`CODING.md` → Reproducing concurrency / memory-visibility bugs](../../../CODING.md#reproducing-concurrency--memory-visibility-bugs) diff --git a/AGENTS.md b/AGENTS.md index a8b38d657f962..afe7722cc6709 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -52,6 +52,9 @@ with a `SKILL.md`: 6. **Security:** never disclose a vulnerability — or the security nature of a change — in a public issue, PR, or commit. See [`SECURITY.md`](SECURITY.md) / [`pulsar-security`](.agents/skills/pulsar-security/SKILL.md). +7. **Stay in scope.** Keep a change focused on its task; don't bundle unrelated drive-by refactors or + generate broad mass-refactoring PRs. Discuss large refactorings on `dev@pulsar.apache.org` first. + See [`CONTRIBUTING.md`](CONTRIBUTING.md#pull-requests). ## Where to ask diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md index 9eabf50057e7d..46d3c973539bd 100644 --- a/ARCHITECTURE.md +++ b/ARCHITECTURE.md @@ -119,4 +119,6 @@ The **`pip/`** directory holds **Pulsar Improvement Proposals** (`pip-.md`) documents for significant changes, referenced as `PIP-` throughout commit messages and code (e.g. PIP-463 = Maven→Gradle migration, PIP-465 = IO connectors moved out, PIP-466/468 = V5 client). `pip/README.md` describes the process and `pip/TEMPLATE.md` is the proposal template. Consult the -relevant PIP for the rationale behind a non-trivial feature or architectural decision. +relevant PIP for the rationale behind a non-trivial feature or architectural decision. A PIP **number +is reserved by the first `dev@pulsar.apache.org` thread that uses it** — start the discussion to claim +the next free number. diff --git a/CODING.md b/CODING.md index 78c47dca2e376..acf7b5ad5cf82 100644 --- a/CODING.md +++ b/CODING.md @@ -26,6 +26,25 @@ guardrails on top of it. holding the key components instead of concatenating a `java.lang.String` (e.g. `a + ":" + b`). A record key gives a correct `equals`/`hashCode`, keeps the key type-safe, and avoids delimiter/escaping bugs. +- **Use the narrowest interface type** for fields, parameters, variables, and return types — `java.util.Map`, + `SequencedMap`, `SortedMap`, `Collection`, `List` — rather than a concrete implementation such as + `TreeMap`. Keep the concrete type only where its behaviour is actually required (e.g. instantiate a + `TreeMap` for stable key-ordered iteration) and still expose it through the interface type. +- **Minimize method and constructor parameters.** Don't pass values that are already reachable from a + context object a method receives — read them from it (e.g. from `PublishContext`) instead of adding + redundant parameters. For a constructor (or factory) with many parameters, use a **builder**: the + project uses Lombok-generated builders (`@Builder`) for most internal classes. `@Builder` also works + on a Java `record`, which pairs well with preferring records (above) — a record plus `@Builder` is a + clean way to pass many values without a long parameter list. **Don't use Lombok + `@Builder` on public client-API classes** — it's harder to maintain and its default builder class + name carries no meaningful context; hand-write the builder there, or if Lombok is used set an + explicit, meaningful `builderClassName`. +- **Name things for intent.** Name a boolean-returning method as a query (`shouldSkipChunk`, not + `skipChunk`); name methods, fields, and metrics after the effect they describe rather than vague + terms (prefer e.g. `truncated` / `migrationSegment` over `overflow` / `special`). +- **Construct via factory methods, not internal implementation types.** Use the provided factory + (e.g. `PositionFactory.create(...)`) instead of referencing an internal class such as + `ImmutablePositionImpl` directly. ## Asynchronous programming @@ -49,6 +68,13 @@ APIs over `ListenableFuture` for new code. - **Avoid nested futures** such as `CompletableFuture>`; flatten with `thenCompose`. - Prefer **`OrderedExecutor`** for ordered asynchronous work. +- If an operation performs IO (creating authentication, starting a transaction, etc.), it should + return a `CompletableFuture` rather than block. +- **Converting an existing synchronous-throwing method to return a failed future is not a mechanical + change** — evaluate each call site first. Some callers historically rely on the exception being + thrown *before* the async work starts, so a blanket conversion can change behaviour and introduce + instability in untested error paths. Use a shared helper (e.g. a `checkArgumentAsync` in `FutureUtil`) + to validate arguments without duplicating the try/catch in every method. ## Concurrency @@ -56,6 +82,10 @@ APIs over `ListenableFuture` for new code. - Protect shared mutable state; prefer fine-grained synchronization; ensure mutations occur on the intended thread. Prefer the **single-writer principle** — design so a given piece of state is mutated by only one thread — to avoid concurrent mutation entirely. +- **Minimize work while holding a lock.** Capture the state you need into local variables inside the + synchronized block, then run callbacks, listeners, and IO *outside* it. Never call out to + listener/callback code while holding a lock — narrowing lock scope has fixed real deadlocks and + contention in Pulsar. - Give threads **meaningful names** for diagnostics. - When creating thread pools, prefer Netty's **`io.netty.util.concurrent.DefaultThreadFactory`**. It produces **`FastThreadLocalThread`** instances, on which `FastThreadLocal` lookups are much faster @@ -155,6 +185,23 @@ a concurrency fix is correct: Do not introduce new `Recycler` usage. See [PIP-443: Stop using Netty Recycler in new code](pip/pip-443.md). +## Performance + +- **Back optimizations with evidence** — a JMH benchmark (see *Testing conventions*) or a profile — + not intuition; and measure on JIT-warmed code (see *Reproducing concurrency / memory-visibility + bugs* for why warm-up matters). +- **On hot paths** (dispatch, IO, per-message code): avoid `String.format` (build the string directly), + avoid `Enum.values()` (match values explicitly), and avoid unnecessary allocation and locking. Prefer + lock-free or single-writer designs over synchronization where practical. +- **Don't add overhead to an already-overloaded system.** Avoid doing work and then discarding it + (e.g. reading entries from BookKeeper only to drop them before dispatch); extra work under high load + causes cascading failures. Acquire or estimate up front and reconcile afterwards instead. +- **Bound in-memory caches** with a size or byte limit plus eviction, and de-duplicate repeated + `String`s (cluster / tenant / namespace ids) with `org.apache.pulsar.common.util.StringInterner` to + avoid heap duplication. +- Don't pre-emptively micro-optimize elsewhere — favour clear, correct code, and let ZGC handle + short-lived allocations (see *Resource and memory management*). + ## Configuration When adding configuration options: use clear, descriptive names; provide sensible defaults; update @@ -176,13 +223,75 @@ Pulsar maintains strong compatibility guarantees. Changes must not break public compatibility, wire-protocol compatibility, or serialized/metadata formats. Servers must work with both older and newer clients. Flag any change that may break compatibility. +**Plugin / SPI extension points are public API.** Pulsar has many pluggable interfaces selected by a +`*ClassName` configuration setting — e.g. `LoadManager`, `LedgerOffloaderFactory`, +`AuthorizationProvider` / `AuthenticationProvider`, `EntryFilter`, `TopicFactory`, `BrokerInterceptor`, +dispatcher / delayed-delivery-tracker factories, `CustomCommand`. Third parties ship implementations of +these in production. Changing such an interface — or a `protected` member of a class meant for +extension, such as `PulsarWebResource`, `PersistentTopic`, or `Producer` — breaks those implementations. +Treat these as public API: the change generally needs a PIP, and breaking changes must not land in +maintenance-branch backports (keep deprecated bridge methods that delegate to the new form instead). +When you must add a method to such an interface, give it a **`default` implementation** (e.g. one +throwing `UnsupportedOperationException`) so existing third-party implementors still compile — this +also keeps the change source-compatible enough to backport. + +**Don't leak third-party types through public or plugin interfaces.** Exposing library types such as +Netty or AsyncHttpClient classes in a public / plugin API breaks consumers of the **shaded** Pulsar +client (the shaded and unshaded classes differ) and couples callers to the implementation — provide a +Pulsar-owned abstraction instead. Changing a documented behaviour or guarantee (e.g. the +exclusive-producer guarantees of PIP-68, or default rate-limiter behaviour) likewise needs a PIP and a +dev@ discussion, not just a code change. + +**Introduce changes behind a backward-compatible default.** Prefer making new or changed behaviour +opt-in via configuration rather than silently changing what existing deployments do. Behaviour that can +cause data loss (e.g. skipping unrecoverable data) must be gated behind an explicit flag (such as +`autoSkipNonRecoverableData`), defaulting to the safe/old behaviour. + ## Testing conventions +Most of Pulsar's so-called **"unit tests"** (under each module's `src/test`, run with +`./gradlew ::test`) are in fact **integration-style** — they bring up a real in-JVM broker via +`MockedPulsarServiceBaseTest` / `pulsarTestContext` rather than testing a class in isolation. The +**actual integration tests** live under `tests/` and run against a Pulsar **Docker test image** with +Testcontainers (see [`CONTRIBUTING.md`](CONTRIBUTING.md#integration-tests)). "Unit test" below means +the former. + +Ideally this would not be necessary: well-designed code lets genuine **units be unit-tested in +isolation**, with collaborators behind narrow interfaces that can be substituted by simple +fakes/mocks **without excessive mocking**. Excessive mocking is a design smell (tight coupling), not a +reason to abandon unit testing. Much existing Pulsar code isn't factored this way, which is why +integration-style tests are the pragmatic default today — but when you write or refactor code, prefer +the design that makes a true, isolated unit test possible. + - **TestNG + Mockito.** Prefer **AssertJ** assertions with descriptions over TestNG asserts; use **Awaitility** (with AssertJ) for asynchronous conditions instead of `sleep`-based timing. Add - timeouts to prevent hangs. + timeouts to prevent hangs. Use Awaitility only when polling/retrying for a condition: + `untilAsserted(...)` retries assertions until they pass, `until(...)` waits for a boolean to become + `true` — don't swap the two. - Every feature or bug fix should include tests covering edge cases and failure scenarios; tests must be deterministic and stable. Integration tests may be required for distributed components. +- **For code that isn't factored for isolation, prefer an integration-style test over heavy mocking.** + Reproduce the real production scenario with `MockedPulsarServiceBaseTest` / `pulsarTestContext` + rather than mocking a web of internal collaborators. + Inject faults through the test infrastructure — e.g. + `pulsarTestContext.getMockBookKeeper().setReadHandleInterceptor(...)` to simulate read failures or + delays — and capture log output with `org.apache.pulsar.utils.TestLogAppender` to assert behaviour. + A bug-fix test should **fail against the unpatched code for the actual reason**, not because the test + forces internal state. +- **For new integration-style tests, prefer `SharedPulsarBaseTest`.** It shares a single + `SharedPulsarCluster` and its components for the whole **test-JVM lifecycle** (avoiding per-test + broker startup cost), while `admin` / `pulsarClient` are shared **per test class** (initialized once + per class). Each test method gets **its own namespace** (created before the method, force-deleted + after). Get that namespace with `getNamespace()` and create topic names with `newTopicName()` — + never hardcode namespace or topic names, because the runtime is shared across all tests in the JVM. + (The shared runtime is also why `ThreadLeakDetectorListener` mis-reports leaks with this base + class — see above.) +- **Don't mutate private/internal state to force a condition.** If a test has to reach into internal + fields to trigger the path under test, it usually isn't representative — drive the real path instead. +- It's fine to **add a new, cleanly-written test class** instead of extending an existing one whose + base class or style makes the new case hard to read; existing Pulsar tests are not always good models. +- For asynchronous interactions, verify with Mockito's `timeout(...)` (e.g. + `verify(x, timeout(2000)).foo()`) rather than fixed sleeps. - **Validate performance optimizations with a benchmark.** Prefer adding a **JMH benchmark** under the `microbench/` subproject that simulates a usage pattern realistic for Pulsar production, so the optimization is backed by evidence rather than intuition. See `microbench/README.md` for how to run diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index de475f0222690..f2de90b586438 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -59,6 +59,11 @@ bin/pulsar standalone ## Running tests +Most of these per-module "unit tests" are actually **integration-style** — they start a real in-JVM +broker (`MockedPulsarServiceBaseTest` / `pulsarTestContext`) rather than testing a class in isolation. +The **container-based integration tests** that run against a Pulsar Docker image are separate; see +[Integration tests](#integration-tests) below. + ```bash # Always scope test runs with --tests — running a whole module's test task is slow. # Run a single test class @@ -143,9 +148,10 @@ no longer applies — bring in upstream changes by merging instead (see [Pull re Pulsar has a large number of flaky tests, so GitHub Actions jobs on a PR sometimes fail for reasons unrelated to the change. When a failure appears to be flakiness rather than a regression caused by the -PR, comment **`/pulsarbot rerun`** on the PR to re-run the failed workflows. `/pulsarbot rerun` only -takes effect once **all** jobs from the previous run have completed, so wait for the run to finish -(including any still-running or queued jobs) before commenting. +PR, comment **`/pulsarbot rerun`** on the PR to re-run the failed workflows (or +**`/pulsarbot rerun-failure-checks`** to re-run only the failed jobs). It only takes effect once +**all** jobs from the previous run have completed, so wait for the run to finish (including any +still-running or queued jobs) before commenting. For a PR from a fork, a project maintainer must **approve** the workflow runs before they execute, and approval is required again whenever new changes are pushed. This adds latency to each run and makes it @@ -153,13 +159,31 @@ slow to tell flaky failures apart from genuine ones. Setting up **Personal CI** — the full pipeline runs in your own fork without maintainer approval — so it is especially useful when a PR has legitimate test failures that you need to iterate on. +If a flaky test that is unrelated to your change blocks your PR, you can move it to the `flaky` group +or disable it within your PR to unblock merging — and report it (search the +[existing flaky-test issues](https://github.com/apache/pulsar/issues?q=is%3Aissue+state%3Aopen+flaky) +first). Don't push empty "trigger CI" commits to force a rerun; use `/pulsarbot rerun` instead. + ## Pull requests PRs must follow `.github/PULL_REQUEST_TEMPLATE.md`. PR titles follow the `[][] ` convention (e.g. `[fix][broker] ...`, `[improve][build] ...`) — refer to `.github/workflows/ci-semantic-pull-request.yml` for the valid `[type]` and `[scope]` prefixes, which are enforced by CI. The `` should be in -imperative form, like a good commit message's subject line. +imperative form, like a good commit message's subject line, and **should not reference issue numbers** +(put `Fixes #N` / `Closes #N` in the description instead). + +**Keep each PR focused on one change.** Don't bundle unrelated drive-by refactoring (for example, +de-duplicating code you happened to touch) into a feature or bug-fix PR — it widens the review surface +and the risk; note such improvements as follow-ups instead. Likewise, **don't reformat unrelated files +or lines that aren't part of your change** (whitespace, import reordering, re-wrapping) — drive-by +formatting hides the real change in review and pollutes `git blame`. **Discuss large refactorings on +dev@pulsar.apache.org before investing effort**: there are many similar code duplications across +Pulsar, and a PR for each creates more maintainer burden than value. AI agents make it easy to generate +large volumes of refactoring PRs, and the project pushes back on these. Every change needs a real, +identifiable contributor who takes responsibility for it; unattributed AI-agent-style contributions — +especially larger ones from anonymous profiles or from people not actually using Pulsar — are typically +rejected. **Describe the change.** The PR description must cover, at minimum, the **Motivation (why?)** and the **Modifications (what / how?)** — these map to the corresponding sections of the PR template. A title @@ -180,6 +204,14 @@ git merge /master (Rebasing onto an updated `master` is fine *before* the PR is opened — see the Personal CI loop above — but not after.) +### Branches and backports + +Target `master` first. Once the change is merged, cherry-pick it to the relevant maintenance branches +(`branch-4.2`, `branch-4.1`, `branch-4.0`, …) as needed; bug and security fixes are backported per the +[release/support policy](https://pulsar.apache.org/contribute/release-policy/). New **features** are +**not** added to LTS / maintenance branches without a dev@pulsar.apache.org discussion (and usually a +PIP), to avoid regressions in stable releases. + ## Reporting security vulnerabilities See [`SECURITY.md`](SECURITY.md) and . In short: report a From 3dcff2dd1bf0ccaeaddbca8e423001a92b2ab6e0 Mon Sep 17 00:00:00 2001 From: Lari Hotari Date: Tue, 26 May 2026 17:10:35 +0300 Subject: [PATCH 19/39] [improve][misc] Slim down .agents skills and refine backport guidance Rewrite the four SKILL.md files as a lean, on-demand guardrail layer that cites the canonical docs (CODING/CONTRIBUTING/ARCHITECTURE/SECURITY) instead of duplicating their prose, to keep agent context small; trim frontmatter to name + description. Expand CONTRIBUTING.md backport guidance: maintainers handle backports, cherry-pick in merge order, dependent changes first, and drop branch-4.1 from the example. --- .agents/skills/pulsar-build/SKILL.md | 86 +++++++------------ .agents/skills/pulsar-pr-workflow/SKILL.md | 91 ++++++-------------- .agents/skills/pulsar-security/SKILL.md | 88 +++++--------------- .agents/skills/pulsar-tests/SKILL.md | 97 +++++++--------------- CONTRIBUTING.md | 9 +- 5 files changed, 110 insertions(+), 261 deletions(-) diff --git a/.agents/skills/pulsar-build/SKILL.md b/.agents/skills/pulsar-build/SKILL.md index 9712c38a6502e..01bb6cb130007 100644 --- a/.agents/skills/pulsar-build/SKILL.md +++ b/.agents/skills/pulsar-build/SKILL.md @@ -1,69 +1,39 @@ --- name: pulsar-build -description: AI-tooling guardrails for changes to the Apache Pulsar Gradle build. Points at the build conventions in ARCHITECTURE.md (tiered modules, convention plugins under build-logic/, the gradle/libs.versions.toml version catalog, the pulsar-dependencies enforced platform) and adds the AI-specific constraints on top — configuration-cache / configure-on-demand compatibility, no hard-coded versions, update LICENSE/NOTICE after dependency changes. Use when editing build.gradle.kts, settings.gradle.kts, build-logic/, gradle.properties, or gradle/libs.versions.toml. -license: Apache-2.0 -compatibility: claude, codex, copilot, cursor, gemini, aider -metadata: - audience: contributors to apache/pulsar - scope: ai-tooling-build-guardrails +description: AI guardrails for Gradle build changes (convention plugins, settings, version catalog, dependencies). Use when editing build-logic/, *.gradle.kts, gradle.properties, or gradle/libs.versions.toml. --- # Pulsar build -AI-tooling layer over the Gradle build. The conventions themselves live in -[`ARCHITECTURE.md`'s "Build infrastructure" section](../../../ARCHITECTURE.md#build-infrastructure); -this skill cites them and adds the guardrails below. +AI-tooling layer over [`ARCHITECTURE.md` → Build infrastructure](../../../ARCHITECTURE.md#build-infrastructure) +(tiers, convention plugins, version catalog, enforced platform, the module-name gotcha) and +[`CODING.md` → Dependencies](../../../CODING.md#dependencies). Load those for detail. -## When to use this skill +## When to use -**Use it for:** - -- Convention-plugin changes under `build-logic/conventions/` (`pulsar.java-conventions`, - `pulsar.code-quality-conventions`, `pulsar.shadow-conventions`, …). -- `settings.gradle.kts` (module wiring/tiers) and root or module `build.gradle.kts` edits. -- `gradle.properties`, `gradle/libs.versions.toml` (version catalog), or the `pulsar-dependencies` - enforced platform. -- Adding, removing, or upgrading a runtime or test dependency. -- Gradle wrapper bumps (`gradle/wrapper/gradle-wrapper.properties`). - -**Don't use it for:** production/test source changes — those are -[`pulsar-tests`](../pulsar-tests/SKILL.md) or the relevant module. - -## Read first - -- [`ARCHITECTURE.md` → Build infrastructure](../../../ARCHITECTURE.md#build-infrastructure) — tiers, - convention plugins, version catalog, enforced platform, the module-name gotcha. -- [`CODING.md` → Dependencies](../../../CODING.md#dependencies) — which libraries are preferred and - the LICENSE/NOTICE obligation. +Editing `build-logic/`, `settings.gradle.kts`, root/module `build.gradle.kts`, `gradle.properties`, +`gradle/libs.versions.toml`, the `pulsar-dependencies` platform, or the Gradle wrapper. (Source +changes belong to [`pulsar-tests`](../pulsar-tests/SKILL.md) or the module itself.) ## Guardrails -- **Edit shared config in the convention plugins, not per-module.** Compile/test/dependency defaults - belong in `build-logic/conventions/`; don't duplicate them across module build scripts. -- **Versions come from the catalog.** Reference `libs.*` / the `pulsar-dependencies` platform; never - hard-code a version string in a build script. Add or change versions in - `gradle/libs.versions.toml`. -- **Stay configuration-cache and configure-on-demand compatible.** Both are enabled - (`org.gradle.configuration-cache=true`, `org.gradle.configureondemand=true`). Any task on a - commonly-run path (`assemble`, `test`, `integrationTest`, `rat`/`spotlessCheck`/`checkstyle*`, - `checkBinaryLicense`, `docker*`) must not read mutable state at execution time or access `Project` - in task actions — use `Provider` / value sources. **Verify with `--configuration-cache`.** - Tooling/one-off tasks (e.g. `verifyTestGroups`, dependency reports) may be exempt. -- **Follow [Gradle best practices](https://docs.gradle.org/current/userguide/best_practices_index.html).** -- **Use the real Gradle project path** (mind the module-name-vs-directory gotcha — directory - `pulsar-client/` is project `:pulsar-client-original`). -- **After any dependency change, run `./gradlew checkBinaryLicense`** and update the binary - distribution `LICENSE`/`NOTICE` accordingly. Justify a genuinely new dependency. -- **Published modules must not depend on internal modules.** A public/published module cannot reference - internal-only projects in `api`/`implementation` (compile/runtime) scope — that artifact would be - unresolvable from Maven Central. Modules are **not** published by default; opt a public library in - via the `pulsar.public-java-library-conventions` plugin. -- **Don't fabricate plugin/DSL/task names** — verify they exist in the build. - -## Validation checklist - -- [ ] `./gradlew help --configuration-cache` (and the affected task with `--configuration-cache`) - succeeds with no config-cache problems. -- [ ] `./gradlew assemble` (or the affected module's `assemble`) passes. -- [ ] `./gradlew rat spotlessCheck checkstyleMain checkstyleTest` passes. -- [ ] If dependencies changed: `./gradlew checkBinaryLicense` passes and LICENSE/NOTICE are updated. +- **Edit shared config in `build-logic/conventions/`**, not per-module. +- **Versions come from `gradle/libs.versions.toml`** (`libs.*` / `pulsar-dependencies`) — never + hardcode a version in a build script. +- **Keep tasks configuration-cache and configure-on-demand compatible** (both are enabled): no + mutable-state reads or `Project` access in task actions — use `Provider` / value sources. Verify + with `--configuration-cache`. One-off tooling tasks may be exempt. Follow + [Gradle best practices](https://docs.gradle.org/current/userguide/best_practices_index.html). +- **Use the real Gradle project path** (`pulsar-client/` is `:pulsar-client-original`). +- **After a dependency change**, run `./gradlew checkBinaryLicense` and update the distribution + LICENSE/NOTICE; justify any genuinely new dependency. +- **Published modules must not depend on internal modules** (compile/runtime) — the artifact would be + unresolvable from Maven Central. Modules aren't published unless they apply + `pulsar.public-java-library-conventions`. +- **Don't fabricate plugin / DSL / task names** — verify they exist. + +## Before you finish + +- [ ] The affected task and `./gradlew help` run clean with `--configuration-cache`. +- [ ] `assemble` and `rat spotlessCheck checkstyleMain checkstyleTest` pass. +- [ ] Dependency change → `checkBinaryLicense` passes and LICENSE/NOTICE are updated. diff --git a/.agents/skills/pulsar-pr-workflow/SKILL.md b/.agents/skills/pulsar-pr-workflow/SKILL.md index 1d33d59701112..02b1f220a803e 100644 --- a/.agents/skills/pulsar-pr-workflow/SKILL.md +++ b/.agents/skills/pulsar-pr-workflow/SKILL.md @@ -1,76 +1,39 @@ --- name: pulsar-pr-workflow -description: AI-tooling guardrails for branching, commits, and pull requests against apache/pulsar. Points at the PR conventions and Personal CI loop in CONTRIBUTING.md, then adds the AI-specific constraints — semantic [type][scope] titles validated against ci-semantic-pull-request.yml, PR descriptions that cover motivation and modifications, never rebase after the PR is opened (merge upstream master instead), and per-action user confirmation before pushing or opening a PR. Use when creating a branch, committing, opening or updating a PR, or running the Personal CI loop. -license: Apache-2.0 -compatibility: claude, codex, copilot, cursor, gemini, aider -metadata: - audience: contributors to apache/pulsar - scope: ai-tooling-pr-guardrails +description: AI guardrails for branching, commits, and PRs to apache/pulsar — semantic titles, PR descriptions, Personal CI, never-rebase-after-open, per-action confirmation. Use when committing, opening/updating a PR, or running CI. --- # Pulsar PR workflow -AI-tooling layer over the contribution workflow in -[`CONTRIBUTING.md` → Pull requests](../../../CONTRIBUTING.md#pull-requests) and -[Personal CI](../../../CONTRIBUTING.md#running-the-full-ci-pipeline-personal-ci). - -## When to use this skill - -Use it when creating a branch, committing, opening or updating a pull request, or driving the -Personal CI loop on a fork. - -## Read first - -- [`CONTRIBUTING.md` → Pull requests](../../../CONTRIBUTING.md#pull-requests) — title convention, - description requirement, rebase-vs-merge rule. -- [`CONTRIBUTING.md` → Personal CI](../../../CONTRIBUTING.md#running-the-full-ci-pipeline-personal-ci) - — the fork-based full-pipeline loop. -- `.github/PULL_REQUEST_TEMPLATE.md` and `.github/workflows/ci-semantic-pull-request.yml`. +AI-tooling layer over [`CONTRIBUTING.md` → Pull requests](../../../CONTRIBUTING.md#pull-requests) and +[Personal CI](../../../CONTRIBUTING.md#running-the-full-ci-pipeline-personal-ci); the title rules are +enforced by `.github/workflows/ci-semantic-pull-request.yml` and the body by +`.github/PULL_REQUEST_TEMPLATE.md`. Load those for detail. ## Guardrails -- **Each state-changing action needs its own explicit user confirmation.** Pushing, opening a PR, and - posting comments are not authorised just because the task was started. Confirm before each. -- **Before pushing, ensure lint passes:** `./gradlew rat spotlessCheck checkstyleMain checkstyleTest` - (`spotlessApply` to auto-fix). Scope test runs with `--tests` (see [`pulsar-tests`](../pulsar-tests/SKILL.md)). -- **Semantic PR title** `[][] `: valid `type`/`scope` values are - enforced by `.github/workflows/ci-semantic-pull-request.yml` — read it rather than guessing. - `` is imperative, like a commit subject. -- **PR description must cover Motivation (why?) and Modifications (what/how?)** per the template — a - title alone, or a description restating the title, is not enough. Link the issue when applicable: - `Fixes #N` (or equivalently `Closes #N`) for an issue the PR resolves, `Main Issue: #N` for one task - of a larger issue, and `PIP: #N` for a proposal. -- **Never rebase once the PR is open in `apache/pulsar`** — it invalidates review comments and - incremental diffs. Bring in upstream changes by merging: - - ```bash - git fetch # e.g. upstream / apache - git merge /master - ``` - - Rebasing is fine *before* the PR is opened (e.g. during the Personal CI loop). -- **Personal CI loop (pre-PR):** keep local `master` current, rebase the feature branch, push to the - **fork** to trigger CI, fix failures, and open the PR to `apache/pulsar` only when the fork's CI is - green. A PR can be open in both the fork and `apache/pulsar` at once. After the upstream PR is open, - switch from rebase to merge (above). -- **Flaky CI:** Pulsar has many flaky tests, so GitHub Actions jobs on a PR can fail unrelated to the - change. When a failure looks like flakiness, comment **`/pulsarbot rerun`** to re-run the failed - workflows — only **after all jobs from the previous run have completed** (it has no effect while - jobs are still running or queued). Note that for fork PRs a maintainer must approve each run (again - after new pushes), which is why Personal CI is the better loop for iterating on genuine failures. - Don't dismiss a failure as "flaky" without checking it isn't caused by the change. -- **Keep commits and PRs focused** — a fix, a refactor, and a formatting pass are separate - commits/PRs. Don't bundle unrelated drive-by refactoring into a feature/bug-fix PR, and **don't - reformat unrelated files or lines** (whitespace, import reordering, re-wrapping) that your change - doesn't touch — it hides the real diff and pollutes `git blame`. Raise large refactorings on - `dev@pulsar.apache.org` first rather than opening (or auto-generating) sweeping refactor PRs. -- **Security:** never describe the security nature of a change in a commit/PR — see +- **Confirm each state-changing action with the user** — pushing, opening a PR, posting a comment. + Starting the task is not standing authorization. +- **Before pushing**, run lint: `./gradlew rat spotlessCheck checkstyleMain checkstyleTest` + (`spotlessApply` to fix); scope tests with `--tests`. +- **Semantic title** `[][] ` — valid prefixes are CI-enforced + (read `ci-semantic-pull-request.yml`, don't guess); no issue numbers in the title. +- **Body must cover Motivation (why) + Modifications (what/how)**, not just restate the title. Link + the issue: `Fixes #N` / `Closes #N`, `Main Issue: #N`, or `PIP: #N`. +- **Never rebase once the PR is open in `apache/pulsar`** — merge `/master` instead. (Rebasing + is fine pre-PR, e.g. during the Personal CI loop.) +- **Iterate via Personal CI** (full pipeline in your own fork) instead of waiting on maintainer CI + approval; open the upstream PR once the fork's CI is green. +- **Flaky CI**: comment `/pulsarbot rerun` (or `rerun-failure-checks`) only after all jobs finish — but + verify the failure isn't caused by your change first. +- **Keep PRs focused** — no unrelated drive-by refactoring or reformatting; raise large refactorings on + `dev@pulsar.apache.org` first. +- **Security**: never state the security nature of a change in a commit/PR — see [`pulsar-security`](../pulsar-security/SKILL.md). -## Validation checklist +## Before you finish -- [ ] Lint (`rat spotlessCheck checkstyleMain checkstyleTest`) passes locally. -- [ ] Title matches the semantic `[type][scope]` rules in `ci-semantic-pull-request.yml`. -- [ ] Description has Motivation + Modifications and links any issue/PIP. -- [ ] No rebase after the upstream PR was opened; upstream changes pulled in via merge. -- [ ] Each push / PR-open / comment was explicitly confirmed by the user. +- [ ] Lint passes; title matches the CI semantic rules. +- [ ] Body has Motivation + Modifications and links the issue/PIP. +- [ ] No rebase after the upstream PR opened. +- [ ] Each push / PR-open / comment was user-confirmed. diff --git a/.agents/skills/pulsar-security/SKILL.md b/.agents/skills/pulsar-security/SKILL.md index c03b9d659fd64..66167012221ac 100644 --- a/.agents/skills/pulsar-security/SKILL.md +++ b/.agents/skills/pulsar-security/SKILL.md @@ -1,78 +1,28 @@ --- name: pulsar-security -description: Guardrails for handling security topics in Apache Pulsar — Pulsar's (informal) security model and threat scope, reporting a vulnerability through the private ASF channels (never a public issue, PR, or commit), and checking whether Pulsar is exposed to an already-public CVE. Use when a task involves a suspected vulnerability, a CVE, security advisories, deciding whether some behaviour (e.g. functions running code, or DoS) is in scope, or questions about supported/patched versions. -license: Apache-2.0 -compatibility: claude, codex, copilot, cursor, gemini, aider -metadata: - audience: contributors to apache/pulsar - scope: ai-tooling-security-guardrails +description: AI guardrails for security topics — Pulsar's threat scope, private vulnerability reporting, and checking exposure to an already-public CVE. Use for a suspected vulnerability, a CVE, or deciding whether some behaviour is in scope. --- # Pulsar security -Guardrails for security-related work. Canonical policy: [`SECURITY.md`](../../../SECURITY.md), -, and . - -## When to use this skill - -Use it when a task touches a suspected or reported vulnerability, a CVE, a security advisory, -deciding whether some behaviour is in scope, or questions about which Pulsar versions are -supported/patched. - -## Security model (scope) - -Pulsar's security model is not formally defined. Two assumptions decide whether a report is in scope -(full text in [`SECURITY.md` → Security model and scope](../../../SECURITY.md#security-model-and-scope)): - -- **Functions and connectors run fully trusted code.** The function instance runtime's purpose is to - execute user-provided code — **RCE is the feature, not a bug** (the PMC repeatedly receives reports - to the contrary). The thread/process runtimes can touch any files/state they can access; the - Kubernetes runtime alone does not restrict cluster resource access; hardening hooks exist but the - hardening itself is not shipped. -- **Clusters assume network-perimeter security.** Only trusted users should reach the cluster. There - is **no explicit protection against malicious DoS**; rate limits address only *unintentional* DoS - (misconfiguration, thundering herd). - -When triaging, do **not** classify "a trusted function executes/modifies its environment" or "a -perimeter-trusted client can cause a denial-of-service" as a vulnerability by default — these are -outside the model. Surface uncertainty to a human (or the channels above) rather than asserting a CVE. - -## Reporting a (non-public) vulnerability - -- **Never disclose a vulnerability in a public channel** — not in a GitHub issue, PR title/body, - commit message, or mailing list. Public disclosure before a fix is released puts users at risk. -- Report privately to the ASF Security Team (`security@apache.org`), optionally copying - `private@pulsar.apache.org`. Follow and - . -- **Do not push, commit publicly, or open a PR for the fix yourself.** You may share a suggested fix - patch privately by including it in the report to `private@pulsar.apache.org` — never via a public - commit or PR. - -The **project team** commits the fix, coordinated through the ASF security vulnerability handling -process -([apache.org/security/committers.html#possible](https://apache.org/security/committers.html#possible)). -The team may commit the fix to the public repository **before** the release, using a neutral commit -message that does not state its security nature. In severe cases, the commit and release are made in a -private repository, and the fix is made public only at the time of the release. - -## Checking exposure to an already-public CVE - -For a CVE that is **already public** (e.g. in a dependency) and you want to know Pulsar's exposure or -whether it is already fixed: - -1. **Search existing PRs and issues by the CVE id** at — - also check issues and **closed** PRs/issues, since the fix may already be merged or the question - already answered. -2. If not found, ask via a **GitHub issue on apache/pulsar** or on **dev@pulsar.apache.org** (this is - the right venue for already-public CVEs — it is not a private disclosure). -3. Check the dependency version against `gradle/libs.versions.toml` / the `pulsar-dependencies` - platform to determine whether the affected version is actually shipped. -4. Supported versions are listed at ; users - should upgrade to a supported version to receive security updates. +AI-tooling layer over [`SECURITY.md`](../../../SECURITY.md) — the policy, the security model, and the +reporting / CVE-checking steps. Load it for the full detail; the points below are how an agent should +act. ## Guardrails -- Treat the public-vs-private distinction as the deciding factor for *where* to communicate: public - CVE → public issue/list; suspected new vulnerability → private ASF channels only. -- Don't fabricate CVE status — verify against merged PRs and the shipped dependency versions before - concluding "fixed" or "not affected". +- **Know the threat scope before calling something a vulnerability** + ([SECURITY.md → Security model](../../../SECURITY.md#security-model-and-scope)): Pulsar Functions and + connectors run **fully trusted code** (RCE is the feature, not a bug), and clusters assume + network-perimeter security with **no protection against malicious DoS**. Don't classify "a trusted + function runs / modifies its environment" or "a perimeter-trusted client causes a DoS" as a bug by + default — surface uncertainty instead of asserting a CVE. +- **Never disclose a suspected vulnerability in public** — not in an issue, PR, commit, or mailing + list, and never state the security nature of a fix. Report privately to `security@apache.org` + (optionally cc `private@pulsar.apache.org`). +- **Don't write or push the fix yourself.** The project team commits it through the ASF process; a + reporter may include a suggested patch privately in the report. +- **An already-public CVE is *not* a private disclosure.** First search PRs/issues (including closed) + by the CVE id and check the shipped version in `gradle/libs.versions.toml`; then ask via a GitHub + issue or `dev@pulsar.apache.org`. Don't fabricate "fixed" / "not affected" — verify against merged + PRs and shipped versions. diff --git a/.agents/skills/pulsar-tests/SKILL.md b/.agents/skills/pulsar-tests/SKILL.md index 52120599045d2..8c9f753e05add 100644 --- a/.agents/skills/pulsar-tests/SKILL.md +++ b/.agents/skills/pulsar-tests/SKILL.md @@ -1,78 +1,39 @@ --- name: pulsar-tests -description: AI-tooling guardrails for writing and running Apache Pulsar tests. Points at the testing conventions in CODING.md and the run instructions in CONTRIBUTING.md, then adds the AI-specific constraints — scope runs with --tests (never a whole module), no reflection into private state, treat ByteBuf leaks as real bugs but ThreadLeakDetectorListener reports as unreliable, prefer -PtestRetryCount=0. Use when adding, modifying, or running unit or integration tests (TestNG/Mockito/AssertJ/Awaitility). -license: Apache-2.0 -compatibility: claude, codex, copilot, cursor, gemini, aider -metadata: - audience: contributors to apache/pulsar - scope: ai-tooling-test-guardrails +description: AI guardrails for writing and running Apache Pulsar tests (TestNG/Mockito/AssertJ/Awaitility). Use when adding, modifying, running, or debugging unit or integration tests. --- # Pulsar tests -AI-tooling layer over Pulsar's test conventions -([`CODING.md` → Testing conventions](../../../CODING.md#testing-conventions)) and how to run them -([`CONTRIBUTING.md` → Running tests](../../../CONTRIBUTING.md#running-tests)). - -## When to use this skill - -Use it whenever adding, modifying, or running unit or integration tests, or when investigating a test -failure / flake. - -## Read first - -- [`CODING.md` → Testing conventions](../../../CODING.md#testing-conventions) — frameworks, - no-reflection rule, resource/leak cleanup. -- [`CONTRIBUTING.md` → Running tests](../../../CONTRIBUTING.md#running-tests) — `--tests` scoping, - test groups, integration tests, Personal CI. +AI-tooling layer over [`CODING.md` → Testing conventions](../../../CODING.md#testing-conventions) (how +tests are written) and [`CONTRIBUTING.md` → Running tests](../../../CONTRIBUTING.md#running-tests) (how +to run them). Load those for the full detail and examples — the points below are the ones agents most +often get wrong. ## Guardrails -- **Always scope runs with `--tests`.** Running a module's whole `test` task is slow; pass a class, - method, or package pattern. Mind the module-name gotcha (`pulsar-client/` → `:pulsar-client-original`). -- **Prefer `-PtestRetryCount=0` locally** so failures and flakiness surface immediately instead of - being masked by the default single retry. -- **A test in the `flaky`/`quarantine` group is excluded by default** — add `-PexcludedTestGroups=''` - to run it. -- **Use AssertJ (with descriptions) + Awaitility** for async conditions; never `Thread.sleep`-based - timing. Add timeouts to prevent hangs. Tests must be deterministic. -- **No reflection into private state** (`WhiteboxImpl.getInternalState`/`setInternalState`, - `setAccessible(true)`). Add a package-private `@VisibleForTesting` accessor and put the test in the - same package instead. Flag any new reflection-based access. -- **Close/release what the test allocates** — executors, clients, services, and Netty `ByteBuf`s. -- **Leak signals are not equal:** - - A **`ByteBuf`/buffer leak** (pooled-allocator detection, `-Dpulsar.allocator.pooled=true`) is a - **real bug** — fix the missing `release()` in the code under test; don't suppress it. - - A **thread leak** from `ThreadLeakDetectorListener` is **currently unreliable** (high - false-positive rate, notably with `SharedPulsarBaseTest` and when `THREAD_LEAK_DETECTOR_WAIT_MILLIS` - isn't high enough — ≈`10000`, and only effective with the Gradle daemon disabled, `--no-daemon`). - Don't treat it as a real bug on its own; corroborate first. -- **Don't fabricate assertions or test names** — verify symbols exist. -- For a **bugfix**, add a regression test and confirm it fails before the fix and passes after. -- **Most Pulsar "unit tests" are integration-style** (real in-JVM broker via - `MockedPulsarServiceBaseTest` / `pulsarTestContext`); the container-based integration tests are - separate (under `tests/`). When code isn't factored for isolation, prefer an integration-style test - over mocking a web of internal collaborators; inject faults via the test infrastructure (e.g. - `pulsarTestContext.getMockBookKeeper().setReadHandleInterceptor(...)`) and assert on logs with - `TestLogAppender`. But when you write or refactor code, prefer a design that allows a true isolated - unit test (collaborators behind narrow interfaces, light mocking) — excessive mocking is a design - smell, not the goal. The regression test must fail on the unpatched code for the *actual* reason — - not because it mutates internal state. Adding a clean new test class is fine when an existing one's - style/base class gets in the way. See - [`CODING.md` → Testing conventions](../../../CODING.md#testing-conventions). -- **Prefer `SharedPulsarBaseTest` for new integration-style tests.** It shares one - `SharedPulsarCluster` for the test-JVM lifecycle and gives each test method its own namespace. Use - `getNamespace()` for the namespace and `newTopicName()` for topic names; never hardcode namespace or - topic names (the runtime is shared across tests). -- **Concurrency / memory-visibility bugs** are timing- and platform-dependent and easily masked — a - clean run is weak evidence a fix is correct. See - [`CODING.md` → Reproducing concurrency / memory-visibility bugs](../../../CODING.md#reproducing-concurrency--memory-visibility-bugs) - (JIT warm-up rounds, interpreted-vs-compiled differences, multi-socket/NUMA hardware). - -## Validation checklist - -- [ ] The new/changed test runs green via a scoped `--tests` invocation with `-PtestRetryCount=0`. +- **Scope every run with `--tests`** (class / method / package) — never run a whole module's `test` + task. Mind the module-name gotcha (`pulsar-client/` → `:pulsar-client-original`). +- **Run with `-PtestRetryCount=0`** so flakiness surfaces; a test in `flaky` / `quarantine` needs + `-PexcludedTestGroups=''`. +- **AssertJ + Awaitility**, never `Thread.sleep` timing; tests must be deterministic and have timeouts. +- **No reflection into private state** — add a package-private `@VisibleForTesting` accessor instead; + flag any new reflection. +- **A `ByteBuf` / buffer leak is a real bug** (fix the missing `release()`), but a **thread leak from + `ThreadLeakDetectorListener` is unreliable** (false positives, notably with `SharedPulsarBaseTest`) — + corroborate before treating it as real. +- **New integration-style test → extend `SharedPulsarBaseTest`**; get the per-method namespace from + `getNamespace()` and topic names from `newTopicName()`. Never hardcode namespace/topic names (the + cluster is shared across the test JVM). +- **A bug-fix test must fail on the unpatched code for the real reason** — not by mutating internal + state to force it. Don't fabricate assertions or symbol names. Prefer a design that allows a true + isolated unit test over piling on mocks. +- **Concurrency / memory-visibility fixes**: a clean local run is weak evidence (timing- and + platform-dependent). See + [`CODING.md` → Reproducing concurrency bugs](../../../CODING.md#reproducing-concurrency--memory-visibility-bugs). + +## Before you finish + +- [ ] The scoped `--tests` run passes with `-PtestRetryCount=0`; a bug-fix test fails without the fix. - [ ] No reflection into private state; resources and buffers are released. -- [ ] For a bugfix: the test fails on the unpatched code and passes with the fix. -- [ ] For larger changes, the full suite is validated via **Personal CI**, not locally - (see [`CONTRIBUTING.md`](../../../CONTRIBUTING.md#running-the-full-ci-pipeline-personal-ci)). +- [ ] Larger changes validated via **Personal CI**, not a local full run. diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index f2de90b586438..a8cce07038c08 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -206,12 +206,17 @@ git merge /master ### Branches and backports -Target `master` first. Once the change is merged, cherry-pick it to the relevant maintenance branches -(`branch-4.2`, `branch-4.1`, `branch-4.0`, …) as needed; bug and security fixes are backported per the +Target `master` first. Once the change is merged, **project maintainers handle backporting** a bug or +security fix to the supported maintenance branches (`branch-4.2`, `branch-4.0`, …) when +the bug is also present there, per the [release/support policy](https://pulsar.apache.org/contribute/release-policy/). New **features** are **not** added to LTS / maintenance branches without a dev@pulsar.apache.org discussion (and usually a PIP), to avoid regressions in stable releases. +Backporting is done by **cherry-picking commits in their original merge order**, which avoids +unnecessary merge conflicts; sometimes a dependent change must be cherry-picked before the fix itself. +AI tools are effective at resolving the merge conflicts that arise during a backport. + ## Reporting security vulnerabilities See [`SECURITY.md`](SECURITY.md) and . In short: report a From 923448dfc23c64531a9e5091b58c34c8f74b83e2 Mon Sep 17 00:00:00 2001 From: Lari Hotari Date: Tue, 26 May 2026 17:20:51 +0300 Subject: [PATCH 20/39] [improve][misc] Add licensing & provenance (human-in-the-loop) guidance to AGENTS.md --- AGENTS.md | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) diff --git a/AGENTS.md b/AGENTS.md index afe7722cc6709..2f7638f39e6f4 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -9,6 +9,26 @@ Apache Pulsar is a distributed pub-sub messaging and streaming platform. The cod performance-critical, heavily asynchronous, and concurrency-sensitive. Prioritize **correctness, thread safety, performance, maintainability, and backward compatibility**. +## Licensing and provenance (read first) + +Apache Pulsar is licensed under the Apache License 2.0, and all contributions must meet the ASF's +[Generative Tooling guidance](https://www.apache.org/legal/generative-tooling.html): + +- **A human is in the loop and is accountable.** Every pull request and every security report must be + submitted by a human contributor who has reviewed and verified the change and takes responsibility + for it. Fully autonomous agents (e.g. OpenClaw-style bots) opening PRs or filing reports on their own + are not acceptable — the AI assists, the human is accountable. +- **Don't introduce code of incompatible or unknown provenance.** Never copy verbatim from + GPL/AGPL/LGPL, proprietary, or unlicensed sources (including Stack Overflow / blog / forum snippets of + unclear licensing), and don't use a tool whose terms restrict the output inconsistently with open + source. Reimplement from specifications or Apache-compatible sources, and follow the + [ASF 3rd Party Licensing Policy](https://www.apache.org/legal/resolved.html). +- **Every new source file needs the ASF license header** (Spotless enforces this — copy the form from + an existing `.java` file). +- **Consider attributing AI assistance.** When AI tooling assisted on a change, consider adding an + `Assisted-by: ` commit trailer (the default, since a human still does the final + review); `Generated-by: ` is for minimally-modified generated output. + ## Canonical docs (read the one that fits the task) | Doc | Use for | From 8ab032a8719b467fe7b1816dd84ebae51d61686e Mon Sep 17 00:00:00 2001 From: Lari Hotari Date: Tue, 26 May 2026 17:23:33 +0300 Subject: [PATCH 21/39] [improve][misc] Require human verification for security reports in SECURITY.md --- .agents/skills/pulsar-security/SKILL.md | 3 ++- SECURITY.md | 8 ++++++++ 2 files changed, 10 insertions(+), 1 deletion(-) diff --git a/.agents/skills/pulsar-security/SKILL.md b/.agents/skills/pulsar-security/SKILL.md index 66167012221ac..4064652942925 100644 --- a/.agents/skills/pulsar-security/SKILL.md +++ b/.agents/skills/pulsar-security/SKILL.md @@ -16,7 +16,8 @@ act. connectors run **fully trusted code** (RCE is the feature, not a bug), and clusters assume network-perimeter security with **no protection against malicious DoS**. Don't classify "a trusted function runs / modifies its environment" or "a perimeter-trusted client causes a DoS" as a bug by - default — surface uncertainty instead of asserting a CVE. + default — surface uncertainty instead of asserting a CVE. **Never file a security report or open a + security issue autonomously**; a human must independently verify it and own it. - **Never disclose a suspected vulnerability in public** — not in an issue, PR, commit, or mailing list, and never state the security nature of a fix. Report privately to `security@apache.org` (optionally cc `private@pulsar.apache.org`). diff --git a/SECURITY.md b/SECURITY.md index b44f7193adb05..59e35dc501633 100644 --- a/SECURITY.md +++ b/SECURITY.md @@ -42,6 +42,14 @@ Report it privately by email to the Apache Security Team at **security@apache.or security address). You may also, or instead, write to the Apache Pulsar PMC's private list, **private@pulsar.apache.org**. +**A human must verify and take responsibility for every security report.** Deciding that some +behaviour is actually a vulnerability requires judgement against the threat model above, and the +security team is staffed by volunteers — a stream of unverified or AI-hallucinated "vulnerabilities" +wastes their time and buries real issues. AI tooling may help *analyse* a suspected problem, but a +human contributor must independently verify it and own the report. **Autonomous agents must not file +security reports or open security issues on their own**, and a tool's confident-sounding output is not, +by itself, evidence of a vulnerability. + See and for more detail. From 3b283e59dd711509e3c8d4d185647b8d93550e46 Mon Sep 17 00:00:00 2001 From: Lari Hotari Date: Tue, 26 May 2026 17:33:30 +0300 Subject: [PATCH 22/39] [improve][misc] Dedupe README into the split docs; point security links to the canonical policy Slim README's Build section to a short quick-start that refers to CONTRIBUTING/ARCHITECTURE/CODING/AGENTS for detail instead of repeating it, and make the security section consistent with SECURITY.md. Point README's security links to https://github.com/apache/pulsar/security/policy, and note in SECURITY.md, AGENTS.md, CONTRIBUTING.md, and the pulsar-security skill that the latest SECURITY.md is maintained there (so forks reference the canonical copy). --- .agents/skills/pulsar-security/SKILL.md | 3 +- AGENTS.md | 4 +- CONTRIBUTING.md | 3 +- README.md | 59 +++++-------------------- SECURITY.md | 6 ++- 5 files changed, 21 insertions(+), 54 deletions(-) diff --git a/.agents/skills/pulsar-security/SKILL.md b/.agents/skills/pulsar-security/SKILL.md index 4064652942925..68ea62e4e0c08 100644 --- a/.agents/skills/pulsar-security/SKILL.md +++ b/.agents/skills/pulsar-security/SKILL.md @@ -6,7 +6,8 @@ description: AI guardrails for security topics — Pulsar's threat scope, privat # Pulsar security AI-tooling layer over [`SECURITY.md`](../../../SECURITY.md) — the policy, the security model, and the -reporting / CVE-checking steps. Load it for the full detail; the points below are how an agent should +reporting / CVE-checking steps (latest at ; prefer it +over a possibly-stale fork copy). Load it for the full detail; the points below are how an agent should act. ## Guardrails diff --git a/AGENTS.md b/AGENTS.md index 2f7638f39e6f4..013abe55c3f33 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -39,7 +39,9 @@ Apache Pulsar is licensed under the Apache License 2.0, and all contributions mu | [`SECURITY.md`](SECURITY.md) | Reporting a vulnerability, disclosure hygiene, and checking exposure to an already-public CVE. | The authoritative project documentation is at ; the files above and the -website remain the source of truth — this guide just layers AI-specific pointers on top. +website remain the source of truth — this guide just layers AI-specific pointers on top. For +[`SECURITY.md`](SECURITY.md) specifically, always check the latest at + rather than a possibly-stale fork copy. ## Skills diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index a8cce07038c08..28c8d8ceaab10 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -219,7 +219,8 @@ AI tools are effective at resolving the merge conflicts that arise during a back ## Reporting security vulnerabilities -See [`SECURITY.md`](SECURITY.md) and . In short: report a +See [`SECURITY.md`](SECURITY.md) (latest at ) and +. In short: report a suspected vulnerability **privately** (never in a public issue, PR, or commit), and never reveal the security nature of a change in public until it is announced. An **already-public** CVE that you only want to check Pulsar's exposure to is *not* a private disclosure — search the CVE id in apache/pulsar diff --git a/README.md b/README.md index 11ea7d870f3fb..dbacb3246ceb0 100644 --- a/README.md +++ b/README.md @@ -188,53 +188,16 @@ There is also a guide for [setting up the tooling for building Pulsar](https://p ### Build -Compile and assemble: - -```bash -./gradlew assemble -``` - -Check source code license headers and formatting: - ```bash -./gradlew rat spotlessCheck checkstyleMain checkstyleTest +./gradlew assemble # compile and assemble +./gradlew :pulsar-client-original:test --tests "ConsumerBuilderImplTest" # run a single test +bin/pulsar standalone # run a standalone service ``` -Check that bundled dependencies are properly recorded in the binary distribution `LICENSE` and `NOTICE` files. Run this after adding, removing, or upgrading a runtime dependency to confirm the corresponding entry has been added to (or removed from) the LICENSE file. The task builds the binary distribution tarballs as needed: - -```bash -./gradlew checkBinaryLicense -``` - -Compile and assemble individual module: - -```bash -./gradlew :pulsar-broker:assemble -``` - -Run Unit Tests: - -```bash -./gradlew test -``` - -Run Individual Unit Test: - -```bash -./gradlew :pulsar-client-original:test --tests "ConsumerBuilderImplTest" -``` - -Run Selected Test packages: - -```bash -./gradlew :pulsar-broker:test --tests "org.apache.pulsar.*" -``` - -Start standalone Pulsar service: - -```bash -$ bin/pulsar standalone -``` +For the full build, lint, and test workflow — test groups, integration tests, Personal CI, and PR +conventions — see [`CONTRIBUTING.md`](CONTRIBUTING.md). For the module map and Gradle build system see +[`ARCHITECTURE.md`](ARCHITECTURE.md), and for coding conventions see [`CODING.md`](CODING.md). AI coding +agents should start from [`AGENTS.md`](AGENTS.md). Check https://pulsar.apache.org for documentation and examples. @@ -301,17 +264,15 @@ You can self-register at https://communityinviter.com/apps/apache-pulsar/apache- ## Security Policy -If you find a security issue with Pulsar then please [read the security policy](https://pulsar.apache.org/security/#security-policy). It is critical to avoid public disclosure. +See the [Pulsar security policy](https://github.com/apache/pulsar/security/policy) for the full policy. In short: **never disclose a suspected vulnerability publicly** (issue, PR, or discussion) before a fix is released, and a human must verify and take responsibility for a report — autonomous agents must not file security reports on their own. ### Reporting a security vulnerability -To report a vulnerability for Pulsar, contact the [Apache Security Team](https://www.apache.org/security/). When reporting a vulnerability to [security@apache.org](mailto:security@apache.org), you can copy your email to [private@pulsar.apache.org](mailto:private@pulsar.apache.org) to send your report to the Apache Pulsar Project Management Committee. This is a private mailing list. - -https://github.com/apache/pulsar/security/policy contains more details. +Report privately by email to the [Apache Security Team](https://www.apache.org/security/) at [security@apache.org](mailto:security@apache.org); you may also copy the Apache Pulsar PMC's private list, [private@pulsar.apache.org](mailto:private@pulsar.apache.org). See the [Pulsar security policy](https://github.com/apache/pulsar/security/policy) for more detail. ### Checking exposure to an already-public CVE -For already-public CVEs where you want to check Pulsar's exposure, the right venue is a GitHub issue on apache/pulsar or a question on the dev@pulsar.apache.org mailing list. Before asking about an already-public CVE and whether it's already fixed in Pulsar, search PRs and issues with the CVE id at https://github.com/apache/pulsar/pulls (also check issues and closed PRs/issues). Pulsar project's supported versions are available at https://pulsar.apache.org/contribute/release-policy/. Users should upgrade to supported versions to receive security updates. +An already-public CVE is **not** a private disclosure. Search PRs and issues by the CVE id at (including closed ones), then ask via a GitHub issue or the dev@pulsar.apache.org mailing list if needed. Pulsar's supported versions are listed at ; upgrade to a supported version to receive security updates. See the [Pulsar security policy](https://github.com/apache/pulsar/security/policy). ## License diff --git a/SECURITY.md b/SECURITY.md index 59e35dc501633..88201a2b81b4d 100644 --- a/SECURITY.md +++ b/SECURITY.md @@ -1,8 +1,10 @@ # Security Policy The authoritative policy is at ; the Apache Software Foundation's -general process is at . The summary below is what contributors (and -any AI tooling acting on their behalf) must follow. +general process is at . The latest version of this file is maintained +at — in a fork, check there rather than a +possibly-stale local copy. The summary below is what contributors (and any AI tooling acting on their +behalf) must follow. ## Security model and scope From 012afc3efd9954741445a6be4e141ba6cf426f1e Mon Sep 17 00:00:00 2001 From: Lari Hotari Date: Tue, 26 May 2026 17:44:42 +0300 Subject: [PATCH 23/39] [improve][misc] Note pulsar-site as the source of the project documentation in AGENTS.md --- AGENTS.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index 013abe55c3f33..a459310c94766 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -38,8 +38,10 @@ Apache Pulsar is licensed under the Apache License 2.0, and all contributions mu | [`CODING.md`](CODING.md) | Coding conventions: style, async/`CompletableFuture`, concurrency, logging ([slog](https://github.com/merlimat/slog)), dependencies, backward compatibility, testing, and the review checklist. | | [`SECURITY.md`](SECURITY.md) | Reporting a vulnerability, disclosure hygiene, and checking exposure to an already-public CVE. | -The authoritative project documentation is at ; the files above and the -website remain the source of truth — this guide just layers AI-specific pointers on top. For +The authoritative project documentation is at , whose source lives in the +[`apache/pulsar-site`](https://github.com/apache/pulsar-site) repository (where documentation changes +are contributed). The files above and the website remain the source of truth — this guide just layers +AI-specific pointers on top. For [`SECURITY.md`](SECURITY.md) specifically, always check the latest at rather than a possibly-stale fork copy. From 205ea6af5a130c000ba1bfce112663688a5f8425 Mon Sep 17 00:00:00 2001 From: Lari Hotari Date: Tue, 26 May 2026 18:03:50 +0300 Subject: [PATCH 24/39] [improve][misc] Refine SECURITY.md disclosure hygiene and drop deprecated pulsarbot command - Reorder "Disclosure hygiene" so the project-team-commits-the-fix paragraph comes first; clarify the commit-message/PR neutrality rules are for whoever commits the fix (the project team), and gate them on the vulnerability being announced. - Note that already-public dependency CVEs are an exception: name the CVE id directly in the PR title/description. - Document only `/pulsarbot rerun` (rerun-failure-checks is deprecated): it re-runs the failed jobs of a completed workflow run. --- .agents/skills/pulsar-pr-workflow/SKILL.md | 4 +-- CONTRIBUTING.md | 7 ++-- SECURITY.md | 39 +++++++++++++--------- 3 files changed, 28 insertions(+), 22 deletions(-) diff --git a/.agents/skills/pulsar-pr-workflow/SKILL.md b/.agents/skills/pulsar-pr-workflow/SKILL.md index 02b1f220a803e..2f5447bbd43a5 100644 --- a/.agents/skills/pulsar-pr-workflow/SKILL.md +++ b/.agents/skills/pulsar-pr-workflow/SKILL.md @@ -24,8 +24,8 @@ enforced by `.github/workflows/ci-semantic-pull-request.yml` and the body by is fine pre-PR, e.g. during the Personal CI loop.) - **Iterate via Personal CI** (full pipeline in your own fork) instead of waiting on maintainer CI approval; open the upstream PR once the fork's CI is green. -- **Flaky CI**: comment `/pulsarbot rerun` (or `rerun-failure-checks`) only after all jobs finish — but - verify the failure isn't caused by your change first. +- **Flaky CI**: comment `/pulsarbot rerun` to re-run the failed jobs, only after the workflow run has + completed — but verify the failure isn't caused by your change first. - **Keep PRs focused** — no unrelated drive-by refactoring or reformatting; raise large refactorings on `dev@pulsar.apache.org` first. - **Security**: never state the security nature of a change in a commit/PR — see diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 28c8d8ceaab10..82ec901c505b1 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -148,10 +148,9 @@ no longer applies — bring in upstream changes by merging instead (see [Pull re Pulsar has a large number of flaky tests, so GitHub Actions jobs on a PR sometimes fail for reasons unrelated to the change. When a failure appears to be flakiness rather than a regression caused by the -PR, comment **`/pulsarbot rerun`** on the PR to re-run the failed workflows (or -**`/pulsarbot rerun-failure-checks`** to re-run only the failed jobs). It only takes effect once -**all** jobs from the previous run have completed, so wait for the run to finish (including any -still-running or queued jobs) before commenting. +PR, comment **`/pulsarbot rerun`** on the PR to re-run the failed jobs of the workflow run. It only +takes effect once the workflow run has **completed** (all jobs finished), so wait for the run to +finish before commenting. For a PR from a fork, a project maintainer must **approve** the workflow runs before they execute, and approval is required again whenever new changes are pushed. This adds latency to each run and makes it diff --git a/SECURITY.md b/SECURITY.md index 88201a2b81b4d..bc438deafa869 100644 --- a/SECURITY.md +++ b/SECURITY.md @@ -57,25 +57,32 @@ See and ## Disclosure hygiene for contributors -Until a fix has been publicly announced, do **not** reveal the security nature of a change anywhere -public — commit messages, pull request titles or bodies, GitHub issues, or review comments — even -when the fix touches security-adjacent code (authentication/authorization, deserialization, TLS, -networking). Describe the behaviour change neutrally. A public commit or PR that advertises "fixes the -CVE", "security fix", or "patches the vulnerability" discloses the issue before it is announced and -defeats the coordinated-disclosure process above. - -Do **not** push, commit publicly, or open a PR for a fix to a non-public security issue. When -reporting, you may include a suggested fix patch privately in your report to -`private@pulsar.apache.org` — never in a public commit or PR. - -This applies to every contributor, and identically to any AI tooling acting on a contributor's behalf. - -The **project team** commits the fix, coordinated through the ASF security vulnerability handling +The **project team commits the fix**, coordinated through the ASF security vulnerability handling process ([apache.org/security/committers.html#possible](https://apache.org/security/committers.html#possible)). The team may commit the fix to the public repository **before** the release, using a neutral commit -message that does not state its security nature. In severe cases, the commit and release are made in a -private repository, and the fix is made public only at the time of the release. +message that does not state its security nature; in severe cases the commit and release are made in a +private repository and the fix is made public only at release time. + +As a contributor, do **not** push, commit publicly, or open a PR for a fix to a non-public security +issue yourself. When reporting, you may include a suggested fix patch privately in your report to +`private@pulsar.apache.org` — never in a public commit or PR. + +The neutrality rules below are for **whoever commits the fix — i.e. the project team**. Until the +vulnerability has been publicly announced, the commit message and PR title/body must **not** reveal its +security nature, even when the fix touches security-adjacent code (authentication/authorization, +deserialization, TLS, networking). Describe the behaviour change neutrally; a commit or PR that +advertises "fixes the CVE", "security fix", or "patches the vulnerability" discloses the issue before +it is announced and defeats coordinated disclosure. + +The same discretion applies to **everyone** — and identically to any AI tooling acting on a +contributor's behalf — in public GitHub issues, discussions, and review comments until the +announcement. + +**Already-public CVEs in dependencies are an exception.** The rules above concern *non-public* Pulsar +vulnerabilities. A PR or commit that upgrades a dependency to address an **already publicly disclosed** +CVE in that dependency does **not** follow them — the CVE is already public. Name its id directly in +the PR title and/or description (use the description when there are several CVE ids). ## Checking exposure to an already-public CVE From 9f58c0431a6809a3b38434bd4517293fe083d990 Mon Sep 17 00:00:00 2001 From: Lari Hotari Date: Tue, 26 May 2026 18:15:40 +0300 Subject: [PATCH 25/39] [improve][misc] ARCHITECTURE.md: Oxia/ZooKeeper metadata store, link architecture overview and DeepWiki --- ARCHITECTURE.md | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md index 46d3c973539bd..2c88e2065cc10 100644 --- a/ARCHITECTURE.md +++ b/ARCHITECTURE.md @@ -2,14 +2,18 @@ Apache Pulsar is a distributed pub-sub messaging and streaming platform. The codebase is performance-critical, heavily asynchronous, and concurrency-sensitive (brokers, storage, -networking). The authoritative documentation lives at ; this file is a -map of the repository for contributors (and AI coding agents) who need to find their way around the -modules quickly. +networking). The authoritative documentation lives at — see the +[Architecture Overview](https://pulsar.apache.org/docs/4.2.x/concepts-architecture-overview/) for the +conceptual model. For a deeper, generated architecture description see +[DeepWiki](https://deepwiki.com/apache/pulsar); coding agents can install the +[DeepWiki MCP](https://docs.devin.ai/work-with-devin/deepwiki-mcp) for richer coverage of Pulsar's +architecture. This file is a map of the repository for contributors (and AI coding agents) who need to +find their way around the modules quickly. ## Big picture Pulsar separates a **stateless serving layer** (brokers) from **durable storage** (Apache -BookKeeper) and a **metadata store** (ZooKeeper / etcd / others). The Gradle modules layer +BookKeeper) and a **metadata store** (Oxia / ZooKeeper). The Gradle modules layer accordingly: - **`pulsar-client-api`, `pulsar-client-admin-api`** — public, backward-compatible interfaces only. @@ -20,7 +24,7 @@ accordingly: - **`pulsar-common`** — wire protocol and shared types. Protobuf / lightproto messages are **generated** into `generated-lightproto/` / `generated-sources/` (excluded from checkstyle and spotless). -- **`pulsar-metadata`** — pluggable metadata store abstraction (ZooKeeper, etcd, RocksDB, memory) +- **`pulsar-metadata`** — pluggable metadata store abstraction (Oxia / ZooKeeper, plus RocksDB and memory) used by broker and bookkeeper. - **`managed-ledger`** — the storage abstraction over **Apache BookKeeper**: append-only ledgers + cursors that track consumer/subscription positions. This is the durability layer the broker reads From 663e12f8e69cdc7ee55eb563f70770782ccc4ff7 Mon Sep 17 00:00:00 2001 From: Lari Hotari Date: Tue, 26 May 2026 18:28:04 +0300 Subject: [PATCH 26/39] [improve][misc] Move SECURITY.md security model section near the end and fix cross-references --- SECURITY.md | 71 +++++++++++++++++++++++++++-------------------------- 1 file changed, 36 insertions(+), 35 deletions(-) diff --git a/SECURITY.md b/SECURITY.md index bc438deafa869..f64aadaf7c283 100644 --- a/SECURITY.md +++ b/SECURITY.md @@ -6,35 +6,6 @@ at — in a fork, check there possibly-stale local copy. The summary below is what contributors (and any AI tooling acting on their behalf) must follow. -## Security model and scope - -Pulsar's security model is not formally/explicitly defined. Two long-standing design assumptions -matter when deciding whether something is actually a vulnerability: - -**Pulsar Functions and connectors execute fully trusted code.** The function instance runtime exists -to run user-provided code — *remote code execution is its core purpose, not a flaw.* (The Pulsar PMC -has repeatedly received reports claiming that "the function instance runtime allows running -user-provided code and results in an RCE"; this is expected, by-design behaviour, not a security -issue.) The available execution models also let the code modify its environment: - -- The **thread** and **process** runtimes can read or modify any files and state accessible to the - process they run in. -- The **Kubernetes** runtime, on its own, does not restrict access to resources in the Kubernetes - cluster. The project provides hooks for custom Kubernetes-runtime *hardening*, but such hardening is - **not** part of the project. - -Therefore, Pulsar Functions and connectors must only run code that is **fully trusted**. - -**Clusters rely on network-perimeter security.** Pulsar is designed to be deployed behind a trusted -network perimeter where only trusted users can reach the cluster. The project does **not** implement -explicit controls against malicious **denial-of-service (DoS)** attacks. Rate limiting exists to mitigate -*unintentional* DoS — e.g. improper configuration or thundering-herd effects — but it is **not** a -defense against a deliberate DoS by an attacker. - -Reports that amount to "a trusted function can run code / modify its environment" or "a -perimeter-trusted client can cause a denial-of-service" generally fall **outside** Pulsar's threat -model. When in doubt, ask through the channels below rather than assuming. - ## Reporting a vulnerability Do **not** open a public GitHub issue, pull request, or discussion for a suspected vulnerability — @@ -45,12 +16,13 @@ security address). You may also, or instead, write to the Apache Pulsar PMC's pr **private@pulsar.apache.org**. **A human must verify and take responsibility for every security report.** Deciding that some -behaviour is actually a vulnerability requires judgement against the threat model above, and the -security team is staffed by volunteers — a stream of unverified or AI-hallucinated "vulnerabilities" -wastes their time and buries real issues. AI tooling may help *analyse* a suspected problem, but a -human contributor must independently verify it and own the report. **Autonomous agents must not file -security reports or open security issues on their own**, and a tool's confident-sounding output is not, -by itself, evidence of a vulnerability. +behaviour is actually a vulnerability requires judgement against Pulsar's threat model (see +*[Security model and scope](#security-model-and-scope)* below), and the security team is staffed by +volunteers — a stream of unverified or AI-hallucinated "vulnerabilities" wastes their time and buries +real issues. AI tooling may help *analyse* a suspected problem, but a human contributor must +independently verify it and own the report. **Autonomous agents must not file security reports or open +security issues on their own**, and a tool's confident-sounding output is not, by itself, evidence of a +vulnerability. See and for more detail. @@ -94,6 +66,35 @@ Before asking, search PRs and issues with the CVE id at . From 5db4d018942cafd5c512210f09e142237a2d615f Mon Sep 17 00:00:00 2001 From: Lari Hotari Date: Tue, 26 May 2026 18:36:32 +0300 Subject: [PATCH 27/39] [improve][misc] Clarify security PR rules: undisclosed-vuln fixes (incl. public forks) vs public CVE dependency bumps --- .agents/skills/pulsar-pr-workflow/SKILL.md | 9 +++++++-- SECURITY.md | 3 ++- 2 files changed, 9 insertions(+), 3 deletions(-) diff --git a/.agents/skills/pulsar-pr-workflow/SKILL.md b/.agents/skills/pulsar-pr-workflow/SKILL.md index 2f5447bbd43a5..333d44e9bf4c9 100644 --- a/.agents/skills/pulsar-pr-workflow/SKILL.md +++ b/.agents/skills/pulsar-pr-workflow/SKILL.md @@ -28,8 +28,13 @@ enforced by `.github/workflows/ci-semantic-pull-request.yml` and the body by completed — but verify the failure isn't caused by your change first. - **Keep PRs focused** — no unrelated drive-by refactoring or reformatting; raise large refactorings on `dev@pulsar.apache.org` first. -- **Security**: never state the security nature of a change in a commit/PR — see - [`pulsar-security`](../pulsar-security/SKILL.md). +- **Security**: never state the security nature of a change in a commit/PR **while a vulnerability is + undisclosed** — those fixes are handled privately by the project team. Others must not create any + commits or PRs to address such an issue, **including in a public fork** (a public-fork commit or PR is + itself a disclosure); share a suggested patch privately instead (see `pulsar-security`). A **public + dependency update that addresses an already-disclosed CVE is the opposite**: use a `[fix][sec]` title + and name the CVE id(s) in the title or description. See [`pulsar-security`](../pulsar-security/SKILL.md) + and [`SECURITY.md`](../../../SECURITY.md). ## Before you finish diff --git a/SECURITY.md b/SECURITY.md index f64aadaf7c283..3283aa2ebad7a 100644 --- a/SECURITY.md +++ b/SECURITY.md @@ -37,7 +37,8 @@ message that does not state its security nature; in severe cases the commit and private repository and the fix is made public only at release time. As a contributor, do **not** push, commit publicly, or open a PR for a fix to a non-public security -issue yourself. When reporting, you may include a suggested fix patch privately in your report to +issue yourself — **including in a public fork**, since a public-fork commit or PR is itself a +disclosure. When reporting, you may include a suggested fix patch privately in your report to `private@pulsar.apache.org` — never in a public commit or PR. The neutrality rules below are for **whoever commits the fix — i.e. the project team**. Until the From a889b5470e3f510426f32f3baad79a0e29481c15 Mon Sep 17 00:00:00 2001 From: Lari Hotari Date: Wed, 27 May 2026 15:43:48 +0300 Subject: [PATCH 28/39] [improve][misc] Add BUILDING.md and per-path Copilot instructions; consolidate agent guidance - Add BUILDING.md: prerequisites, build/lint commands, the Gradle build infrastructure, the module-name-vs-directory gotcha, and build-change guardrails (moved out of ARCHITECTURE.md and CONTRIBUTING.md). - Add .github/instructions/ path-specific GitHub Copilot instruction files with applyTo globs: gradle-build -> BUILDING.md, java-coding -> CODING.md. - AGENTS.md: add BUILDING.md to the canonical-docs table and an Agent guardrails section; consolidate the agent guidance into AGENTS.md and the canonical docs. - Update cross-references in CODING.md, CONTRIBUTING.md, README.md; drop the removed Pulsar SQL entry from README.md. --- .agents/skills/README.md | 19 ---- .agents/skills/pulsar-build/SKILL.md | 39 -------- .agents/skills/pulsar-pr-workflow/SKILL.md | 44 --------- .agents/skills/pulsar-security/SKILL.md | 30 ------ .agents/skills/pulsar-tests/SKILL.md | 39 -------- .../instructions/gradle-build.instructions.md | 10 ++ .../instructions/java-coding.instructions.md | 10 ++ AGENTS.md | 36 ++++--- ARCHITECTURE.md | 47 ++-------- BUILDING.md | 93 +++++++++++++++++++ CODING.md | 3 +- CONTRIBUTING.md | 44 +++------ README.md | 9 +- 13 files changed, 156 insertions(+), 267 deletions(-) delete mode 100644 .agents/skills/README.md delete mode 100644 .agents/skills/pulsar-build/SKILL.md delete mode 100644 .agents/skills/pulsar-pr-workflow/SKILL.md delete mode 100644 .agents/skills/pulsar-security/SKILL.md delete mode 100644 .agents/skills/pulsar-tests/SKILL.md create mode 100644 .github/instructions/gradle-build.instructions.md create mode 100644 .github/instructions/java-coding.instructions.md create mode 100644 BUILDING.md diff --git a/.agents/skills/README.md b/.agents/skills/README.md deleted file mode 100644 index 2b1daca25e968..0000000000000 --- a/.agents/skills/README.md +++ /dev/null @@ -1,19 +0,0 @@ -# Agent skills - -Task-specific guidance for AI coding assistants working in apache/pulsar. Each skill is a directory -with a `SKILL.md` that adds AI-tooling guardrails on top of the canonical docs -([`CONTRIBUTING.md`](../../CONTRIBUTING.md), [`ARCHITECTURE.md`](../../ARCHITECTURE.md), -[`CODING.md`](../../CODING.md)). Load the relevant skill **on demand** — before working in its area — -instead of pulling everything into context. The index of skills is in the repository-root -[`AGENTS.md`](../../AGENTS.md#skills). - -| Skill | Use for | -|-------|---------| -| [`pulsar-build`](pulsar-build/SKILL.md) | Gradle build changes — convention plugins, `settings.gradle.kts`, version catalog, dependencies. | -| [`pulsar-tests`](pulsar-tests/SKILL.md) | Writing and running tests — `--tests` scoping, TestNG groups, leak detectors. | -| [`pulsar-pr-workflow`](pulsar-pr-workflow/SKILL.md) | Branching, semantic PR titles, descriptions, Personal CI, rebase-vs-merge. | -| [`pulsar-security`](pulsar-security/SKILL.md) | Reporting a vulnerability or checking exposure to a public CVE. | - -To add a skill, create `/SKILL.md` with YAML frontmatter (`name`, `description` — phrased -so an agent knows *when* to use it), keep it focused on one task area, cite the canonical docs rather -than restating them, and add a row here and in [`AGENTS.md`](../../AGENTS.md#skills). diff --git a/.agents/skills/pulsar-build/SKILL.md b/.agents/skills/pulsar-build/SKILL.md deleted file mode 100644 index 01bb6cb130007..0000000000000 --- a/.agents/skills/pulsar-build/SKILL.md +++ /dev/null @@ -1,39 +0,0 @@ ---- -name: pulsar-build -description: AI guardrails for Gradle build changes (convention plugins, settings, version catalog, dependencies). Use when editing build-logic/, *.gradle.kts, gradle.properties, or gradle/libs.versions.toml. ---- - -# Pulsar build - -AI-tooling layer over [`ARCHITECTURE.md` → Build infrastructure](../../../ARCHITECTURE.md#build-infrastructure) -(tiers, convention plugins, version catalog, enforced platform, the module-name gotcha) and -[`CODING.md` → Dependencies](../../../CODING.md#dependencies). Load those for detail. - -## When to use - -Editing `build-logic/`, `settings.gradle.kts`, root/module `build.gradle.kts`, `gradle.properties`, -`gradle/libs.versions.toml`, the `pulsar-dependencies` platform, or the Gradle wrapper. (Source -changes belong to [`pulsar-tests`](../pulsar-tests/SKILL.md) or the module itself.) - -## Guardrails - -- **Edit shared config in `build-logic/conventions/`**, not per-module. -- **Versions come from `gradle/libs.versions.toml`** (`libs.*` / `pulsar-dependencies`) — never - hardcode a version in a build script. -- **Keep tasks configuration-cache and configure-on-demand compatible** (both are enabled): no - mutable-state reads or `Project` access in task actions — use `Provider` / value sources. Verify - with `--configuration-cache`. One-off tooling tasks may be exempt. Follow - [Gradle best practices](https://docs.gradle.org/current/userguide/best_practices_index.html). -- **Use the real Gradle project path** (`pulsar-client/` is `:pulsar-client-original`). -- **After a dependency change**, run `./gradlew checkBinaryLicense` and update the distribution - LICENSE/NOTICE; justify any genuinely new dependency. -- **Published modules must not depend on internal modules** (compile/runtime) — the artifact would be - unresolvable from Maven Central. Modules aren't published unless they apply - `pulsar.public-java-library-conventions`. -- **Don't fabricate plugin / DSL / task names** — verify they exist. - -## Before you finish - -- [ ] The affected task and `./gradlew help` run clean with `--configuration-cache`. -- [ ] `assemble` and `rat spotlessCheck checkstyleMain checkstyleTest` pass. -- [ ] Dependency change → `checkBinaryLicense` passes and LICENSE/NOTICE are updated. diff --git a/.agents/skills/pulsar-pr-workflow/SKILL.md b/.agents/skills/pulsar-pr-workflow/SKILL.md deleted file mode 100644 index 333d44e9bf4c9..0000000000000 --- a/.agents/skills/pulsar-pr-workflow/SKILL.md +++ /dev/null @@ -1,44 +0,0 @@ ---- -name: pulsar-pr-workflow -description: AI guardrails for branching, commits, and PRs to apache/pulsar — semantic titles, PR descriptions, Personal CI, never-rebase-after-open, per-action confirmation. Use when committing, opening/updating a PR, or running CI. ---- - -# Pulsar PR workflow - -AI-tooling layer over [`CONTRIBUTING.md` → Pull requests](../../../CONTRIBUTING.md#pull-requests) and -[Personal CI](../../../CONTRIBUTING.md#running-the-full-ci-pipeline-personal-ci); the title rules are -enforced by `.github/workflows/ci-semantic-pull-request.yml` and the body by -`.github/PULL_REQUEST_TEMPLATE.md`. Load those for detail. - -## Guardrails - -- **Confirm each state-changing action with the user** — pushing, opening a PR, posting a comment. - Starting the task is not standing authorization. -- **Before pushing**, run lint: `./gradlew rat spotlessCheck checkstyleMain checkstyleTest` - (`spotlessApply` to fix); scope tests with `--tests`. -- **Semantic title** `[][] ` — valid prefixes are CI-enforced - (read `ci-semantic-pull-request.yml`, don't guess); no issue numbers in the title. -- **Body must cover Motivation (why) + Modifications (what/how)**, not just restate the title. Link - the issue: `Fixes #N` / `Closes #N`, `Main Issue: #N`, or `PIP: #N`. -- **Never rebase once the PR is open in `apache/pulsar`** — merge `/master` instead. (Rebasing - is fine pre-PR, e.g. during the Personal CI loop.) -- **Iterate via Personal CI** (full pipeline in your own fork) instead of waiting on maintainer CI - approval; open the upstream PR once the fork's CI is green. -- **Flaky CI**: comment `/pulsarbot rerun` to re-run the failed jobs, only after the workflow run has - completed — but verify the failure isn't caused by your change first. -- **Keep PRs focused** — no unrelated drive-by refactoring or reformatting; raise large refactorings on - `dev@pulsar.apache.org` first. -- **Security**: never state the security nature of a change in a commit/PR **while a vulnerability is - undisclosed** — those fixes are handled privately by the project team. Others must not create any - commits or PRs to address such an issue, **including in a public fork** (a public-fork commit or PR is - itself a disclosure); share a suggested patch privately instead (see `pulsar-security`). A **public - dependency update that addresses an already-disclosed CVE is the opposite**: use a `[fix][sec]` title - and name the CVE id(s) in the title or description. See [`pulsar-security`](../pulsar-security/SKILL.md) - and [`SECURITY.md`](../../../SECURITY.md). - -## Before you finish - -- [ ] Lint passes; title matches the CI semantic rules. -- [ ] Body has Motivation + Modifications and links the issue/PIP. -- [ ] No rebase after the upstream PR opened. -- [ ] Each push / PR-open / comment was user-confirmed. diff --git a/.agents/skills/pulsar-security/SKILL.md b/.agents/skills/pulsar-security/SKILL.md deleted file mode 100644 index 68ea62e4e0c08..0000000000000 --- a/.agents/skills/pulsar-security/SKILL.md +++ /dev/null @@ -1,30 +0,0 @@ ---- -name: pulsar-security -description: AI guardrails for security topics — Pulsar's threat scope, private vulnerability reporting, and checking exposure to an already-public CVE. Use for a suspected vulnerability, a CVE, or deciding whether some behaviour is in scope. ---- - -# Pulsar security - -AI-tooling layer over [`SECURITY.md`](../../../SECURITY.md) — the policy, the security model, and the -reporting / CVE-checking steps (latest at ; prefer it -over a possibly-stale fork copy). Load it for the full detail; the points below are how an agent should -act. - -## Guardrails - -- **Know the threat scope before calling something a vulnerability** - ([SECURITY.md → Security model](../../../SECURITY.md#security-model-and-scope)): Pulsar Functions and - connectors run **fully trusted code** (RCE is the feature, not a bug), and clusters assume - network-perimeter security with **no protection against malicious DoS**. Don't classify "a trusted - function runs / modifies its environment" or "a perimeter-trusted client causes a DoS" as a bug by - default — surface uncertainty instead of asserting a CVE. **Never file a security report or open a - security issue autonomously**; a human must independently verify it and own it. -- **Never disclose a suspected vulnerability in public** — not in an issue, PR, commit, or mailing - list, and never state the security nature of a fix. Report privately to `security@apache.org` - (optionally cc `private@pulsar.apache.org`). -- **Don't write or push the fix yourself.** The project team commits it through the ASF process; a - reporter may include a suggested patch privately in the report. -- **An already-public CVE is *not* a private disclosure.** First search PRs/issues (including closed) - by the CVE id and check the shipped version in `gradle/libs.versions.toml`; then ask via a GitHub - issue or `dev@pulsar.apache.org`. Don't fabricate "fixed" / "not affected" — verify against merged - PRs and shipped versions. diff --git a/.agents/skills/pulsar-tests/SKILL.md b/.agents/skills/pulsar-tests/SKILL.md deleted file mode 100644 index 8c9f753e05add..0000000000000 --- a/.agents/skills/pulsar-tests/SKILL.md +++ /dev/null @@ -1,39 +0,0 @@ ---- -name: pulsar-tests -description: AI guardrails for writing and running Apache Pulsar tests (TestNG/Mockito/AssertJ/Awaitility). Use when adding, modifying, running, or debugging unit or integration tests. ---- - -# Pulsar tests - -AI-tooling layer over [`CODING.md` → Testing conventions](../../../CODING.md#testing-conventions) (how -tests are written) and [`CONTRIBUTING.md` → Running tests](../../../CONTRIBUTING.md#running-tests) (how -to run them). Load those for the full detail and examples — the points below are the ones agents most -often get wrong. - -## Guardrails - -- **Scope every run with `--tests`** (class / method / package) — never run a whole module's `test` - task. Mind the module-name gotcha (`pulsar-client/` → `:pulsar-client-original`). -- **Run with `-PtestRetryCount=0`** so flakiness surfaces; a test in `flaky` / `quarantine` needs - `-PexcludedTestGroups=''`. -- **AssertJ + Awaitility**, never `Thread.sleep` timing; tests must be deterministic and have timeouts. -- **No reflection into private state** — add a package-private `@VisibleForTesting` accessor instead; - flag any new reflection. -- **A `ByteBuf` / buffer leak is a real bug** (fix the missing `release()`), but a **thread leak from - `ThreadLeakDetectorListener` is unreliable** (false positives, notably with `SharedPulsarBaseTest`) — - corroborate before treating it as real. -- **New integration-style test → extend `SharedPulsarBaseTest`**; get the per-method namespace from - `getNamespace()` and topic names from `newTopicName()`. Never hardcode namespace/topic names (the - cluster is shared across the test JVM). -- **A bug-fix test must fail on the unpatched code for the real reason** — not by mutating internal - state to force it. Don't fabricate assertions or symbol names. Prefer a design that allows a true - isolated unit test over piling on mocks. -- **Concurrency / memory-visibility fixes**: a clean local run is weak evidence (timing- and - platform-dependent). See - [`CODING.md` → Reproducing concurrency bugs](../../../CODING.md#reproducing-concurrency--memory-visibility-bugs). - -## Before you finish - -- [ ] The scoped `--tests` run passes with `-PtestRetryCount=0`; a bug-fix test fails without the fix. -- [ ] No reflection into private state; resources and buffers are released. -- [ ] Larger changes validated via **Personal CI**, not a local full run. diff --git a/.github/instructions/gradle-build.instructions.md b/.github/instructions/gradle-build.instructions.md new file mode 100644 index 0000000000000..1117973be46f2 --- /dev/null +++ b/.github/instructions/gradle-build.instructions.md @@ -0,0 +1,10 @@ +--- +description: Conventions for changing Apache Pulsar's Gradle build (build scripts, version catalog, convention plugins). +applyTo: "**/*.gradle.kts,**/gradle.properties,gradle/libs.versions.toml,build-logic/**" +--- + +# Gradle build changes + +These files configure Apache Pulsar's **Gradle** build. Follow the guidance in +[`BUILDING.md`](../../BUILDING.md) when editing or reviewing a +build file. \ No newline at end of file diff --git a/.github/instructions/java-coding.instructions.md b/.github/instructions/java-coding.instructions.md new file mode 100644 index 0000000000000..1021307959995 --- /dev/null +++ b/.github/instructions/java-coding.instructions.md @@ -0,0 +1,10 @@ +--- +description: Coding conventions for Apache Pulsar Java source and test code. +applyTo: "**/*.java" +--- + +# Java code changes + +Apache Pulsar is performance-critical, asynchronous, and concurrency-sensitive. The full reference is +[`CODING.md`](../../CODING.md) (with [`CONTRIBUTING.md`](../../CONTRIBUTING.md) for running tests) — +follow it when editing or reviewing Java code. \ No newline at end of file diff --git a/AGENTS.md b/AGENTS.md index a459310c94766..54664874ef629 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -2,8 +2,8 @@ Supplemental guidance for AI coding assistants (Claude Code, Copilot, Cursor, Gemini, Codex, Aider, and similar tools) working in this repository. Keep this file short — it is a **router**. The detail -lives in the human-facing docs and the task skills, which you should load **on demand** rather than -pulling everything into context up front. +lives in the human-facing docs linked below; read the one that fits your task rather than pulling +everything into context up front. Apache Pulsar is a distributed pub-sub messaging and streaming platform. The codebase is performance-critical, heavily asynchronous, and concurrency-sensitive. Prioritize **correctness, @@ -33,8 +33,9 @@ Apache Pulsar is licensed under the Apache License 2.0, and all contributions mu | Doc | Use for | |-----|---------| -| [`CONTRIBUTING.md`](CONTRIBUTING.md) | Local dev workflow: build, lint, running tests & test groups, integration tests, Personal CI, PR conventions, security reporting. | -| [`ARCHITECTURE.md`](ARCHITECTURE.md) | Big-picture module map, the Gradle build infrastructure, the module-name-vs-directory gotcha, and the `pip/` proposals. | +| [`BUILDING.md`](BUILDING.md) | Building: prerequisites (JDK), build/lint commands, the Gradle build infrastructure (convention plugins, version catalog, enforced platform), the module-name-vs-directory gotcha, and how to change the build. | +| [`CONTRIBUTING.md`](CONTRIBUTING.md) | Local dev workflow: running tests & test groups, integration tests, Personal CI, PR conventions, security reporting. | +| [`ARCHITECTURE.md`](ARCHITECTURE.md) | Big-picture module map, the concurrency model and backpressure, and the `pip/` proposals. | | [`CODING.md`](CODING.md) | Coding conventions: style, async/`CompletableFuture`, concurrency, logging ([slog](https://github.com/merlimat/slog)), dependencies, backward compatibility, testing, and the review checklist. | | [`SECURITY.md`](SECURITY.md) | Reporting a vulnerability, disclosure hygiene, and checking exposure to an already-public CVE. | @@ -45,21 +46,19 @@ AI-specific pointers on top. For [`SECURITY.md`](SECURITY.md) specifically, always check the latest at rather than a possibly-stale fork copy. -## Skills +## Agent guardrails -> **Before writing or modifying code in an area below, read the matching skill file first.** Each -> skill is more focused than the canonical docs and adds AI-specific guardrails; it tells you what to -> verify and points back into the docs above. Load it on demand to keep context small. +A few rules matter specifically when an AI tool makes the change, on top of the canonical docs above: -Task-specific guidance lives under [`.agents/skills/`](.agents/skills/), each in its own directory -with a `SKILL.md`: - -| Skill | Use for | -|-------|---------| -| [`pulsar-build`](.agents/skills/pulsar-build/SKILL.md) | Editing the Gradle build — convention plugins under `build-logic/`, `settings.gradle.kts`, `gradle/libs.versions.toml`, dependency changes. Enforces configuration-cache / configure-on-demand compatibility and LICENSE/NOTICE upkeep. | -| [`pulsar-tests`](.agents/skills/pulsar-tests/SKILL.md) | Writing or running tests — TestNG groups, `--tests` scoping, AssertJ/Awaitility, no-reflection rule, buffer/thread leak detectors, retry count. | -| [`pulsar-pr-workflow`](.agents/skills/pulsar-pr-workflow/SKILL.md) | Branching, commits, semantic PR titles, PR descriptions, the Personal CI loop, and the never-rebase-after-open / merge-instead rule. | -| [`pulsar-security`](.agents/skills/pulsar-security/SKILL.md) | Reporting a vulnerability or checking Pulsar's exposure to an already-public CVE. | +- **Verify, don't fabricate.** Before you use a class, method, field, configuration key, Gradle task, + plugin, or DSL name, confirm it actually exists (search the code or the build files) — don't invent + APIs, test assertions, or symbol names. Don't assert a CVE is "fixed" / "not affected" without + checking the merged PRs and the shipped version. +- **Confirm state-changing actions.** Get the user's explicit confirmation before pushing, opening or + updating a PR, or posting a comment. Being asked to start a task is not standing authorization for + these outward-facing actions — see *Licensing and provenance* above. +- **A clean local run is weak evidence for concurrency/visibility fixes** (timing- and + platform-dependent). See [`CODING.md`](CODING.md#reproducing-concurrency--memory-visibility-bugs). ## Critical rules @@ -74,8 +73,7 @@ with a `SKILL.md`: 5. **PRs:** semantic `[type][scope]` title; describe **motivation** and **modifications**; do not rebase once the PR is open in `apache/pulsar` — merge upstream `master` instead. 6. **Security:** never disclose a vulnerability — or the security nature of a change — in a public - issue, PR, or commit. See [`SECURITY.md`](SECURITY.md) / - [`pulsar-security`](.agents/skills/pulsar-security/SKILL.md). + issue, PR, or commit. See [`SECURITY.md`](SECURITY.md). 7. **Stay in scope.** Keep a change focused on its task; don't bundle unrelated drive-by refactors or generate broad mass-refactoring PRs. Discuss large refactorings on `dev@pulsar.apache.org` first. See [`CONTRIBUTING.md`](CONTRIBUTING.md#pull-requests). diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md index 2c88e2065cc10..4033d6aecd4e6 100644 --- a/ARCHITECTURE.md +++ b/ARCHITECTURE.md @@ -77,45 +77,14 @@ than it can handle, particularly with respect to memory. The memory side is desc backpressure (beyond memory) is not yet documented and would benefit from being defined alongside the concurrency model. -## Build infrastructure - -The build is **Gradle** (migrated from Maven via PIP-463; much existing tooling and docs elsewhere -still reference Maven). Use the wrapper `./gradlew` — no separate Gradle install needed. **JDK 21 or -25 is required to run the build** (`-PskipJavaVersionCheck` bypasses the check). Bytecode targets -Java 17 (`options.release = 17`). - -- `settings.gradle.kts` — all modules, organized in dependency tiers (Tier 0 has no internal deps, - higher tiers build on lower ones). -- `build-logic/conventions/` — convention plugins (`pulsar.java-conventions`, - `pulsar.code-quality-conventions`, `pulsar.shadow-conventions`, etc.) applied by modules. This is - where shared compile/test/dependency config lives — edit conventions here rather than duplicating - config across modules. -- `gradle/libs.versions.toml` — version catalog (single source of truth for dependency versions; - `libs.*` references in build scripts). -- `pulsar-dependencies` — enforced platform (BOM) pinning all dependency versions; applied to every - module. - -When changing the build, **follow the [Gradle best practices](https://docs.gradle.org/current/userguide/best_practices_index.html)**. -The build enables both the **configuration cache** (`org.gradle.configuration-cache=true`) and -**configure-on-demand** (`org.gradle.configureondemand=true`), so any task used by a commonly-run -task (`assemble`, `test`, `integrationTest`, `rat` / `spotlessCheck` / `checkstyle*`, -`checkBinaryLicense`, `docker*`, etc.) **must be configuration-cache and configure-on-demand -compatible** — no reading of mutable state at execution time, no `Project` access in task actions, -use `Provider` / value sources instead. Verify with `--configuration-cache`. Tooling/one-off tasks -that are not part of these common flows (e.g. `verifyTestGroups`, dependency-report and ad-hoc -maintenance tasks) may be exempt. - -### Module name vs. directory name gotcha - -Several Gradle project paths do **not** match their directory because the Maven artifactId is -preserved. Most importantly: - -- Directory `pulsar-client/` → project **`:pulsar-client-original`** -- Directory `pulsar-client-admin/` → project **`:pulsar-client-admin-original`** -- Directory `pulsar-functions/localrun/` → project `:pulsar-functions:pulsar-functions-local-runner-original` - -Always use the Gradle project path (left of any `--tests`), e.g. `./gradlew :pulsar-client-original:test`. -Check `settings.gradle.kts` when a path is ambiguous. +## Build and module layout + +The Gradle build infrastructure — convention plugins under `build-logic/`, the +`gradle/libs.versions.toml` version catalog, the enforced `pulsar-dependencies` platform, the +dependency tiers in `settings.gradle.kts`, and the configuration-cache / configure-on-demand +requirements — is documented in [`BUILDING.md`](BUILDING.md), together with the +[module-name-vs-directory gotcha](BUILDING.md#module-name-vs-directory-name-gotcha) (e.g. directory +`pulsar-client/` is the Gradle project `:pulsar-client-original`). ## Pulsar Improvement Proposals (`pip/`) diff --git a/BUILDING.md b/BUILDING.md new file mode 100644 index 0000000000000..2d689ce54a7ce --- /dev/null +++ b/BUILDING.md @@ -0,0 +1,93 @@ +# Building Apache Pulsar + +Apache Pulsar uses a **Gradle** build (migrated from Maven via PIP-463; some older tooling and docs +elsewhere still reference Maven). Use the bundled wrapper `./gradlew` (Linux/macOS) or `gradlew.bat` +(Windows) — no separate Gradle install is needed. For local-environment setup see the +[build-tooling setup guide](https://pulsar.apache.org/contribute/setup-buildtools/) and the +[IDE setup guide](https://pulsar.apache.org/contribute/setup-ide/). For how to *run tests* see +[`CONTRIBUTING.md` → Running tests](CONTRIBUTING.md#running-tests); for the module map see +[`ARCHITECTURE.md`](ARCHITECTURE.md). + +## Prerequisites + +- **JDK 21 or 25** is required to build `master`. Bytecode targets Java 17 (`options.release = 17`). + `zip` is also needed. (`-PskipJavaVersionCheck` bypasses the JDK version check.) +- The Gradle wrapper `./gradlew` pins the Gradle version — don't install Gradle separately. + +## Common build commands + +```bash +# Compile and assemble everything (or a single module) +./gradlew assemble +./gradlew :pulsar-broker:assemble + +# Lint / verify (license headers, formatting, checkstyle) — run before pushing +./gradlew rat spotlessCheck checkstyleMain checkstyleTest +./gradlew spotlessApply # auto-fix license headers/formatting + +# Verify bundled-dependency LICENSE/NOTICE coverage (run after changing a runtime dependency) +./gradlew checkBinaryLicense + +# Start a standalone Pulsar service (broker + bookie + metadata in one JVM) +bin/pulsar standalone + +# Build docker images apachepulsar/pulsar(-all):latest +./gradlew docker # or docker-all +``` + +## Build infrastructure + +The build is **Gradle**; the wrapper requires **JDK 21 or 25** to run (bytecode targets Java 17). + +- `settings.gradle.kts` — all modules, organized in dependency tiers (Tier 0 has no internal deps, + higher tiers build on lower ones). +- `build-logic/conventions/` — convention plugins (`pulsar.java-conventions`, + `pulsar.code-quality-conventions`, `pulsar.shadow-conventions`, etc.) applied by modules. This is + where shared compile/test/dependency config lives — edit conventions here rather than duplicating + config across modules. +- `gradle/libs.versions.toml` — version catalog (single source of truth for dependency versions; + referenced as `libs.*` in build scripts). +- `pulsar-dependencies` — enforced platform (BOM) pinning all dependency versions; applied to every + module. + +The build enables both the **configuration cache** (`org.gradle.configuration-cache=true`) and +**configure-on-demand** (`org.gradle.configureondemand=true`). + +## Module name vs. directory name gotcha + +Several Gradle project paths do **not** match their directory because the Maven artifactId is +preserved. Most importantly: + +- Directory `pulsar-client/` → project **`:pulsar-client-original`** +- Directory `pulsar-client-admin/` → project **`:pulsar-client-admin-original`** +- Directory `pulsar-functions/localrun/` → project `:pulsar-functions:pulsar-functions-local-runner-original` + +Always use the Gradle project path (left of any `--tests`), e.g. `./gradlew :pulsar-client-original:test`. +Check `settings.gradle.kts` when a path is ambiguous. + +## Changing the build + +When editing `build-logic/`, `settings.gradle.kts`, a module `build.gradle.kts`, `gradle.properties`, +`gradle/libs.versions.toml`, or the `pulsar-dependencies` platform: + +- **Edit shared config in `build-logic/conventions/`**, not per-module. +- **Versions come from `gradle/libs.versions.toml`** (`libs.*` / `pulsar-dependencies`) — never + hardcode a version in a build script. +- **Keep tasks configuration-cache and configure-on-demand compatible** (both are enabled): no + reading of mutable state at execution time and no `Project` access in task actions — use + `Provider` / value sources instead, and verify with `--configuration-cache`. Tasks reached by the + common flows (`assemble`, `test`, `integrationTest`, `rat` / `spotlessCheck` / `checkstyle*`, + `checkBinaryLicense`, `docker*`) must be compatible; one-off tooling tasks not part of those flows + (e.g. `verifyTestGroups`, ad-hoc maintenance/report tasks) may be exempt. Follow the + [Gradle best practices](https://docs.gradle.org/current/userguide/best_practices_index.html). +- **Published modules must not depend on internal modules** at compile/runtime scope — the artifact + would be unresolvable from Maven Central. A module is published only when it applies + `pulsar.public-java-library-conventions`. +- **After a dependency change**, run `./gradlew checkBinaryLicense` and update the distribution + `LICENSE`/`NOTICE`; justify any genuinely new dependency (see + [`CODING.md` → Dependencies](CODING.md#dependencies)). +- **Follow the [Gradle best practices](https://github.com/gradle/gradle/blob/master/platforms/documentation/docs/src/docs/userguide/best-practices/best_practices_index.adoc)**. + +Before finishing a build change, confirm the affected task and `./gradlew help` run clean with +`--configuration-cache`, and that `assemble` and `rat spotlessCheck checkstyleMain checkstyleTest` +pass (plus `checkBinaryLicense` if a dependency changed). diff --git a/CODING.md b/CODING.md index acf7b5ad5cf82..35b6a07022ef7 100644 --- a/CODING.md +++ b/CODING.md @@ -4,8 +4,7 @@ Apache Pulsar follows the Sun Java Coding Conventions with additional project-sp codebase is performance-critical, asynchronous, and concurrency-sensitive, so code review prioritizes **correctness, thread safety, performance, maintainability, and backward compatibility**. This file is the canonical coding reference for both human contributors and AI -coding agents; the task skills under [`.agents/skills/`](.agents/skills/) layer AI-specific -guardrails on top of it. +coding agents; see [`AGENTS.md`](AGENTS.md) for the agent-specific guardrails that apply on top of it. ## Style diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 82ec901c505b1..ecd8f5b55ed97 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -24,38 +24,15 @@ We would love for you to contribute to Apache Pulsar and make it even better! The **authoritative** contributor guide is the [Apache Pulsar Contributing Guide](https://pulsar.apache.org/contribute/) — please read it before starting. This file is a quick, in-repo reference for the local development -workflow (build, test, PR, CI); for the big-picture module map see [`ARCHITECTURE.md`](ARCHITECTURE.md) -and for code style see [`CODING.md`](CODING.md). +workflow (test, PR, CI). For building see [`BUILDING.md`](BUILDING.md); for the big-picture module map +see [`ARCHITECTURE.md`](ARCHITECTURE.md); and for code style see [`CODING.md`](CODING.md). -## Prerequisites +## Building -- **JDK 21 or 25** is required to build (`master`). Bytecode targets Java 17. `zip` is also needed. - (`-PskipJavaVersionCheck` bypasses the JDK version check.) -- Use the bundled Gradle wrapper `./gradlew` (Linux/macOS) or `gradlew.bat` (Windows) — no separate - Gradle install is needed. See the - [build-tooling setup guide](https://pulsar.apache.org/contribute/setup-buildtools/) and - [IDE setup guide](https://pulsar.apache.org/contribute/setup-ide/). - -## Build - -```bash -# Compile and assemble everything (or a single module) -./gradlew assemble -./gradlew :pulsar-broker:assemble - -# Lint / verify (license headers, formatting, checkstyle) — run before pushing -./gradlew rat spotlessCheck checkstyleMain checkstyleTest -./gradlew spotlessApply # auto-fix license headers/formatting - -# Verify bundled-dependency LICENSE/NOTICE coverage (run after changing a runtime dependency) -./gradlew checkBinaryLicense - -# Start a standalone Pulsar service (broker + bookie + metadata in one JVM) -bin/pulsar standalone - -# Build docker images apachepulsar/pulsar(-all):latest -./gradlew docker # or docker-all -``` +See [`BUILDING.md`](BUILDING.md) for prerequisites (JDK 21/25), the common build and lint commands +(`./gradlew assemble`, `rat spotlessCheck checkstyleMain checkstyleTest`, `checkBinaryLicense`, …), +the Gradle build infrastructure, and how to change the build. The rest of this guide covers the test +and PR workflow. ## Running tests @@ -74,7 +51,7 @@ The **container-based integration tests** that run against a Pulsar Docker image ./gradlew :pulsar-broker:test --tests "org.apache.pulsar.broker.admin.*" ``` -> Note the [module-name-vs-directory gotcha](ARCHITECTURE.md#module-name-vs-directory-name-gotcha): +> Note the [module-name-vs-directory gotcha](BUILDING.md#module-name-vs-directory-name-gotcha): > directory `pulsar-client/` is the Gradle project `:pulsar-client-original`. ### Test groups (TestNG) @@ -228,5 +205,6 @@ PRs/issues first, then ask via a GitHub issue or dev@pulsar.apache.org. ## AI coding agents If you use an AI coding assistant (Claude Code, Copilot, Cursor, Gemini, Codex, Aider, …), see -[`AGENTS.md`](AGENTS.md) for an index of the agent-facing guidance and the task-specific skills under -[`.agents/skills/`](.agents/skills/). +[`AGENTS.md`](AGENTS.md) for the agent-facing guidance — a routing index into [`BUILDING.md`](BUILDING.md), +this guide, [`ARCHITECTURE.md`](ARCHITECTURE.md), [`CODING.md`](CODING.md), and [`SECURITY.md`](SECURITY.md), +plus the guardrails that apply specifically to AI-made changes. diff --git a/README.md b/README.md index dbacb3246ceb0..7e5406068b6d0 100644 --- a/README.md +++ b/README.md @@ -95,7 +95,6 @@ components in the Pulsar ecosystem, including connectors, adapters, and other la - [Pulsar Connectors](https://github.com/apache/pulsar-connectors) (bundled as [pulsar-io](pulsar-io)) - [Pulsar Translation](https://github.com/apache/pulsar-translation) -- [Pulsar SQL (Pulsar Presto Connector)](https://github.com/apache/pulsar-presto) (bundled as [pulsar-sql](pulsar-sql)) - [Ruby Client](https://github.com/apache/pulsar-client-ruby) ## Pulsar Runtime Java Version Recommendation @@ -163,6 +162,9 @@ Docker image Java runtime: 17 ## Build Pulsar +The quick start below covers the essentials; see [`BUILDING.md`](BUILDING.md) for the full build +guide (prerequisites, build/lint commands, the Gradle build infrastructure, and how to change the build). + ### Requirements - JDK @@ -194,8 +196,9 @@ There is also a guide for [setting up the tooling for building Pulsar](https://p bin/pulsar standalone # run a standalone service ``` -For the full build, lint, and test workflow — test groups, integration tests, Personal CI, and PR -conventions — see [`CONTRIBUTING.md`](CONTRIBUTING.md). For the module map and Gradle build system see +For the full build, lint, and dependency workflow — including the Gradle build infrastructure — see +[`BUILDING.md`](BUILDING.md). For the test and PR workflow — test groups, integration tests, Personal +CI, and PR conventions — see [`CONTRIBUTING.md`](CONTRIBUTING.md). For the module map see [`ARCHITECTURE.md`](ARCHITECTURE.md), and for coding conventions see [`CODING.md`](CODING.md). AI coding agents should start from [`AGENTS.md`](AGENTS.md). From 2a5d28419b57ace62be642d05d7753be279bb14f Mon Sep 17 00:00:00 2001 From: Lari Hotari Date: Wed, 27 May 2026 16:09:49 +0300 Subject: [PATCH 29/39] [improve][misc] BUILDING.md: de-duplicate the Gradle best-practices link Keep a single entry; note that AI agents should read the AsciiDoc source (plain text) while human viewers use the rendered HTML page. --- BUILDING.md | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/BUILDING.md b/BUILDING.md index 2d689ce54a7ce..4f003f4a8517b 100644 --- a/BUILDING.md +++ b/BUILDING.md @@ -78,15 +78,17 @@ When editing `build-logic/`, `settings.gradle.kts`, a module `build.gradle.kts`, `Provider` / value sources instead, and verify with `--configuration-cache`. Tasks reached by the common flows (`assemble`, `test`, `integrationTest`, `rat` / `spotlessCheck` / `checkstyle*`, `checkBinaryLicense`, `docker*`) must be compatible; one-off tooling tasks not part of those flows - (e.g. `verifyTestGroups`, ad-hoc maintenance/report tasks) may be exempt. Follow the - [Gradle best practices](https://docs.gradle.org/current/userguide/best_practices_index.html). + (e.g. `verifyTestGroups`, ad-hoc maintenance/report tasks) may be exempt. - **Published modules must not depend on internal modules** at compile/runtime scope — the artifact would be unresolvable from Maven Central. A module is published only when it applies `pulsar.public-java-library-conventions`. - **After a dependency change**, run `./gradlew checkBinaryLicense` and update the distribution `LICENSE`/`NOTICE`; justify any genuinely new dependency (see [`CODING.md` → Dependencies](CODING.md#dependencies)). -- **Follow the [Gradle best practices](https://github.com/gradle/gradle/blob/master/platforms/documentation/docs/src/docs/userguide/best-practices/best_practices_index.adoc)**. +- **Follow the [Gradle best practices](https://docs.gradle.org/current/userguide/best_practices_index.html)** + — AI agents should instead read the + [AsciiDoc source](https://github.com/gradle/gradle/blob/master/platforms/documentation/docs/src/docs/userguide/best-practices/best_practices_index.adoc), + which is plain text and cheaper to parse than the rendered HTML page. Before finishing a build change, confirm the affected task and `./gradlew help` run clean with `--configuration-cache`, and that `assemble` and `rat spotlessCheck checkstyleMain checkstyleTest` From bc18df222256088f2085e5fbf61139948195c632 Mon Sep 17 00:00:00 2001 From: Lari Hotari Date: Wed, 27 May 2026 16:20:20 +0300 Subject: [PATCH 30/39] [improve][misc] Refine AGENTS.md/ARCHITECTURE.md wording and section order - AGENTS.md: "concurrency/data race fixes" wording; drop the SECURITY.md check-the-latest note. - ARCHITECTURE.md: split the intro paragraph; move "Concurrency model" and "Backpressure" to the end. --- AGENTS.md | 6 ++---- ARCHITECTURE.md | 42 ++++++++++++++++++++++-------------------- 2 files changed, 24 insertions(+), 24 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index 54664874ef629..14936f6765115 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -42,9 +42,7 @@ Apache Pulsar is licensed under the Apache License 2.0, and all contributions mu The authoritative project documentation is at , whose source lives in the [`apache/pulsar-site`](https://github.com/apache/pulsar-site) repository (where documentation changes are contributed). The files above and the website remain the source of truth — this guide just layers -AI-specific pointers on top. For -[`SECURITY.md`](SECURITY.md) specifically, always check the latest at - rather than a possibly-stale fork copy. +AI-specific pointers on top. ## Agent guardrails @@ -57,7 +55,7 @@ A few rules matter specifically when an AI tool makes the change, on top of the - **Confirm state-changing actions.** Get the user's explicit confirmation before pushing, opening or updating a PR, or posting a comment. Being asked to start a task is not standing authorization for these outward-facing actions — see *Licensing and provenance* above. -- **A clean local run is weak evidence for concurrency/visibility fixes** (timing- and +- **A clean local run is weak evidence for concurrency/data race fixes** (timing- and platform-dependent). See [`CODING.md`](CODING.md#reproducing-concurrency--memory-visibility-bugs). ## Critical rules diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md index 4033d6aecd4e6..4e0fb488dbaf8 100644 --- a/ARCHITECTURE.md +++ b/ARCHITECTURE.md @@ -2,7 +2,9 @@ Apache Pulsar is a distributed pub-sub messaging and streaming platform. The codebase is performance-critical, heavily asynchronous, and concurrency-sensitive (brokers, storage, -networking). The authoritative documentation lives at — see the +networking). + +The authoritative documentation lives at — see the [Architecture Overview](https://pulsar.apache.org/docs/4.2.x/concepts-architecture-overview/) for the conceptual model. For a deeper, generated architecture description see [DeepWiki](https://deepwiki.com/apache/pulsar); coding agents can install the @@ -46,6 +48,25 @@ accordingly: `pulsar-client-admin-shaded` produce relocated fat jars; `distribution/*` assembles server/shell/offloader tarballs. +## Build and module layout + +The Gradle build infrastructure — convention plugins under `build-logic/`, the +`gradle/libs.versions.toml` version catalog, the enforced `pulsar-dependencies` platform, the +dependency tiers in `settings.gradle.kts`, and the configuration-cache / configure-on-demand +requirements — is documented in [`BUILDING.md`](BUILDING.md), together with the +[module-name-vs-directory gotcha](BUILDING.md#module-name-vs-directory-name-gotcha) (e.g. directory +`pulsar-client/` is the Gradle project `:pulsar-client-original`). + +## Pulsar Improvement Proposals (`pip/`) + +The **`pip/`** directory holds **Pulsar Improvement Proposals** (`pip-.md`) — the design +documents for significant changes, referenced as `PIP-` throughout commit messages and code (e.g. +PIP-463 = Maven→Gradle migration, PIP-465 = IO connectors moved out, PIP-466/468 = V5 client). +`pip/README.md` describes the process and `pip/TEMPLATE.md` is the proposal template. Consult the +relevant PIP for the rationale behind a non-trivial feature or architectural decision. A PIP **number +is reserved by the first `dev@pulsar.apache.org` thread that uses it** — start the discussion to claim +the next free number. + ## Concurrency model (a known gap) Pulsar does **not** have a clearly established, documented concurrency model, which makes it hard to @@ -76,22 +97,3 @@ than it can handle, particularly with respect to memory. The memory side is desc [PIP-442 "Existing Broker Memory Management"](pip/pip-442.md#existing-broker-memory-management). Broader backpressure (beyond memory) is not yet documented and would benefit from being defined alongside the concurrency model. - -## Build and module layout - -The Gradle build infrastructure — convention plugins under `build-logic/`, the -`gradle/libs.versions.toml` version catalog, the enforced `pulsar-dependencies` platform, the -dependency tiers in `settings.gradle.kts`, and the configuration-cache / configure-on-demand -requirements — is documented in [`BUILDING.md`](BUILDING.md), together with the -[module-name-vs-directory gotcha](BUILDING.md#module-name-vs-directory-name-gotcha) (e.g. directory -`pulsar-client/` is the Gradle project `:pulsar-client-original`). - -## Pulsar Improvement Proposals (`pip/`) - -The **`pip/`** directory holds **Pulsar Improvement Proposals** (`pip-.md`) — the design -documents for significant changes, referenced as `PIP-` throughout commit messages and code (e.g. -PIP-463 = Maven→Gradle migration, PIP-465 = IO connectors moved out, PIP-466/468 = V5 client). -`pip/README.md` describes the process and `pip/TEMPLATE.md` is the proposal template. Consult the -relevant PIP for the rationale behind a non-trivial feature or architectural decision. A PIP **number -is reserved by the first `dev@pulsar.apache.org` thread that uses it** — start the discussion to claim -the next free number. From 79505bfee7551ee6284864fc9a53925519baa3a7 Mon Sep 17 00:00:00 2001 From: Lari Hotari Date: Wed, 27 May 2026 16:23:29 +0300 Subject: [PATCH 31/39] [improve][misc] BUILDING.md: broaden the CONTRIBUTING.md reference to the full test/PR/CI workflow --- BUILDING.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/BUILDING.md b/BUILDING.md index 4f003f4a8517b..2ac394fd34a4f 100644 --- a/BUILDING.md +++ b/BUILDING.md @@ -4,8 +4,10 @@ Apache Pulsar uses a **Gradle** build (migrated from Maven via PIP-463; some old elsewhere still reference Maven). Use the bundled wrapper `./gradlew` (Linux/macOS) or `gradlew.bat` (Windows) — no separate Gradle install is needed. For local-environment setup see the [build-tooling setup guide](https://pulsar.apache.org/contribute/setup-buildtools/) and the -[IDE setup guide](https://pulsar.apache.org/contribute/setup-ide/). For how to *run tests* see -[`CONTRIBUTING.md` → Running tests](CONTRIBUTING.md#running-tests); for the module map see +[IDE setup guide](https://pulsar.apache.org/contribute/setup-ide/). For running tests and the rest of +the contribution workflow — test groups, integration tests, Personal CI, and PR conventions — see +[`CONTRIBUTING.md`](CONTRIBUTING.md) (in particular +[Running tests](CONTRIBUTING.md#running-tests)); for the module map see [`ARCHITECTURE.md`](ARCHITECTURE.md). ## Prerequisites From 4ff3fa3bdd2e604ed63c4d56bb2d959942ed2976 Mon Sep 17 00:00:00 2001 From: Lari Hotari Date: Wed, 27 May 2026 16:32:19 +0300 Subject: [PATCH 32/39] [improve][misc] Reorder CODING.md (logging/testing first, concurrency/JMM last) and tighten prose --- CODING.md | 495 ++++++++++++++++++++++-------------------------------- 1 file changed, 198 insertions(+), 297 deletions(-) diff --git a/CODING.md b/CODING.md index 35b6a07022ef7..7cdc1a9fa2523 100644 --- a/CODING.md +++ b/CODING.md @@ -1,344 +1,245 @@ # Coding guidelines Apache Pulsar follows the Sun Java Coding Conventions with additional project-specific rules. The -codebase is performance-critical, asynchronous, and concurrency-sensitive, so code review -prioritizes **correctness, thread safety, performance, maintainability, and backward -compatibility**. This file is the canonical coding reference for both human contributors and AI -coding agents; see [`AGENTS.md`](AGENTS.md) for the agent-specific guardrails that apply on top of it. +codebase is performance-critical, asynchronous, and concurrency-sensitive, so code review prioritizes +**correctness, thread safety, performance, maintainability, and backward compatibility**. This file is +the canonical coding reference for human contributors and AI coding agents; see [`AGENTS.md`](AGENTS.md) +for the agent-specific guardrails on top of it. ## Style - **4 spaces** for indentation; **tabs must never be used**. - Always use **curly braces**, even for single-line `if` statements. -- Do **not** include `@author` tags in Javadoc. +- No `@author` tags in Javadoc. - Every `TODO` must reference a GitHub issue, e.g. `// TODO: https://github.com/apache/pulsar/issues/XXXX`. - Checkstyle config: `buildtools/src/main/resources/pulsar/checkstyle.xml`. Lombok is enabled. -## Data types +## Logging -- **Don't return generic tuples.** Instead of returning `org.apache.commons.lang3.tuple.Pair` - (or a similar generic tuple type), define a small, purpose-named **Java `record`** inline in the - class that declares the method, with field names that document what the values mean. Give the record - the **same visibility as the method** that returns it — `public`, package-private (default), or - `private`. -- **Prefer record keys over concatenated strings.** For a composite `Map` key, use a small `record` - holding the key components instead of concatenating a `java.lang.String` (e.g. `a + ":" + b`). - A record key gives a correct `equals`/`hashCode`, keeps the key type-safe, and avoids - delimiter/escaping bugs. -- **Use the narrowest interface type** for fields, parameters, variables, and return types — `java.util.Map`, - `SequencedMap`, `SortedMap`, `Collection`, `List` — rather than a concrete implementation such as - `TreeMap`. Keep the concrete type only where its behaviour is actually required (e.g. instantiate a - `TreeMap` for stable key-ordered iteration) and still expose it through the interface type. -- **Minimize method and constructor parameters.** Don't pass values that are already reachable from a - context object a method receives — read them from it (e.g. from `PublishContext`) instead of adding - redundant parameters. For a constructor (or factory) with many parameters, use a **builder**: the - project uses Lombok-generated builders (`@Builder`) for most internal classes. `@Builder` also works - on a Java `record`, which pairs well with preferring records (above) — a record plus `@Builder` is a - clean way to pass many values without a long parameter list. **Don't use Lombok - `@Builder` on public client-API classes** — it's harder to maintain and its default builder class - name carries no meaningful context; hand-write the builder there, or if Lombok is used set an - explicit, meaningful `builderClassName`. -- **Name things for intent.** Name a boolean-returning method as a query (`shouldSkipChunk`, not - `skipChunk`); name methods, fields, and metrics after the effect they describe rather than vague - terms (prefer e.g. `truncated` / `migrationSegment` over `overflow` / `special`). -- **Construct via factory methods, not internal implementation types.** Use the provided factory - (e.g. `PositionFactory.create(...)`) instead of referencing an internal class such as - `ImmutablePositionImpl` directly. +- Prefer **[slog](https://github.com/merlimat/slog)** (`io.github.merlimat.slog`) via Lombok's + **`@CustomLog`** (wired in `lombok.config` to `Logger.get(TYPE)`). **SLF4J is deprecated** for new + code; never use `System.out` / `System.err`. +- **Default new logs to `TRACE`/`DEBUG`, not `INFO`** — Pulsar overuses `INFO` and floods production + logs. Reserve `INFO` for low-frequency lifecycle/state-change events. +- Attach data as **structured attributes** — `log.info().attr("topic", topic).log("Published")` — not + interpolated into the message string. +- For expensive `DEBUG`/`TRACE` values, don't guard with `isDebugEnabled()`/`isTraceEnabled()`; use + slog's lazy form — `log.debug().attr("dump", () -> expensiveDump()).log("...")` or + `log.debug(e -> e.attr("dump", expensiveDump()).log("..."))`. +- Avoid logging on hot paths, and stack traces at `INFO` or lower. ## Asynchronous programming -Pulsar relies heavily on `CompletableFuture` and asynchronous pipelines. Prefer `CompletableFuture` -APIs over `ListenableFuture` for new code. +Pulsar relies heavily on `CompletableFuture`; prefer it over `ListenableFuture` for new code. -- **Methods returning `CompletableFuture` must not throw synchronously.** Propagate failures through - the returned future — `return CompletableFuture.failedFuture(e);` — including for argument - validation: +- **A method returning `CompletableFuture` must not throw synchronously.** Propagate failures through + the returned future — `return CompletableFuture.failedFuture(e);` — including for argument validation + (`if (arg == null) return CompletableFuture.failedFuture(new IllegalArgumentException("arg"));`). + Throwing *inside* a stage (`thenApply`, `thenCompose`, `handle`, `whenComplete`, …) is fine. +- **Never block on event-loop / async-execution threads** — no `Thread.sleep`, `Future.get()`, + `CompletableFuture.join()`, or blocking IO. An operation that performs IO should return a future. +- **Avoid nested futures** (`CompletableFuture>`); flatten with `thenCompose`. + Prefer **`OrderedExecutor`** for ordered asynchronous work. +- **Converting a synchronous-throwing method to a failed future is not mechanical** — some callers rely + on the throw happening *before* the async work starts, so evaluate each call site. Use a shared + `checkArgumentAsync` helper (in `FutureUtil`) to validate without duplicating try/catch. - ```java - if (arg == null) { - return CompletableFuture.failedFuture(new IllegalArgumentException("arg")); - } - ``` +## Testing conventions - Throwing inside an async stage (`thenApply`, `thenCompose`, `thenRun`, `handle`, `whenComplete`) - is fine, because the framework routes it into the resulting future. -- **Never block on event-loop / async-execution threads** — no `Thread.sleep`, `Future.get()`, - `CompletableFuture.join()`, or blocking IO. -- **Avoid nested futures** such as `CompletableFuture>`; flatten with - `thenCompose`. -- Prefer **`OrderedExecutor`** for ordered asynchronous work. -- If an operation performs IO (creating authentication, starting a transaction, etc.), it should - return a `CompletableFuture` rather than block. -- **Converting an existing synchronous-throwing method to return a failed future is not a mechanical - change** — evaluate each call site first. Some callers historically rely on the exception being - thrown *before* the async work starts, so a blanket conversion can change behaviour and introduce - instability in untested error paths. Use a shared helper (e.g. a `checkArgumentAsync` in `FutureUtil`) - to validate arguments without duplicating the try/catch in every method. +Most Pulsar **"unit tests"** (`src/test`, run with `./gradlew ::test`) are actually +**integration-style** — they start a real in-JVM broker (`MockedPulsarServiceBaseTest` / +`pulsarTestContext`) rather than testing a class in isolation. The **container integration tests** +under `tests/` run against a Pulsar Docker image (see +[`CONTRIBUTING.md`](CONTRIBUTING.md#integration-tests)). Ideally code is factored so genuine units *can* +be unit-tested in isolation with light mocking — excessive mocking is a design smell, not the goal — +but much existing code isn't, so integration-style is the pragmatic default. See +[`CONTRIBUTING.md`](CONTRIBUTING.md) for how to *run* tests (groups, `--tests` scoping, retry count). + +- **TestNG + Mockito.** Prefer **AssertJ** assertions (with descriptions) over TestNG asserts; use + **Awaitility** for async conditions instead of `sleep` timing, with timeouts to prevent hangs. + `untilAsserted(...)` retries assertions, `until(...)` waits for a boolean — don't swap them. Verify + async interactions with Mockito `timeout(...)`, not fixed sleeps. +- Every feature or bug fix needs **deterministic** tests for edge and failure cases. A bug-fix test + must **fail on the unpatched code for the real reason** — not because it forces internal state. +- For code not factored for isolation, prefer an integration-style test over mocking a web of + collaborators: inject faults via the test infrastructure (e.g. + `pulsarTestContext.getMockBookKeeper().setReadHandleInterceptor(...)`) and assert on logs with + `TestLogAppender`. It's fine to add a **clean new test class** rather than extend an awkward one. +- **No reflection into private state** (`WhiteboxImpl.getInternalState`/`setInternalState`, + `setAccessible(true)`). Expose a **package-private `@VisibleForTesting`** accessor and put the test in + the same package; flag new reflection in review ([dev@ rationale](https://lists.apache.org/thread/7gr04sqmzyttx4ln6ydtp3qv0xgo1o6m)). +- **New integration-style tests: extend `SharedPulsarBaseTest`.** It shares one `SharedPulsarCluster` + for the test-JVM lifecycle (`admin` / `pulsarClient` are per test class); each method gets its own + namespace. Use `getNamespace()` and `newTopicName()` — never hardcode namespace/topic names, since + the runtime is shared. +- **Close/release what the test allocates.** A **`ByteBuf`/buffer leak** (pooled-allocator detection, + `-Dpulsar.allocator.pooled=true`) is a **real bug** — fix the missing `release()`. A **thread leak + from `ThreadLeakDetectorListener` is unreliable** (high false-positive rate, notably with + `SharedPulsarBaseTest` and when `THREAD_LEAK_DETECTOR_WAIT_MILLIS` is too low — ≈`10000` recommended, + only effective with the Gradle daemon disabled, `--no-daemon`); corroborate before treating it as + real. +- **Validate performance optimizations with a JMH benchmark** under `microbench/`, simulating a + realistic production usage pattern (see `microbench/README.md`). -## Concurrency +## Data types -- Public classes should be **thread-safe**; annotate non-thread-safe ones with `@NotThreadSafe`. -- Protect shared mutable state; prefer fine-grained synchronization; ensure mutations occur on the - intended thread. Prefer the **single-writer principle** — design so a given piece of state is - mutated by only one thread — to avoid concurrent mutation entirely. -- **Minimize work while holding a lock.** Capture the state you need into local variables inside the - synchronized block, then run callbacks, listeners, and IO *outside* it. Never call out to - listener/callback code while holding a lock — narrowing lock scope has fixed real deadlocks and - contention in Pulsar. -- Give threads **meaningful names** for diagnostics. -- When creating thread pools, prefer Netty's **`io.netty.util.concurrent.DefaultThreadFactory`**. It - produces **`FastThreadLocalThread`** instances, on which `FastThreadLocal` lookups are much faster - than on a plain `Thread` — this matters on Netty-heavy paths (e.g. the pooled `ByteBuf` allocator and - other Netty internals that rely on `FastThreadLocal`) — and it also assigns meaningful, prefixed - thread names. - -Note that Pulsar does not yet have a documented, project-wide concurrency model; see -[`ARCHITECTURE.md` → Concurrency model](ARCHITECTURE.md#concurrency-model-a-known-gap) for the -conventions that *should* govern threads, thread pools, and event loops. +- **Don't return generic tuples.** Instead of `org.apache.commons.lang3.tuple.Pair` (or a similar + tuple type), define a small, purpose-named **Java `record`** inline in the class that declares the + method, with the **same visibility as the method** (`public`, package-private, or `private`). +- **Prefer record keys over concatenated strings.** For a composite `Map` key, use a small `record` + instead of concatenating a `String` (e.g. `a + ":" + b`) — correct `equals`/`hashCode`, type-safe, + no delimiter/escaping bugs. +- **Use the narrowest interface type** for fields, parameters, variables, and returns (`Map`, + `SequencedMap`, `SortedMap`, `Collection`, `List`) rather than a concrete type like `TreeMap`. Keep + the concrete type only where its behaviour is required (e.g. a `TreeMap` for key-ordered iteration), + still exposed through the interface. +- **Minimize method and constructor parameters.** Don't pass values already reachable from a context + object the method receives (e.g. read from `PublishContext`). For a constructor with many parameters, + use a **builder** — the project uses Lombok `@Builder` for most internal classes, and it works on a + `record` too. **Don't use `@Builder` on public client-API classes** (hard to maintain, meaningless + default builder name) — hand-write the builder or set an explicit `builderClassName`. +- **Name for intent.** Name a boolean-returning method as a query (`shouldSkipChunk`, not `skipChunk`); + name methods/fields/metrics after the effect (prefer `truncated` / `migrationSegment` over + `overflow` / `special`). +- **Construct via factory methods, not internal implementation types** (e.g. `PositionFactory.create(...)`, + not `new ImmutablePositionImpl(...)`). -### The Java Memory Model is what makes concurrent code correct +## Dependencies -Historically, several hard-to-investigate Pulsar bugs have come from misconceptions about Java -synchronization: - -- **A `synchronized` method or block is not, on its own, thread-safe.** `synchronized` provides its - visibility and ordering guarantees only when the **same monitor/lock guards both the reads and the - writes** of the shared state. Synchronizing on an arbitrary monitor while another thread accesses - the same field without that monitor guarantees nothing. -- On 64-bit JVMs a field's value is **never corrupted** — a read always returns some value that was - actually written. What goes wrong is **visibility**: without a happens-before relationship, - different threads can observe different values, or a thread may never see an updated value at all. - Establish happens-before with `synchronized`, `volatile`, `final`, or `java.util.concurrent` - constructs. -- **A field accessed by more than one thread needs explicit visibility.** If multiple threads read - and write a field, make it `volatile` (or guard every access — both reads and writes — with the - same lock); otherwise a writing thread's update may never become visible to a reading thread. - `volatile` gives visibility for that single field but does **not** make compound updates - (read-modify-write, check-then-act) atomic — use `java.util.concurrent` atomics/locks for those. -- Visibility is per-field, so a mutable object can be observed **partially updated** — some field - writes visible to a reader, others not. -- The only way to make concurrent code reliably correct is to **conform to the Java Memory Model**. - **Benign data races** are sometimes acceptable, and Pulsar has code that relies on this by design — - but that must be a deliberate, documented choice, not an accident. -- **Prefer immutable objects** to sidestep these visibility and mutation hazards. Two distinct notions: - - An object is **immutable** when all its fields are `final` *and* every nested instance it - references is itself immutable (a Java `record` is the common case). Immutability must hold for the - whole reachable graph — synchronization around a holder buys nothing if a referenced object can - still be mutated independently without its own synchronization. - - An object is **effectively immutable** when its state is never modified after construction but its - fields are not all `final`. -- **How they must be published differs:** - - An **immutable** object benefits from the JMM's final-field **safe initialization** guarantee — a - thread that observes the reference sees the fully-constructed state even when the reference was - published through a data race — so it needs **no** safe publication. - - An **effectively immutable** object does **not** get that guarantee and must be shared via **safe - publication**: a `final` or `volatile` field, or a `java.util.concurrent` construct (e.g. putting - it into a `ConcurrentHashMap`). - - See [Safe initialization](https://shipilev.net/blog/2014/safe-public-construction/#_safe_initialization). +Prefer existing dependencies over new libraries. Pulsar commonly uses Apache Commons / Guava +(utilities), **FastUtil** (type-specific collections), **JCTools** (concurrent structures), +**RoaringBitmap** (compressed bitsets), **Caffeine** (caching), **Jackson** (JSON), Prometheus / +**OpenTelemetry** (metrics), and **Netty** (networking and buffers). -### Reproducing concurrency / memory-visibility bugs +A new dependency must be justified (why existing ones are insufficient) and must update the +bundled-dependency `LICENSE`/`NOTICE` — verify with `./gradlew checkBinaryLicense`. -These bugs are timing- and platform-dependent and easily masked, so a clean run is weak evidence that -a concurrency fix is correct: +## Backward compatibility -- Interpreted and JIT-compiled code can behave very differently. Reproductions often need several - **warm-up rounds with a short pause** between them so the JIT compiler kicks in; the JIT is tiered, - asynchronous, and multi-threshold, so a short-running test may never trigger compilation. JVM flags - can force earlier compilation, and which execution paths are exercised affects what gets compiled. -- Some races surface only on specific **hardware/OS** combinations — classically **multi-socket / - multi-NUMA** machines, whose weaker cross-socket memory ordering exposes races that never appear on - a single-socket machine. +Pulsar maintains strong compatibility guarantees. Changes must not break public APIs, client +compatibility, wire-protocol compatibility, or serialized/metadata formats — servers must work with +both older and newer clients. Flag any change that may break compatibility. -## Logging +**Plugin / SPI extension points are public API.** Many interfaces are selected by a `*ClassName` +configuration setting — e.g. `LoadManager`, `LedgerOffloaderFactory`, `AuthorizationProvider` / +`AuthenticationProvider`, `EntryFilter`, `TopicFactory`, `BrokerInterceptor`, dispatcher / +delayed-delivery-tracker factories, `CustomCommand` — and third parties ship implementations. Changing +such an interface, or a `protected` member of an extensible class (`PulsarWebResource`, +`PersistentTopic`, `Producer`), breaks them: it generally needs a PIP and must not land in +maintenance-branch backports. When you must add a method, give it a **`default` implementation** (e.g. +throwing `UnsupportedOperationException`) so existing implementors still compile and the change stays +backport-safe. + +**Don't leak third-party types through public/plugin interfaces.** Exposing Netty or AsyncHttpClient +classes breaks consumers of the **shaded** client (shaded vs. unshaded classes differ) and couples +callers to the implementation — provide a Pulsar-owned abstraction. Changing a documented behaviour or +guarantee (e.g. PIP-68 exclusive-producer guarantees, default rate-limiter behaviour) needs a PIP and a +dev@ discussion, not just a code change. -- Prefer the **[slog](https://github.com/merlimat/slog)** library (`io.github.merlimat.slog`). - Instantiate the logger with Lombok's **`@CustomLog`** annotation, which is wired in `lombok.config` - to create an `io.github.merlimat.slog.Logger` (`io.github.merlimat.slog.Logger.get(TYPE)`). -- **SLF4J** is still available but is **considered deprecated** for Pulsar logging — do not introduce - new SLF4J loggers. -- Never use `System.out` / `System.err`. -- **Default new log statements to `TRACE` or `DEBUG`, not `INFO`.** Existing Pulsar code overuses - `INFO`, which makes production logs excessively noisy. Use `TRACE` for fine-grained/low-level - detail, `DEBUG` for diagnostics that could reasonably be enabled in production without flooding the - logs, and reserve `INFO` for low-frequency, significant lifecycle/state-change events. -- Attach data as **structured attributes** — `log.info().attr("topic", topic).log("Published")` — - rather than interpolating it into the message string. -- For expensive `DEBUG`/`TRACE` values, do **not** guard with `isDebugEnabled()` / `isTraceEnabled()`. - Use slog's lazy evaluation, which only runs when the level is enabled: - - pass a supplier lambda as the attribute value — `log.debug().attr("dump", () -> expensiveDump()).log("...")`, or - - use the deferred event form — `log.debug(e -> e.attr("dump", expensiveDump()).log("..."))`. -- Avoid logging in hot paths and stack traces at `INFO` or lower. +**Introduce changes behind a backward-compatible default.** Make new/changed behaviour opt-in via +configuration rather than silently changing existing deployments. Behaviour that risks data loss (e.g. +skipping unrecoverable data) must be gated behind an explicit flag (such as `autoSkipNonRecoverableData`), +defaulting to the safe/old behaviour. ## Resource and memory management -- Always close resources: streams, network connections, executors, buffers. Prefer - try-with-resources. -- For internal networking/messaging paths, prefer **Netty `ByteBuf`** over `ByteBuffer` unless an - external API requires `ByteBuffer`. Release ref-counted buffers you allocate. -- **Don't hand-optimize object allocation away.** Pulsar is intended to run on **ZGC**, whose - collection overhead is very low, so the extra short-lived allocations from favouring immutable - objects (see *Concurrency* above) are cheap. Much older Pulsar code minimizes allocation by pooling - objects with Netty's `Recycler`; this is **no longer recommended for new code** — under ZGC the - `Recycler` often *costs* more CPU than it saves versus simply letting the GC reclaim the objects. - Do not introduce new `Recycler` usage. See - [PIP-443: Stop using Netty Recycler in new code](pip/pip-443.md). +- Always close resources (streams, connections, executors, buffers) — prefer try-with-resources. +- On internal networking/messaging paths, prefer **Netty `ByteBuf`** over `ByteBuffer` unless an + external API requires it; release ref-counted buffers you allocate. +- **Don't hand-optimize allocation away.** Pulsar runs on **ZGC** (very low collection overhead), so + the extra short-lived allocations from favouring immutable objects (see *Concurrency* below) are + cheap. Older code pools objects with Netty's `Recycler`; this is **no longer recommended for new + code** — under ZGC the `Recycler` often *costs* more CPU than it saves. Don't add new `Recycler` + usage. See [PIP-443](pip/pip-443.md). ## Performance -- **Back optimizations with evidence** — a JMH benchmark (see *Testing conventions*) or a profile — - not intuition; and measure on JIT-warmed code (see *Reproducing concurrency / memory-visibility - bugs* for why warm-up matters). -- **On hot paths** (dispatch, IO, per-message code): avoid `String.format` (build the string directly), - avoid `Enum.values()` (match values explicitly), and avoid unnecessary allocation and locking. Prefer - lock-free or single-writer designs over synchronization where practical. -- **Don't add overhead to an already-overloaded system.** Avoid doing work and then discarding it - (e.g. reading entries from BookKeeper only to drop them before dispatch); extra work under high load - causes cascading failures. Acquire or estimate up front and reconcile afterwards instead. -- **Bound in-memory caches** with a size or byte limit plus eviction, and de-duplicate repeated - `String`s (cluster / tenant / namespace ids) with `org.apache.pulsar.common.util.StringInterner` to - avoid heap duplication. -- Don't pre-emptively micro-optimize elsewhere — favour clear, correct code, and let ZGC handle - short-lived allocations (see *Resource and memory management*). +- **Back optimizations with evidence** — a JMH benchmark (see *Testing conventions*) or a profile, not + intuition — measured on JIT-warmed code (see *Reproducing concurrency / memory-visibility bugs*). +- **On hot paths** (dispatch, IO, per-message): avoid `String.format` (build strings directly), + `Enum.values()` (match explicitly), and unnecessary allocation/locking; prefer lock-free or + single-writer designs. +- **Don't add overhead to an already-overloaded system.** Avoid doing work then discarding it (e.g. + reading entries only to drop them before dispatch) — extra work under load causes cascading failures; + acquire/estimate up front and reconcile afterwards. +- **Bound in-memory caches** (size or byte limit + eviction) and de-duplicate repeated `String`s + (cluster/tenant/namespace ids) with `org.apache.pulsar.common.util.StringInterner`. ## Configuration -When adding configuration options: use clear, descriptive names; provide sensible defaults; update -the default configuration files; and document the option. +When adding configuration options: use clear, descriptive names; provide sensible defaults; update the +default configuration files; and document the option. -## Dependencies - -Prefer existing dependencies over introducing new libraries. Pulsar commonly uses Apache -Commons / Guava (utilities), **FastUtil** (type-specific collections), **JCTools** (concurrent data -structures), **RoaringBitmap** (compressed bitsets), **Caffeine** (caching), **Jackson** (JSON), -Prometheus / **OpenTelemetry** (metrics), and **Netty** (networking and buffers). - -A new dependency must be justified (explain why existing ones are insufficient) and must update the -bundled-dependency `LICENSE`/`NOTICE` — verify with `./gradlew checkBinaryLicense`. +## Code review checklist -## Backward compatibility +When reviewing a PR, verify: -Pulsar maintains strong compatibility guarantees. Changes must not break public APIs, client -compatibility, wire-protocol compatibility, or serialized/metadata formats. Servers must work with -both older and newer clients. Flag any change that may break compatibility. +- Java coding conventions followed; logging follows the guidelines above (slog, levels, structured + attributes). +- Thread-safety risks; no blocking in async paths; correct `CompletableFuture` usage. +- No unnecessary dependencies; LICENSE/NOTICE updated when dependencies change. +- Backward compatibility preserved. +- Tests exist and are appropriate; reflection into private state is flagged with a `@VisibleForTesting` + accessor suggested instead. +- The **PR description explains the change** — at minimum **Motivation (why?)** and **Modifications + (what/how?)**, matching `.github/PULL_REQUEST_TEMPLATE.md`; a title alone isn't sufficient. -**Plugin / SPI extension points are public API.** Pulsar has many pluggable interfaces selected by a -`*ClassName` configuration setting — e.g. `LoadManager`, `LedgerOffloaderFactory`, -`AuthorizationProvider` / `AuthenticationProvider`, `EntryFilter`, `TopicFactory`, `BrokerInterceptor`, -dispatcher / delayed-delivery-tracker factories, `CustomCommand`. Third parties ship implementations of -these in production. Changing such an interface — or a `protected` member of a class meant for -extension, such as `PulsarWebResource`, `PersistentTopic`, or `Producer` — breaks those implementations. -Treat these as public API: the change generally needs a PIP, and breaking changes must not land in -maintenance-branch backports (keep deprecated bridge methods that delegate to the new form instead). -When you must add a method to such an interface, give it a **`default` implementation** (e.g. one -throwing `UnsupportedOperationException`) so existing third-party implementors still compile — this -also keeps the change source-compatible enough to backport. - -**Don't leak third-party types through public or plugin interfaces.** Exposing library types such as -Netty or AsyncHttpClient classes in a public / plugin API breaks consumers of the **shaded** Pulsar -client (the shaded and unshaded classes differ) and couples callers to the implementation — provide a -Pulsar-owned abstraction instead. Changing a documented behaviour or guarantee (e.g. the -exclusive-producer guarantees of PIP-68, or default rate-limiter behaviour) likewise needs a PIP and a -dev@ discussion, not just a code change. +Focus feedback on correctness, reliability, and maintainability. -**Introduce changes behind a backward-compatible default.** Prefer making new or changed behaviour -opt-in via configuration rather than silently changing what existing deployments do. Behaviour that can -cause data loss (e.g. skipping unrecoverable data) must be gated behind an explicit flag (such as -`autoSkipNonRecoverableData`), defaulting to the safe/old behaviour. +## Concurrency -## Testing conventions +- Public classes should be **thread-safe**; annotate non-thread-safe ones with `@NotThreadSafe`. +- Protect shared mutable state; prefer fine-grained synchronization; mutate on the intended thread. + Prefer the **single-writer principle** (a given piece of state mutated by only one thread) to avoid + concurrent mutation entirely. +- **Minimize work while holding a lock.** Capture needed state into locals inside the synchronized + block, then run callbacks, listeners, and IO *outside* it — never call out to listener/callback code + while holding a lock (this has fixed real deadlocks and contention). +- Give threads **meaningful names**. When creating thread pools, prefer Netty's + **`io.netty.util.concurrent.DefaultThreadFactory`** — it produces **`FastThreadLocalThread`** + instances (much faster `FastThreadLocal` lookups, which matter on Netty paths like the pooled + `ByteBuf` allocator) and assigns prefixed names. + +Pulsar has no documented, project-wide concurrency model yet; see +[`ARCHITECTURE.md` → Concurrency model](ARCHITECTURE.md#concurrency-model-a-known-gap) for the +conventions that *should* govern threads, thread pools, and event loops. -Most of Pulsar's so-called **"unit tests"** (under each module's `src/test`, run with -`./gradlew ::test`) are in fact **integration-style** — they bring up a real in-JVM broker via -`MockedPulsarServiceBaseTest` / `pulsarTestContext` rather than testing a class in isolation. The -**actual integration tests** live under `tests/` and run against a Pulsar **Docker test image** with -Testcontainers (see [`CONTRIBUTING.md`](CONTRIBUTING.md#integration-tests)). "Unit test" below means -the former. - -Ideally this would not be necessary: well-designed code lets genuine **units be unit-tested in -isolation**, with collaborators behind narrow interfaces that can be substituted by simple -fakes/mocks **without excessive mocking**. Excessive mocking is a design smell (tight coupling), not a -reason to abandon unit testing. Much existing Pulsar code isn't factored this way, which is why -integration-style tests are the pragmatic default today — but when you write or refactor code, prefer -the design that makes a true, isolated unit test possible. - -- **TestNG + Mockito.** Prefer **AssertJ** assertions with descriptions over TestNG asserts; use - **Awaitility** (with AssertJ) for asynchronous conditions instead of `sleep`-based timing. Add - timeouts to prevent hangs. Use Awaitility only when polling/retrying for a condition: - `untilAsserted(...)` retries assertions until they pass, `until(...)` waits for a boolean to become - `true` — don't swap the two. -- Every feature or bug fix should include tests covering edge cases and failure scenarios; tests must - be deterministic and stable. Integration tests may be required for distributed components. -- **For code that isn't factored for isolation, prefer an integration-style test over heavy mocking.** - Reproduce the real production scenario with `MockedPulsarServiceBaseTest` / `pulsarTestContext` - rather than mocking a web of internal collaborators. - Inject faults through the test infrastructure — e.g. - `pulsarTestContext.getMockBookKeeper().setReadHandleInterceptor(...)` to simulate read failures or - delays — and capture log output with `org.apache.pulsar.utils.TestLogAppender` to assert behaviour. - A bug-fix test should **fail against the unpatched code for the actual reason**, not because the test - forces internal state. -- **For new integration-style tests, prefer `SharedPulsarBaseTest`.** It shares a single - `SharedPulsarCluster` and its components for the whole **test-JVM lifecycle** (avoiding per-test - broker startup cost), while `admin` / `pulsarClient` are shared **per test class** (initialized once - per class). Each test method gets **its own namespace** (created before the method, force-deleted - after). Get that namespace with `getNamespace()` and create topic names with `newTopicName()` — - never hardcode namespace or topic names, because the runtime is shared across all tests in the JVM. - (The shared runtime is also why `ThreadLeakDetectorListener` mis-reports leaks with this base - class — see above.) -- **Don't mutate private/internal state to force a condition.** If a test has to reach into internal - fields to trigger the path under test, it usually isn't representative — drive the real path instead. -- It's fine to **add a new, cleanly-written test class** instead of extending an existing one whose - base class or style makes the new case hard to read; existing Pulsar tests are not always good models. -- For asynchronous interactions, verify with Mockito's `timeout(...)` (e.g. - `verify(x, timeout(2000)).foo()`) rather than fixed sleeps. -- **Validate performance optimizations with a benchmark.** Prefer adding a **JMH benchmark** under the - `microbench/` subproject that simulates a usage pattern realistic for Pulsar production, so the - optimization is backed by evidence rather than intuition. See `microbench/README.md` for how to run - JMH benchmarks. -- **No reflection to access private state in tests** (`WhiteboxImpl.getInternalState` / - `setInternalState`, `Field.setAccessible(true)`, `Method.setAccessible(true)`, and similar). - Instead expose what the test needs through a **package-private** `@VisibleForTesting` accessor and - place the test in the same package: - - ```java - @VisibleForTesting - Map getSubscriptions() { - return subscriptions; - } - ``` - - New reflection-based test access should be flagged in review. See the dev@ thread - [Stop using reflection to access private fields in tests](https://lists.apache.org/thread/7gr04sqmzyttx4ln6ydtp3qv0xgo1o6m). -- **Close/release resources in tests** — shut down executors/clients/services and release Netty - `ByteBuf`s (and other reference-counted buffers) the test allocates. - - A reported **`ByteBuf`/buffer leak** (Netty pooled-allocator leak detection, enabled via - `-Dpulsar.allocator.pooled=true`) is a **real bug**: fix the root cause — the missing `release()` - in the code under test — rather than working around it in the test or suppressing the detector. - - **Thread leaks** reported by `ThreadLeakDetectorListener` are currently **not a reliable signal**. - The detector produces a high rate of false positives — notably with `SharedPulsarBaseTest`, and - whenever the `THREAD_LEAK_DETECTOR_WAIT_MILLIS` environment value is not set high enough - (≈`10000` is recommended) to let asynchronous shutdown complete before the check runs. That wait - setting only takes effect when the **Gradle daemon is disabled** (`--no-daemon`). Because of these - false positives, do not rely on `ThreadLeakDetectorListener` alone to conclude that a change is - thread-leak-free; corroborate before treating a reported thread leak as a real bug. - -See [`CONTRIBUTING.md`](CONTRIBUTING.md) for how to *run* tests (test groups, `--tests` scoping, -integration tests, retry count). +### The Java Memory Model is what makes concurrent code correct -## Code review checklist +Several hard-to-investigate Pulsar bugs have come from misconceptions about Java synchronization: + +- **A `synchronized` method or block is not, on its own, thread-safe.** It provides its + visibility/ordering guarantees only when the **same monitor/lock guards both the reads and the + writes** of the shared state. +- On 64-bit JVMs a field's value is **never corrupted** — a read returns some value that was actually + written. What breaks is **visibility**: without a happens-before relationship, threads can observe + different values, or never see an update. Establish happens-before with `synchronized`, `volatile`, + `final`, or `java.util.concurrent` constructs. +- **A field accessed by more than one thread needs explicit visibility** — make it `volatile` (or + guard every read *and* write with the same lock). `volatile` gives single-field visibility but does + **not** make compound updates (read-modify-write, check-then-act) atomic — use `java.util.concurrent` + atomics/locks for those. +- Visibility is per-field, so a mutable object can be observed **partially updated**. +- The only way to be reliably correct is to **conform to the Java Memory Model**. **Benign data races** + are sometimes acceptable, and some Pulsar code relies on this by design — but only as a deliberate, + documented choice. +- **Prefer immutable objects.** An object is **immutable** when all fields are `final` *and* every + nested instance is itself immutable (a `record` is the common case; immutability must hold for the + whole reachable graph). It is **effectively immutable** when never modified after construction but + with non-`final` fields. Publication differs: an **immutable** object benefits from the JMM's + final-field **safe initialization** (visible even when published via a data race) and needs **no** + safe publication; an **effectively immutable** one must be shared via **safe publication** (a `final` + or `volatile` field, or a `java.util.concurrent` construct such as `ConcurrentHashMap`). See + [Safe initialization](https://shipilev.net/blog/2014/safe-public-construction/#_safe_initialization). -When reviewing a pull request, verify: +### Reproducing concurrency / memory-visibility bugs -- Java coding conventions are followed. -- Thread-safety risks; no blocking operations in async paths; correct `CompletableFuture` usage. -- No unnecessary dependencies; LICENSE/NOTICE updated when dependencies change. -- Logging follows the guidelines above ([slog](https://github.com/merlimat/slog), levels, structured attributes). -- Backward compatibility is preserved. -- Tests exist and are appropriate; reflection-based access to private state is flagged with a - `@VisibleForTesting` package-private accessor suggested instead. -- The **PR description adequately explains the change** — at minimum the **Motivation (why?)** and - **Modifications (what / how?)**, matching `.github/PULL_REQUEST_TEMPLATE.md`. A title alone, or a - description that only restates the title, is not sufficient; ask for a proper description unless it - is already covered. +These bugs are timing- and platform-dependent and easily masked, so a clean run is weak evidence a fix +is correct: -Focus feedback on correctness, reliability, and maintainability. +- Interpreted and JIT-compiled code behave differently. Reproductions often need several **warm-up + rounds with a short pause** so the (tiered, asynchronous) JIT kicks in; a short test may never + trigger compilation. JVM flags can force earlier compilation, and the exercised paths affect what + gets compiled. +- Some races surface only on specific **hardware/OS** — classically **multi-socket / multi-NUMA** + machines, whose weaker cross-socket memory ordering exposes races a single socket never shows. From cdf4632ee8d1f1cd1cf4bad6b230677420fddbfc Mon Sep 17 00:00:00 2001 From: Lari Hotari Date: Wed, 27 May 2026 16:44:04 +0300 Subject: [PATCH 33/39] [improve][misc] Remove BUILDING.md; building -> CONTRIBUTING.md, build infrastructure -> ARCHITECTURE.md Fold BUILDING.md into the existing docs to avoid a split: the build/lint commands and prerequisites now live in CONTRIBUTING.md (## Building), and the Gradle build infrastructure, module-name gotcha, and "Changing the build" guidance move to the end of ARCHITECTURE.md. Update AGENTS.md, README.md, and the .github gradle-build path-instructions to point accordingly. Also reword the interface backward-compatibility advice in CODING.md: prefer a delegating default method or a runtime capability query over a default that just throws UnsupportedOperationException. --- .../instructions/gradle-build.instructions.md | 6 +- AGENTS.md | 5 +- ARCHITECTURE.md | 69 +++++++++++-- BUILDING.md | 97 ------------------- CODING.md | 10 +- CONTRIBUTING.md | 44 +++++++-- README.md | 14 +-- 7 files changed, 113 insertions(+), 132 deletions(-) delete mode 100644 BUILDING.md diff --git a/.github/instructions/gradle-build.instructions.md b/.github/instructions/gradle-build.instructions.md index 1117973be46f2..a5d5eca5ef7ec 100644 --- a/.github/instructions/gradle-build.instructions.md +++ b/.github/instructions/gradle-build.instructions.md @@ -5,6 +5,6 @@ applyTo: "**/*.gradle.kts,**/gradle.properties,gradle/libs.versions.toml,build-l # Gradle build changes -These files configure Apache Pulsar's **Gradle** build. Follow the guidance in -[`BUILDING.md`](../../BUILDING.md) when editing or reviewing a -build file. \ No newline at end of file +These files configure Apache Pulsar's **Gradle** build. Follow the guidance in +[`ARCHITECTURE.md` → Changing the build](../../ARCHITECTURE.md#changing-the-build) (and the +surrounding *Build infrastructure* section) when editing or reviewing a build file. \ No newline at end of file diff --git a/AGENTS.md b/AGENTS.md index 14936f6765115..e6479cf72297c 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -33,9 +33,8 @@ Apache Pulsar is licensed under the Apache License 2.0, and all contributions mu | Doc | Use for | |-----|---------| -| [`BUILDING.md`](BUILDING.md) | Building: prerequisites (JDK), build/lint commands, the Gradle build infrastructure (convention plugins, version catalog, enforced platform), the module-name-vs-directory gotcha, and how to change the build. | -| [`CONTRIBUTING.md`](CONTRIBUTING.md) | Local dev workflow: running tests & test groups, integration tests, Personal CI, PR conventions, security reporting. | -| [`ARCHITECTURE.md`](ARCHITECTURE.md) | Big-picture module map, the concurrency model and backpressure, and the `pip/` proposals. | +| [`CONTRIBUTING.md`](CONTRIBUTING.md) | Local dev workflow: building (prerequisites, build/lint commands), running tests & test groups, integration tests, Personal CI, PR conventions, security reporting. | +| [`ARCHITECTURE.md`](ARCHITECTURE.md) | Big-picture module map, the Gradle build infrastructure and how to change build files (convention plugins, version catalog, the module-name-vs-directory gotcha), the concurrency model and backpressure, and the `pip/` proposals. | | [`CODING.md`](CODING.md) | Coding conventions: style, async/`CompletableFuture`, concurrency, logging ([slog](https://github.com/merlimat/slog)), dependencies, backward compatibility, testing, and the review checklist. | | [`SECURITY.md`](SECURITY.md) | Reporting a vulnerability, disclosure hygiene, and checking exposure to an already-public CVE. | diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md index 4e0fb488dbaf8..f4c9ae75a9711 100644 --- a/ARCHITECTURE.md +++ b/ARCHITECTURE.md @@ -48,15 +48,6 @@ accordingly: `pulsar-client-admin-shaded` produce relocated fat jars; `distribution/*` assembles server/shell/offloader tarballs. -## Build and module layout - -The Gradle build infrastructure — convention plugins under `build-logic/`, the -`gradle/libs.versions.toml` version catalog, the enforced `pulsar-dependencies` platform, the -dependency tiers in `settings.gradle.kts`, and the configuration-cache / configure-on-demand -requirements — is documented in [`BUILDING.md`](BUILDING.md), together with the -[module-name-vs-directory gotcha](BUILDING.md#module-name-vs-directory-name-gotcha) (e.g. directory -`pulsar-client/` is the Gradle project `:pulsar-client-original`). - ## Pulsar Improvement Proposals (`pip/`) The **`pip/`** directory holds **Pulsar Improvement Proposals** (`pip-.md`) — the design @@ -97,3 +88,63 @@ than it can handle, particularly with respect to memory. The memory side is desc [PIP-442 "Existing Broker Memory Management"](pip/pip-442.md#existing-broker-memory-management). Broader backpressure (beyond memory) is not yet documented and would benefit from being defined alongside the concurrency model. + +## Build infrastructure + +Apache Pulsar uses a **Gradle** build (migrated from Maven via PIP-463; some older tooling and docs +elsewhere still reference Maven). The wrapper `./gradlew` requires **JDK 21 or 25** (bytecode targets +Java 17). See [`CONTRIBUTING.md` → Building](CONTRIBUTING.md#building) for the build and lint commands. + +- `settings.gradle.kts` — all modules, organized in dependency tiers (Tier 0 has no internal deps, + higher tiers build on lower ones). +- `build-logic/conventions/` — convention plugins (`pulsar.java-conventions`, + `pulsar.code-quality-conventions`, `pulsar.shadow-conventions`, etc.) applied by modules. Shared + compile/test/dependency config lives here — edit it here rather than duplicating across modules. +- `gradle/libs.versions.toml` — version catalog (single source of truth for dependency versions; + referenced as `libs.*` in build scripts). +- `pulsar-dependencies` — enforced platform (BOM) pinning all dependency versions; applied to every + module. + +The build enables both the **configuration cache** (`org.gradle.configuration-cache=true`) and +**configure-on-demand** (`org.gradle.configureondemand=true`). + +### Module name vs. directory name gotcha + +Several Gradle project paths do **not** match their directory because the Maven artifactId is +preserved. Most importantly: + +- Directory `pulsar-client/` → project **`:pulsar-client-original`** +- Directory `pulsar-client-admin/` → project **`:pulsar-client-admin-original`** +- Directory `pulsar-functions/localrun/` → project `:pulsar-functions:pulsar-functions-local-runner-original` + +Always use the Gradle project path (left of any `--tests`), e.g. `./gradlew :pulsar-client-original:test`. +Check `settings.gradle.kts` when a path is ambiguous. + +### Changing the build + +When editing `build-logic/`, `settings.gradle.kts`, a module `build.gradle.kts`, `gradle.properties`, +`gradle/libs.versions.toml`, or the `pulsar-dependencies` platform: + +- **Edit shared config in `build-logic/conventions/`**, not per-module. +- **Versions come from `gradle/libs.versions.toml`** (`libs.*` / `pulsar-dependencies`) — never + hardcode a version in a build script. +- **Keep tasks configuration-cache and configure-on-demand compatible** (both are enabled): no reading + of mutable state at execution time and no `Project` access in task actions — use `Provider` / value + sources, and verify with `--configuration-cache`. Tasks reached by the common flows (`assemble`, + `test`, `integrationTest`, `rat` / `spotlessCheck` / `checkstyle*`, `checkBinaryLicense`, `docker*`) + must be compatible; one-off tooling tasks not part of those flows (e.g. `verifyTestGroups`, ad-hoc + report tasks) may be exempt. +- **Published modules must not depend on internal modules** at compile/runtime scope — the artifact + would be unresolvable from Maven Central. A module is published only when it applies + `pulsar.public-java-library-conventions`. +- **After a dependency change**, run `./gradlew checkBinaryLicense` and update the distribution + `LICENSE`/`NOTICE`; justify any genuinely new dependency (see + [`CODING.md` → Dependencies](CODING.md#dependencies)). +- **Follow the [Gradle best practices](https://docs.gradle.org/current/userguide/best_practices_index.html)** + — AI agents should read the + [AsciiDoc source](https://github.com/gradle/gradle/blob/master/platforms/documentation/docs/src/docs/userguide/best-practices/best_practices_index.adoc), + which is plain text and cheaper to parse than the rendered HTML. + +Before finishing a build change, confirm the affected task and `./gradlew help` run clean with +`--configuration-cache`, and that `assemble` and `rat spotlessCheck checkstyleMain checkstyleTest` pass +(plus `checkBinaryLicense` if a dependency changed). diff --git a/BUILDING.md b/BUILDING.md deleted file mode 100644 index 2ac394fd34a4f..0000000000000 --- a/BUILDING.md +++ /dev/null @@ -1,97 +0,0 @@ -# Building Apache Pulsar - -Apache Pulsar uses a **Gradle** build (migrated from Maven via PIP-463; some older tooling and docs -elsewhere still reference Maven). Use the bundled wrapper `./gradlew` (Linux/macOS) or `gradlew.bat` -(Windows) — no separate Gradle install is needed. For local-environment setup see the -[build-tooling setup guide](https://pulsar.apache.org/contribute/setup-buildtools/) and the -[IDE setup guide](https://pulsar.apache.org/contribute/setup-ide/). For running tests and the rest of -the contribution workflow — test groups, integration tests, Personal CI, and PR conventions — see -[`CONTRIBUTING.md`](CONTRIBUTING.md) (in particular -[Running tests](CONTRIBUTING.md#running-tests)); for the module map see -[`ARCHITECTURE.md`](ARCHITECTURE.md). - -## Prerequisites - -- **JDK 21 or 25** is required to build `master`. Bytecode targets Java 17 (`options.release = 17`). - `zip` is also needed. (`-PskipJavaVersionCheck` bypasses the JDK version check.) -- The Gradle wrapper `./gradlew` pins the Gradle version — don't install Gradle separately. - -## Common build commands - -```bash -# Compile and assemble everything (or a single module) -./gradlew assemble -./gradlew :pulsar-broker:assemble - -# Lint / verify (license headers, formatting, checkstyle) — run before pushing -./gradlew rat spotlessCheck checkstyleMain checkstyleTest -./gradlew spotlessApply # auto-fix license headers/formatting - -# Verify bundled-dependency LICENSE/NOTICE coverage (run after changing a runtime dependency) -./gradlew checkBinaryLicense - -# Start a standalone Pulsar service (broker + bookie + metadata in one JVM) -bin/pulsar standalone - -# Build docker images apachepulsar/pulsar(-all):latest -./gradlew docker # or docker-all -``` - -## Build infrastructure - -The build is **Gradle**; the wrapper requires **JDK 21 or 25** to run (bytecode targets Java 17). - -- `settings.gradle.kts` — all modules, organized in dependency tiers (Tier 0 has no internal deps, - higher tiers build on lower ones). -- `build-logic/conventions/` — convention plugins (`pulsar.java-conventions`, - `pulsar.code-quality-conventions`, `pulsar.shadow-conventions`, etc.) applied by modules. This is - where shared compile/test/dependency config lives — edit conventions here rather than duplicating - config across modules. -- `gradle/libs.versions.toml` — version catalog (single source of truth for dependency versions; - referenced as `libs.*` in build scripts). -- `pulsar-dependencies` — enforced platform (BOM) pinning all dependency versions; applied to every - module. - -The build enables both the **configuration cache** (`org.gradle.configuration-cache=true`) and -**configure-on-demand** (`org.gradle.configureondemand=true`). - -## Module name vs. directory name gotcha - -Several Gradle project paths do **not** match their directory because the Maven artifactId is -preserved. Most importantly: - -- Directory `pulsar-client/` → project **`:pulsar-client-original`** -- Directory `pulsar-client-admin/` → project **`:pulsar-client-admin-original`** -- Directory `pulsar-functions/localrun/` → project `:pulsar-functions:pulsar-functions-local-runner-original` - -Always use the Gradle project path (left of any `--tests`), e.g. `./gradlew :pulsar-client-original:test`. -Check `settings.gradle.kts` when a path is ambiguous. - -## Changing the build - -When editing `build-logic/`, `settings.gradle.kts`, a module `build.gradle.kts`, `gradle.properties`, -`gradle/libs.versions.toml`, or the `pulsar-dependencies` platform: - -- **Edit shared config in `build-logic/conventions/`**, not per-module. -- **Versions come from `gradle/libs.versions.toml`** (`libs.*` / `pulsar-dependencies`) — never - hardcode a version in a build script. -- **Keep tasks configuration-cache and configure-on-demand compatible** (both are enabled): no - reading of mutable state at execution time and no `Project` access in task actions — use - `Provider` / value sources instead, and verify with `--configuration-cache`. Tasks reached by the - common flows (`assemble`, `test`, `integrationTest`, `rat` / `spotlessCheck` / `checkstyle*`, - `checkBinaryLicense`, `docker*`) must be compatible; one-off tooling tasks not part of those flows - (e.g. `verifyTestGroups`, ad-hoc maintenance/report tasks) may be exempt. -- **Published modules must not depend on internal modules** at compile/runtime scope — the artifact - would be unresolvable from Maven Central. A module is published only when it applies - `pulsar.public-java-library-conventions`. -- **After a dependency change**, run `./gradlew checkBinaryLicense` and update the distribution - `LICENSE`/`NOTICE`; justify any genuinely new dependency (see - [`CODING.md` → Dependencies](CODING.md#dependencies)). -- **Follow the [Gradle best practices](https://docs.gradle.org/current/userguide/best_practices_index.html)** - — AI agents should instead read the - [AsciiDoc source](https://github.com/gradle/gradle/blob/master/platforms/documentation/docs/src/docs/userguide/best-practices/best_practices_index.adoc), - which is plain text and cheaper to parse than the rendered HTML page. - -Before finishing a build change, confirm the affected task and `./gradlew help` run clean with -`--configuration-cache`, and that `assemble` and `rat spotlessCheck checkstyleMain checkstyleTest` -pass (plus `checkBinaryLicense` if a dependency changed). diff --git a/CODING.md b/CODING.md index 7cdc1a9fa2523..c6f659a7750db 100644 --- a/CODING.md +++ b/CODING.md @@ -126,9 +126,13 @@ configuration setting — e.g. `LoadManager`, `LedgerOffloaderFactory`, `Authori delayed-delivery-tracker factories, `CustomCommand` — and third parties ship implementations. Changing such an interface, or a `protected` member of an extensible class (`PulsarWebResource`, `PersistentTopic`, `Producer`), breaks them: it generally needs a PIP and must not land in -maintenance-branch backports. When you must add a method, give it a **`default` implementation** (e.g. -throwing `UnsupportedOperationException`) so existing implementors still compile and the change stays -backport-safe. +maintenance-branch backports. + +**Design interface changes for backward compatibility.** When you add a method to such an interface, +prefer a `default` implementation that delegates to an existing method, so older third-party +implementations keep working unchanged. If no sensible delegation exists, add a separate +capability-query method (e.g. `boolean supportsX()`) the broker checks at runtime, so it can support +older implementations gracefully instead of depending on the new method. **Don't leak third-party types through public/plugin interfaces.** Exposing Netty or AsyncHttpClient classes breaks consumers of the **shaded** client (shaded vs. unshaded classes differ) and couples diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index ecd8f5b55ed97..db70206a1f71b 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -24,15 +24,39 @@ We would love for you to contribute to Apache Pulsar and make it even better! The **authoritative** contributor guide is the [Apache Pulsar Contributing Guide](https://pulsar.apache.org/contribute/) — please read it before starting. This file is a quick, in-repo reference for the local development -workflow (test, PR, CI). For building see [`BUILDING.md`](BUILDING.md); for the big-picture module map -see [`ARCHITECTURE.md`](ARCHITECTURE.md); and for code style see [`CODING.md`](CODING.md). +workflow (build, test, PR, CI). For the big-picture module map and the Gradle build infrastructure see +[`ARCHITECTURE.md`](ARCHITECTURE.md); for code style see [`CODING.md`](CODING.md). ## Building -See [`BUILDING.md`](BUILDING.md) for prerequisites (JDK 21/25), the common build and lint commands -(`./gradlew assemble`, `rat spotlessCheck checkstyleMain checkstyleTest`, `checkBinaryLicense`, …), -the Gradle build infrastructure, and how to change the build. The rest of this guide covers the test -and PR workflow. +**JDK 21 or 25** is required to build `master` (bytecode targets Java 17; `-PskipJavaVersionCheck` +bypasses the check); `zip` is also needed. Use the bundled wrapper `./gradlew` (Linux/macOS) or +`gradlew.bat` (Windows) — no separate Gradle install. See the +[build-tooling setup guide](https://pulsar.apache.org/contribute/setup-buildtools/) and the +[IDE setup guide](https://pulsar.apache.org/contribute/setup-ide/). + +```bash +# Compile and assemble everything (or a single module) +./gradlew assemble +./gradlew :pulsar-broker:assemble + +# Lint / verify (license headers, formatting, checkstyle) — run before pushing +./gradlew rat spotlessCheck checkstyleMain checkstyleTest +./gradlew spotlessApply # auto-fix license headers/formatting + +# Verify bundled-dependency LICENSE/NOTICE coverage (after changing a runtime dependency) +./gradlew checkBinaryLicense + +# Start a standalone Pulsar service (broker + bookie + metadata in one JVM) +bin/pulsar standalone + +# Build docker images apachepulsar/pulsar(-all):latest +./gradlew docker # or docker-all +``` + +For the Gradle build infrastructure and how to change build files (convention plugins, version +catalog, configuration-cache rules), see +[`ARCHITECTURE.md` → Build infrastructure](ARCHITECTURE.md#build-infrastructure). ## Running tests @@ -51,7 +75,7 @@ The **container-based integration tests** that run against a Pulsar Docker image ./gradlew :pulsar-broker:test --tests "org.apache.pulsar.broker.admin.*" ``` -> Note the [module-name-vs-directory gotcha](BUILDING.md#module-name-vs-directory-name-gotcha): +> Note the [module-name-vs-directory gotcha](ARCHITECTURE.md#module-name-vs-directory-name-gotcha): > directory `pulsar-client/` is the Gradle project `:pulsar-client-original`. ### Test groups (TestNG) @@ -205,6 +229,6 @@ PRs/issues first, then ask via a GitHub issue or dev@pulsar.apache.org. ## AI coding agents If you use an AI coding assistant (Claude Code, Copilot, Cursor, Gemini, Codex, Aider, …), see -[`AGENTS.md`](AGENTS.md) for the agent-facing guidance — a routing index into [`BUILDING.md`](BUILDING.md), -this guide, [`ARCHITECTURE.md`](ARCHITECTURE.md), [`CODING.md`](CODING.md), and [`SECURITY.md`](SECURITY.md), -plus the guardrails that apply specifically to AI-made changes. +[`AGENTS.md`](AGENTS.md) for the agent-facing guidance — a routing index into this guide, +[`ARCHITECTURE.md`](ARCHITECTURE.md), [`CODING.md`](CODING.md), and [`SECURITY.md`](SECURITY.md), plus +the guardrails that apply specifically to AI-made changes. diff --git a/README.md b/README.md index 7e5406068b6d0..dae2d2fe26327 100644 --- a/README.md +++ b/README.md @@ -162,8 +162,9 @@ Docker image Java runtime: 17 ## Build Pulsar -The quick start below covers the essentials; see [`BUILDING.md`](BUILDING.md) for the full build -guide (prerequisites, build/lint commands, the Gradle build infrastructure, and how to change the build). +The quick start below covers the essentials; see [`CONTRIBUTING.md` → Building](CONTRIBUTING.md#building) +for the full build and lint commands, and [`ARCHITECTURE.md` → Build infrastructure](ARCHITECTURE.md#build-infrastructure) +for the Gradle build infrastructure and how to change build files. ### Requirements @@ -196,11 +197,10 @@ There is also a guide for [setting up the tooling for building Pulsar](https://p bin/pulsar standalone # run a standalone service ``` -For the full build, lint, and dependency workflow — including the Gradle build infrastructure — see -[`BUILDING.md`](BUILDING.md). For the test and PR workflow — test groups, integration tests, Personal -CI, and PR conventions — see [`CONTRIBUTING.md`](CONTRIBUTING.md). For the module map see -[`ARCHITECTURE.md`](ARCHITECTURE.md), and for coding conventions see [`CODING.md`](CODING.md). AI coding -agents should start from [`AGENTS.md`](AGENTS.md). +For the full build, lint, test, and PR workflow — test groups, integration tests, Personal CI, and PR +conventions — see [`CONTRIBUTING.md`](CONTRIBUTING.md). For the module map and the Gradle build +infrastructure see [`ARCHITECTURE.md`](ARCHITECTURE.md), and for coding conventions see +[`CODING.md`](CODING.md). AI coding agents should start from [`AGENTS.md`](AGENTS.md). Check https://pulsar.apache.org for documentation and examples. From a06b8f4830729731bef324c7d8ad610cea72e248 Mon Sep 17 00:00:00 2001 From: Lari Hotari Date: Wed, 27 May 2026 16:56:39 +0300 Subject: [PATCH 34/39] [improve][misc] Refine CODING.md method-naming and general recommendations --- CODING.md | 34 ++++++++++++++++------------------ 1 file changed, 16 insertions(+), 18 deletions(-) diff --git a/CODING.md b/CODING.md index c6f659a7750db..58cebb4380aa8 100644 --- a/CODING.md +++ b/CODING.md @@ -81,29 +81,27 @@ but much existing code isn't, so integration-style is the pragmatic default. See - **Validate performance optimizations with a JMH benchmark** under `microbench/`, simulating a realistic production usage pattern (see `microbench/README.md`). -## Data types +## General recommendations +- **Use the narrowest interface type** for fields, parameters, variables, and returns (`Map`, + `SequencedMap`, `SortedMap`, `Collection`, `List`) rather than a concrete type like `TreeMap`. Keep + the concrete type only where its behaviour is required (e.g. a `TreeMap` for key-ordered iteration), + still exposed through the interface. +- **Minimize method and constructor parameters.** For a constructor with many parameters, + use a **builder** — the project uses Lombok `@Builder` for most internal classes, and it works on a + `record` too. Consider refactoring by moving related methods to a separate class when it's a better fit. - **Don't return generic tuples.** Instead of `org.apache.commons.lang3.tuple.Pair` (or a similar tuple type), define a small, purpose-named **Java `record`** inline in the class that declares the method, with the **same visibility as the method** (`public`, package-private, or `private`). - **Prefer record keys over concatenated strings.** For a composite `Map` key, use a small `record` instead of concatenating a `String` (e.g. `a + ":" + b`) — correct `equals`/`hashCode`, type-safe, no delimiter/escaping bugs. -- **Use the narrowest interface type** for fields, parameters, variables, and returns (`Map`, - `SequencedMap`, `SortedMap`, `Collection`, `List`) rather than a concrete type like `TreeMap`. Keep - the concrete type only where its behaviour is required (e.g. a `TreeMap` for key-ordered iteration), - still exposed through the interface. -- **Minimize method and constructor parameters.** Don't pass values already reachable from a context - object the method receives (e.g. read from `PublishContext`). For a constructor with many parameters, - use a **builder** — the project uses Lombok `@Builder` for most internal classes, and it works on a - `record` too. **Don't use `@Builder` on public client-API classes** (hard to maintain, meaningless - default builder name) — hand-write the builder or set an explicit `builderClassName`. -- **Name for intent.** Name a boolean-returning method as a query (`shouldSkipChunk`, not `skipChunk`); - name methods/fields/metrics after the effect (prefer `truncated` / `migrationSegment` over - `overflow` / `special`). -- **Construct via factory methods, not internal implementation types** (e.g. `PositionFactory.create(...)`, - not `new ImmutablePositionImpl(...)`). - +- **Don't use `@Builder` on public client-API classes** (harder to maintain backwards compatibility) — hand-write the builder. +- **Name methods for intent.** A method's name should reveal what it does. Query methods read like + queries (`shouldSkipChunk`, not `skipChunk`); methods that mutate state or perform an action are + named for that action. **Reserve the `get` prefix for pure queries** — using it for a method that + mutates state, or otherwise does more than return a value is strongly discouraged. + ## Dependencies Prefer existing dependencies over new libraries. Pulsar commonly uses Apache Commons / Guava @@ -201,8 +199,8 @@ Focus feedback on correctness, reliability, and maintainability. while holding a lock (this has fixed real deadlocks and contention). - Give threads **meaningful names**. When creating thread pools, prefer Netty's **`io.netty.util.concurrent.DefaultThreadFactory`** — it produces **`FastThreadLocalThread`** - instances (much faster `FastThreadLocal` lookups, which matter on Netty paths like the pooled - `ByteBuf` allocator) and assigns prefixed names. + instances (lower overhead `FastThreadLocal` lookups, which matter on Netty paths like the pooled + `ByteBuf` allocator) and assigns prefixed thread names. Pulsar has no documented, project-wide concurrency model yet; see [`ARCHITECTURE.md` → Concurrency model](ARCHITECTURE.md#concurrency-model-a-known-gap) for the From ca5154ef20ce60232f1d40e4386754e45535713a Mon Sep 17 00:00:00 2001 From: Lari Hotari Date: Wed, 27 May 2026 17:01:09 +0300 Subject: [PATCH 35/39] [improve][misc] Clarify DEBUG vs TRACE log level guidance in CODING.md --- CODING.md | 1 + 1 file changed, 1 insertion(+) diff --git a/CODING.md b/CODING.md index 58cebb4380aa8..230e03b9a0d86 100644 --- a/CODING.md +++ b/CODING.md @@ -27,6 +27,7 @@ for the agent-specific guardrails on top of it. slog's lazy form — `log.debug().attr("dump", () -> expensiveDump()).log("...")` or `log.debug(e -> e.attr("dump", expensiveDump()).log("..."))`. - Avoid logging on hot paths, and stack traces at `INFO` or lower. +- Use `DEBUG` in a way where it could be enabled in production without causing too many log entries. Use `TRACE` for more detailed information. ## Asynchronous programming From b63b2e647547d485ed9c01de9ddef15cc621df8d Mon Sep 17 00:00:00 2001 From: Lari Hotari Date: Wed, 27 May 2026 17:10:45 +0300 Subject: [PATCH 36/39] [improve][misc] Fix SECURITY.md: private@ list is a CC to security@apache.org, not an alternative --- SECURITY.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/SECURITY.md b/SECURITY.md index 3283aa2ebad7a..864a7cf1e6b83 100644 --- a/SECURITY.md +++ b/SECURITY.md @@ -12,8 +12,9 @@ Do **not** open a public GitHub issue, pull request, or discussion for a suspect that defeats coordinated disclosure. Report it privately by email to the Apache Security Team at **security@apache.org** (the ASF's central -security address). You may also, or instead, write to the Apache Pulsar PMC's private list, -**private@pulsar.apache.org**. +security address). When reporting to **security@apache.org**, you can copy your email to the Apache +Pulsar PMC's private list, **private@pulsar.apache.org**, to send your report to the Project Management +Committee as well. **A human must verify and take responsibility for every security report.** Deciding that some behaviour is actually a vulnerability requires judgement against Pulsar's threat model (see From 7029b2658474279d98a1b899436491556cec2bba Mon Sep 17 00:00:00 2001 From: Lari Hotari Date: Wed, 27 May 2026 17:32:56 +0300 Subject: [PATCH 37/39] [improve][misc] Soften SECURITY.md wording on RCE reports and trim security policy link --- SECURITY.md | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/SECURITY.md b/SECURITY.md index 864a7cf1e6b83..946f9dcaa3991 100644 --- a/SECURITY.md +++ b/SECURITY.md @@ -25,8 +25,7 @@ independently verify it and own the report. **Autonomous agents must not file se security issues on their own**, and a tool's confident-sounding output is not, by itself, evidence of a vulnerability. -See and - for more detail. +See for more details. ## Disclosure hygiene for contributors @@ -74,10 +73,9 @@ Pulsar's security model is not formally/explicitly defined. Two long-standing de matter when deciding whether something is actually a vulnerability: **Pulsar Functions and connectors execute fully trusted code.** The function instance runtime exists -to run user-provided code — *remote code execution is its core purpose, not a flaw.* (The Pulsar PMC -has repeatedly received reports claiming that "the function instance runtime allows running -user-provided code and results in an RCE"; this is expected, by-design behaviour, not a security -issue.) The available execution models also let the code modify its environment: +to run user-provided code — *remote code execution is its core purpose, not a flaw.* (There have been +some reports claiming that "the function instance runtime allows running user-provided code and results +in an RCE"; this is expected, by-design behaviour, not a security issue.) The available execution models also let the code modify its environment: - The **thread** and **process** runtimes can read or modify any files and state accessible to the process they run in. From e8c836a118dc4e2ba999fd7e25113fe82757ba3d Mon Sep 17 00:00:00 2001 From: Lari Hotari Date: Wed, 27 May 2026 22:14:35 +0300 Subject: [PATCH 38/39] [improve][misc] Add do/don't code examples for key conventions in CODING.md Address PR review feedback: AI agents follow concrete patterns more reliably than prose. Add short Avoid/Prefer code examples (matching the labeled-block style of the former .github/copilot-instructions.md) for: CompletableFuture must not throw synchronously, flatten nested futures with thenCompose, prefer a record over Pair, and record map keys over concatenated strings. Examples use generic placeholder names so they read as illustrations, not real Pulsar APIs. --- CODING.md | 62 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 62 insertions(+) diff --git a/CODING.md b/CODING.md index 230e03b9a0d86..edff7e5e02fdb 100644 --- a/CODING.md +++ b/CODING.md @@ -37,10 +37,44 @@ Pulsar relies heavily on `CompletableFuture`; prefer it over `ListenableFuture` the returned future — `return CompletableFuture.failedFuture(e);` — including for argument validation (`if (arg == null) return CompletableFuture.failedFuture(new IllegalArgumentException("arg"));`). Throwing *inside* a stage (`thenApply`, `thenCompose`, `handle`, `whenComplete`, …) is fine. + + Avoid (escapes synchronously; a caller chaining `.exceptionally(...)` never sees it): + + ```java + CompletableFuture process(String arg) { + if (arg == null) { + throw new IllegalArgumentException("arg"); + } + return doProcessAsync(arg); + } + ``` + + Prefer (report the validation failure through the returned future): + + ```java + CompletableFuture process(String arg) { + if (arg == null) { + return CompletableFuture.failedFuture(new IllegalArgumentException("arg")); + } + return doProcessAsync(arg); + } + ``` - **Never block on event-loop / async-execution threads** — no `Thread.sleep`, `Future.get()`, `CompletableFuture.join()`, or blocking IO. An operation that performs IO should return a future. - **Avoid nested futures** (`CompletableFuture>`); flatten with `thenCompose`. Prefer **`OrderedExecutor`** for ordered asynchronous work. + + Avoid (`thenApply` on a future-returning function yields `CompletableFuture>`): + + ```java + return firstAsync(arg).thenApply(v -> secondAsync(v)); + ``` + + Prefer (`thenCompose` flattens it to `CompletableFuture`): + + ```java + return firstAsync(arg).thenCompose(v -> secondAsync(v)); + ``` - **Converting a synchronous-throwing method to a failed future is not mechanical** — some callers rely on the throw happening *before* the async work starts, so evaluate each call site. Use a shared `checkArgumentAsync` helper (in `FutureUtil`) to validate without duplicating try/catch. @@ -94,9 +128,37 @@ but much existing code isn't, so integration-style is the pragmatic default. See - **Don't return generic tuples.** Instead of `org.apache.commons.lang3.tuple.Pair` (or a similar tuple type), define a small, purpose-named **Java `record`** inline in the class that declares the method, with the **same visibility as the method** (`public`, package-private, or `private`). + + Avoid (positional and untyped; call sites read `getLeft()` / `getRight()`): + + ```java + private Pair minMax(Collection values) { ... } + ``` + + Prefer (a purpose-named record with the same visibility as the method): + + ```java + private record MinMax(int min, int max) {} + private MinMax minMax(Collection values) { ... } + ``` - **Prefer record keys over concatenated strings.** For a composite `Map` key, use a small `record` instead of concatenating a `String` (e.g. `a + ":" + b`) — correct `equals`/`hashCode`, type-safe, no delimiter/escaping bugs. + + Avoid (delimiter collisions when a value contains `:`; no type safety): + + ```java + Map map = new HashMap<>(); + map.get(a + ":" + b); + ``` + + Prefer (a small record key with correct `equals`/`hashCode`): + + ```java + record Key(String a, String b) {} + Map map = new HashMap<>(); + map.get(new Key(a, b)); + ``` - **Don't use `@Builder` on public client-API classes** (harder to maintain backwards compatibility) — hand-write the builder. - **Name methods for intent.** A method's name should reveal what it does. Query methods read like queries (`shouldSkipChunk`, not `skipChunk`); methods that mutate state or perform an action are From c66bc523de299991f5b191628503ec6f87893f37 Mon Sep 17 00:00:00 2001 From: Lari Hotari Date: Wed, 27 May 2026 22:47:32 +0300 Subject: [PATCH 39/39] [improve][misc] Add concurrency-limiting / backpressure guidance to CODING.md Document the available options for bounding concurrent async operations: ConcurrencyReducer, FutureUtil.Sequencer, and AsyncSemaphoreImpl, and when to prefer each. --- CODING.md | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/CODING.md b/CODING.md index edff7e5e02fdb..6623e342e75b8 100644 --- a/CODING.md +++ b/CODING.md @@ -79,6 +79,15 @@ Pulsar relies heavily on `CompletableFuture`; prefer it over `ListenableFuture` on the throw happening *before* the async work starts, so evaluate each call site. Use a shared `checkArgumentAsync` helper (in `FutureUtil`) to validate without duplicating try/catch. +- **Limit concurrency and handle backpressure.** Firing many async operations at once can overwhelm the + system. Options: + - **`com.spotify.futures.ConcurrencyReducer`** — caps in-flight futures at a configurable limit (used + in the Admin client to bound concurrent requests per broker). + - **`org.apache.pulsar.common.util.FutureUtil.Sequencer`** — runs async operations sequentially. + - **`org.apache.pulsar.common.semaphore.AsyncSemaphoreImpl`** — a non-blocking semaphore with a + per-operation cost that queues callers instead of failing when the limit is reached. Preferred over + `ConcurrencyReducer` for request-driven cases that need a timeout on permit acquisition. + ## Testing conventions Most Pulsar **"unit tests"** (`src/test`, run with `./gradlew ::test`) are actually