diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md index 69281b4a03bc2..9772d92017363 100644 --- a/.github/PULL_REQUEST_TEMPLATE.md +++ b/.github/PULL_REQUEST_TEMPLATE.md @@ -1,18 +1,26 @@ - + Fixes #xyz diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md deleted file mode 100644 index 54d832eefd0f7..0000000000000 --- a/.github/copilot-instructions.md +++ /dev/null @@ -1,328 +0,0 @@ -The following instructions are only to be applied when performing a code review. - -# Copilot Instructions for Apache Pulsar - -## Project Context - -Apache Pulsar is a distributed messaging and streaming platform designed for high throughput, low latency, and -horizontal scalability. - -The codebase contains performance-critical, asynchronous, and concurrency-sensitive components such as brokers, storage -clients, and networking layers. - -Code reviews should prioritize: - -* correctness -* thread safety -* performance -* maintainability -* backward compatibility - ---- - -# Java Coding Conventions - -Apache Pulsar follows the Sun Java Coding Conventions with additional project-specific rules. - -## Formatting - -* Use **4 spaces** for indentation. -* **Tabs must never be used**. -* Always use **curly braces**, even for single-line `if` statements. - -Example: - -```java -if (condition) { - doSomething(); -} -``` - -## Javadoc - -* Do **not include `@author` tags** in Javadoc. - -## TODO Comments - -All TODO comments must reference a GitHub issue. - -Example: - -```java -// TODO: https://github.com/apache/pulsar/issues/XXXX -``` - ---- - -# Dependencies - -Prefer existing dependencies instead of introducing new libraries. - -The Pulsar codebase commonly uses: - -* **Apache Commons libraries or Guava** for utilities -* **FastUtil** for optimized type specific collections -* **JCTools** for high performance concurrent data structures -* **RoaringBitmap** for compressed bitmaps (bitsets) -* **Caffeine** for caching -* **Jackson** for JSON handling -* **Prometheus Java simpleclient** (or newer **Prometheus Java Metrics Library**) for metrics -* **OpenTelemetry API** for metrics -* **Netty** for networking and buffers - -When introducing a new dependency: - -* justify why existing dependencies are insufficient -* ensure required license files and notices are updated - ---- - -# Logging Guidelines - -* Use **SLF4J** for logging. -* Do **not use** `System.out` or `System.err`. -* Assume production commonly runs at **INFO** log level. -* Avoid excessive logging in hot paths. -* Guard expensive `DEBUG` and `TRACE` logging with `log.isDebugEnabled()` or `log.isTraceEnabled()`. -* Avoid logging stack traces at `INFO` and lower levels. - -Log messages should be: - -* clear and descriptive -* capitalized -* understandable without reading the code - ---- - -# Resource and Memory Management - -Ensure resources are always closed correctly. - -Prefer: - -```java -try (InputStream in = ...) { - // use resource -} -``` - -Avoid leaks for: - -* streams -* network connections -* executors -* buffers - -For internal networking/messaging paths, prefer **Netty `ByteBuf`** over `ByteBuffer` unless an external API requires -`ByteBuffer`. - ---- - -# Configuration Guidelines - -When adding configuration options: - -* use clear and descriptive names -* provide sensible default values -* update default configuration files -* document the configuration option - ---- - -# Concurrency Guidelines - -Pulsar is designed as a low-latency asynchronous system. - -Verify: - -* public classes are **thread-safe** -* shared mutable state is protected -* mutations occur on the intended thread -* fine-grained synchronization is preferred -* threads have meaningful names for diagnostics - -If a class is not thread-safe, annotate it with: - -```java -@NotThreadSafe -``` - -Prefer **OrderedExecutor** for ordered asynchronous execution. - ---- - -# Asynchronous Programming Guidelines - -Pulsar relies heavily on `CompletableFuture` and asynchronous pipelines. - -Prefer `CompletableFuture` APIs over `ListenableFuture` for new code. - -## Avoid Blocking in Async Paths - -Do not introduce blocking operations in asynchronous execution paths. - -Examples of blocking operations: - -* `Thread.sleep` -* `Future.get()` -* `CompletableFuture.join()` -* blocking IO operations - -Blocking calls must not run on event loop or async execution threads. - -## Avoid Nested Futures - -Avoid returning nested futures such as: - -```java -CompletableFuture> -``` - -Prefer flattening with `thenCompose`. - -## Asynchronous Exception Handling - -Methods returning `CompletableFuture` must not throw synchronous exceptions directly. - -Incorrect: - -```java -public CompletableFuture operation() { - if (error) { - throw new IllegalStateException("unexpected state"); - } - return CompletableFuture.completedFuture(null); -} -``` - -Use returned futures to propagate failures: - -```java -return CompletableFuture.failedFuture(exception); -``` - -This also applies to argument validation in async methods: - -```java -if (arg == null) { - return CompletableFuture.failedFuture(new IllegalArgumentException("arg")); -} -``` - -Throwing exceptions inside async stages such as `thenApply`, `thenCompose`, `thenRun`, `handle`, or `whenComplete` -is acceptable: - -```java -return future.thenApply(v -> { - if (error) { - throw new IllegalStateException("unexpected state"); - } - return result; -}); -``` - ---- - -# Backward Compatibility - -Apache Pulsar maintains strong compatibility guarantees. - -Changes must not break: - -* public APIs -* client compatibility -* wire protocol compatibility -* metadata or serialized formats - -Servers must be compatible with both older and newer clients. - -Flag any change that may break compatibility. - ---- - -# Testing Guidelines - -## Unit testing - -* TestNG is used as the testing framework -* Mocking uses Mockito -* Assertions should prefer using AssertJ library with descriptions over using TestNG assertions -* Awaitility should be used to handle assertions with asynchronous conditions together with AssertJ - -## Avoid Reflection in Tests - -Do **not** use reflection to access private fields or methods from tests. This includes -`WhiteboxImpl.getInternalState`, `WhiteboxImpl.setInternalState`, `Field.setAccessible(true)`, -`Method.setAccessible(true)`, and similar reflection helpers. - -Reflection-based test access is discouraged because it: - -* breaks during refactoring without any compile-time signal — renaming or retyping a field - silently invalidates the test -* produces verbose, brittle, and hard-to-read test code -* couples tests to implementation details that should be free to change - -Instead, expose what tests legitimately need through **package-private** methods (or fields -where appropriate) and annotate them with `@VisibleForTesting`: - -```java -@VisibleForTesting -Map getSubscriptions() { - return subscriptions; -} -``` - -The test then accesses state through a normal, statically-typed call: - -```java -var subscriptions = persistentTopic.getSubscriptions(); -``` - -instead of: - -```java -ConcurrentOpenHashMap subscriptions = - WhiteboxImpl.getInternalState(persistentTopic, "subscriptions"); -``` - -Place the test in the same package as the class under test so package-private visibility is -sufficient — no need to widen visibility to `public` for testing. - -When reviewing PRs, flag any new use of reflection in tests and suggest a `@VisibleForTesting` -package-private accessor instead. See the dev@ thread -[Stop using reflection to access private fields in tests](https://lists.apache.org/thread/7gr04sqmzyttx4ln6ydtp3qv0xgo1o6m) -for the full rationale. - -# Testing Expectations - -Every feature or bug fix should include tests. - -Verify: - -* unit tests exist -* edge cases are covered -* failure scenarios are tested -* tests are deterministic and stable -* tests avoid `sleep`-based timing assumptions -* tests include timeouts to prevent hangs - -Integration tests may be required for distributed components. - ---- - -# Pull Request Review Guidance - -When reviewing a pull request, Copilot should: - -* verify Java coding conventions are followed -* detect thread safety risks -* flag blocking operations in async paths -* detect improper `CompletableFuture` usage -* detect unnecessary dependencies and missing license updates -* ensure logging follows project guidelines -* verify backward compatibility -* suggest missing tests when appropriate -* flag reflection-based access to private fields or methods in tests (e.g. `WhiteboxImpl`, - `setAccessible(true)`) and recommend a package-private `@VisibleForTesting` accessor instead - -Focus feedback on correctness, reliability, and maintainability. \ No newline at end of file diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md new file mode 120000 index 0000000000000..be77ac83a1895 --- /dev/null +++ b/.github/copilot-instructions.md @@ -0,0 +1 @@ +../AGENTS.md \ No newline at end of file diff --git a/.github/instructions/gradle-build.instructions.md b/.github/instructions/gradle-build.instructions.md new file mode 100644 index 0000000000000..a5d5eca5ef7ec --- /dev/null +++ b/.github/instructions/gradle-build.instructions.md @@ -0,0 +1,10 @@ +--- +description: Conventions for changing Apache Pulsar's Gradle build (build scripts, version catalog, convention plugins). +applyTo: "**/*.gradle.kts,**/gradle.properties,gradle/libs.versions.toml,build-logic/**" +--- + +# Gradle build changes + +These files configure Apache Pulsar's **Gradle** build. Follow the guidance in +[`ARCHITECTURE.md` → Changing the build](../../ARCHITECTURE.md#changing-the-build) (and the +surrounding *Build infrastructure* section) when editing or reviewing a build file. \ No newline at end of file diff --git a/.github/instructions/java-coding.instructions.md b/.github/instructions/java-coding.instructions.md new file mode 100644 index 0000000000000..1021307959995 --- /dev/null +++ b/.github/instructions/java-coding.instructions.md @@ -0,0 +1,10 @@ +--- +description: Coding conventions for Apache Pulsar Java source and test code. +applyTo: "**/*.java" +--- + +# Java code changes + +Apache Pulsar is performance-critical, asynchronous, and concurrency-sensitive. The full reference is +[`CODING.md`](../../CODING.md) (with [`CONTRIBUTING.md`](../../CONTRIBUTING.md) for running tests) — +follow it when editing or reviewing Java code. \ No newline at end of file diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 0000000000000..e6479cf72297c --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,81 @@ +# Agent guide for Apache Pulsar + +Supplemental guidance for AI coding assistants (Claude Code, Copilot, Cursor, Gemini, Codex, Aider, +and similar tools) working in this repository. Keep this file short — it is a **router**. The detail +lives in the human-facing docs linked below; read the one that fits your task rather than pulling +everything into context up front. + +Apache Pulsar is a distributed pub-sub messaging and streaming platform. The codebase is +performance-critical, heavily asynchronous, and concurrency-sensitive. Prioritize **correctness, +thread safety, performance, maintainability, and backward compatibility**. + +## Licensing and provenance (read first) + +Apache Pulsar is licensed under the Apache License 2.0, and all contributions must meet the ASF's +[Generative Tooling guidance](https://www.apache.org/legal/generative-tooling.html): + +- **A human is in the loop and is accountable.** Every pull request and every security report must be + submitted by a human contributor who has reviewed and verified the change and takes responsibility + for it. Fully autonomous agents (e.g. OpenClaw-style bots) opening PRs or filing reports on their own + are not acceptable — the AI assists, the human is accountable. +- **Don't introduce code of incompatible or unknown provenance.** Never copy verbatim from + GPL/AGPL/LGPL, proprietary, or unlicensed sources (including Stack Overflow / blog / forum snippets of + unclear licensing), and don't use a tool whose terms restrict the output inconsistently with open + source. Reimplement from specifications or Apache-compatible sources, and follow the + [ASF 3rd Party Licensing Policy](https://www.apache.org/legal/resolved.html). +- **Every new source file needs the ASF license header** (Spotless enforces this — copy the form from + an existing `.java` file). +- **Consider attributing AI assistance.** When AI tooling assisted on a change, consider adding an + `Assisted-by: ` commit trailer (the default, since a human still does the final + review); `Generated-by: ` is for minimally-modified generated output. + +## Canonical docs (read the one that fits the task) + +| Doc | Use for | +|-----|---------| +| [`CONTRIBUTING.md`](CONTRIBUTING.md) | Local dev workflow: building (prerequisites, build/lint commands), running tests & test groups, integration tests, Personal CI, PR conventions, security reporting. | +| [`ARCHITECTURE.md`](ARCHITECTURE.md) | Big-picture module map, the Gradle build infrastructure and how to change build files (convention plugins, version catalog, the module-name-vs-directory gotcha), the concurrency model and backpressure, and the `pip/` proposals. | +| [`CODING.md`](CODING.md) | Coding conventions: style, async/`CompletableFuture`, concurrency, logging ([slog](https://github.com/merlimat/slog)), dependencies, backward compatibility, testing, and the review checklist. | +| [`SECURITY.md`](SECURITY.md) | Reporting a vulnerability, disclosure hygiene, and checking exposure to an already-public CVE. | + +The authoritative project documentation is at , whose source lives in the +[`apache/pulsar-site`](https://github.com/apache/pulsar-site) repository (where documentation changes +are contributed). The files above and the website remain the source of truth — this guide just layers +AI-specific pointers on top. + +## Agent guardrails + +A few rules matter specifically when an AI tool makes the change, on top of the canonical docs above: + +- **Verify, don't fabricate.** Before you use a class, method, field, configuration key, Gradle task, + plugin, or DSL name, confirm it actually exists (search the code or the build files) — don't invent + APIs, test assertions, or symbol names. Don't assert a CVE is "fixed" / "not affected" without + checking the merged PRs and the shipped version. +- **Confirm state-changing actions.** Get the user's explicit confirmation before pushing, opening or + updating a PR, or posting a comment. Being asked to start a task is not standing authorization for + these outward-facing actions — see *Licensing and provenance* above. +- **A clean local run is weak evidence for concurrency/data race fixes** (timing- and + platform-dependent). See [`CODING.md`](CODING.md#reproducing-concurrency--memory-visibility-bugs). + +## Critical rules + +1. **Don't break backward compatibility** — public APIs, client compatibility, wire protocol, and + serialized/metadata formats. Servers must interoperate with older and newer clients. +2. **Never block on async/event-loop threads;** methods returning `CompletableFuture` must not throw + synchronously. See [`CODING.md`](CODING.md#asynchronous-programming). +3. **Logging:** prefer [slog](https://github.com/merlimat/slog) via `@CustomLog`; default new logs to + `TRACE`/`DEBUG`, not `INFO`. +4. **Tests:** scope runs with `--tests`; no reflection into private state (use a `@VisibleForTesting` + package-private accessor); release buffers and resources. +5. **PRs:** semantic `[type][scope]` title; describe **motivation** and **modifications**; do not + rebase once the PR is open in `apache/pulsar` — merge upstream `master` instead. +6. **Security:** never disclose a vulnerability — or the security nature of a change — in a public + issue, PR, or commit. See [`SECURITY.md`](SECURITY.md). +7. **Stay in scope.** Keep a change focused on its task; don't bundle unrelated drive-by refactors or + generate broad mass-refactoring PRs. Discuss large refactorings on `dev@pulsar.apache.org` first. + See [`CONTRIBUTING.md`](CONTRIBUTING.md#pull-requests). + +## Where to ask + +- Dev mailing list: · Slack: +- Issues: · Discussions: diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md new file mode 100644 index 0000000000000..f4c9ae75a9711 --- /dev/null +++ b/ARCHITECTURE.md @@ -0,0 +1,150 @@ +# Architecture + +Apache Pulsar is a distributed pub-sub messaging and streaming platform. The codebase is +performance-critical, heavily asynchronous, and concurrency-sensitive (brokers, storage, +networking). + +The authoritative documentation lives at — see the +[Architecture Overview](https://pulsar.apache.org/docs/4.2.x/concepts-architecture-overview/) for the +conceptual model. For a deeper, generated architecture description see +[DeepWiki](https://deepwiki.com/apache/pulsar); coding agents can install the +[DeepWiki MCP](https://docs.devin.ai/work-with-devin/deepwiki-mcp) for richer coverage of Pulsar's +architecture. This file is a map of the repository for contributors (and AI coding agents) who need to +find their way around the modules quickly. + +## Big picture + +Pulsar separates a **stateless serving layer** (brokers) from **durable storage** (Apache +BookKeeper) and a **metadata store** (Oxia / ZooKeeper). The Gradle modules layer +accordingly: + +- **`pulsar-client-api`, `pulsar-client-admin-api`** — public, backward-compatible interfaces only. + `pulsar-client-api-v5` / `pulsar-client-v5` are the newer V5 client API (PIP-466/468). +- **`pulsar-client` (`:pulsar-client-original`)** — the Java client implementation + (producer/consumer/reader, connection pooling). `pulsar-client-admin` implements the admin REST + client. +- **`pulsar-common`** — wire protocol and shared types. Protobuf / lightproto messages are + **generated** into `generated-lightproto/` / `generated-sources/` (excluded from checkstyle and + spotless). +- **`pulsar-metadata`** — pluggable metadata store abstraction (Oxia / ZooKeeper, plus RocksDB and memory) + used by broker and bookkeeper. +- **`managed-ledger`** — the storage abstraction over **Apache BookKeeper**: append-only ledgers + + cursors that track consumer/subscription positions. This is the durability layer the broker reads + and writes through. +- **`pulsar-broker`** — the server. `PulsarService` is the composition root wiring everything + together; `BrokerService` manages topics, subscriptions, and client connections. Entry points: + `PulsarBrokerStarter` (broker), `PulsarStandalone` / `PulsarStandaloneStarter` (all-in-one), + `PulsarClusterMetadataSetup` (cluster init). +- **`pulsar-proxy`** — optional proxy/gateway in front of brokers. +- **`pulsar-functions/*`** — serverless compute (Functions): `proto`, `api-java`, `instance`, + `runtime`, `worker`, `localrun`. +- **`pulsar-io/*`** — connector framework core only; most built-in connectors were moved to the + separate `pulsar-connectors` repo (PIP-465). +- **`pulsar-transaction/*`** — transaction coordinator and common types. +- **`tiered-storage/*`, `offloaders/`** — offload ledger data to cloud/filesystem storage. +- **`pulsar-websocket`** — WebSocket-to-Pulsar bridge. **`pulsar-client-tools`** — the + `pulsar-admin` / `pulsar-client` CLIs. +- **Shaded / distribution** — `pulsar-client-shaded`, `pulsar-client-all`, + `pulsar-client-admin-shaded` produce relocated fat jars; `distribution/*` assembles + server/shell/offloader tarballs. + +## Pulsar Improvement Proposals (`pip/`) + +The **`pip/`** directory holds **Pulsar Improvement Proposals** (`pip-.md`) — the design +documents for significant changes, referenced as `PIP-` throughout commit messages and code (e.g. +PIP-463 = Maven→Gradle migration, PIP-465 = IO connectors moved out, PIP-466/468 = V5 client). +`pip/README.md` describes the process and `pip/TEMPLATE.md` is the proposal template. Consult the +relevant PIP for the rationale behind a non-trivial feature or architectural decision. A PIP **number +is reserved by the first `dev@pulsar.apache.org` thread that uses it** — start the discussion to claim +the next free number. + +## Concurrency model (a known gap) + +Pulsar does **not** have a clearly established, documented concurrency model, which makes it hard to +evaluate whether a given piece of code is correct by construction. (Contrast Netty, which has a clear +rule: all handling on the IO thread is non-blocking, which by extension means avoiding synchronization +and locks on that path.) Pulsar does not strictly follow such a rule; modern JVMs and hardware +optimize `synchronized` code well enough that this has not blocked high performance, but it does make +reasoning about correctness harder than it needs to be. + +Conventions that **should** be documented (and largely are not yet): + +- which work belongs on the network-connection **event loop** vs. other threads; +- how the various **thread pools** are intended to be used, and what kind of work belongs on each; +- how threads are expected to **hand off state** to each other; +- when a `CompletableFuture`'s **completion thread should be switched** to another thread, and which one; +- **concurrency limits** for asynchronous tasks; +- preferring the **single-writer principle** to avoid concurrent state mutation. + +Until such a model is written down, follow the surrounding code's conventions and the Java-Memory-Model +rules in [`CODING.md`](CODING.md#concurrency). Once a model is defined, it becomes far more tractable to +"lift and shift" existing code toward it and enforce the rules consistently rather than having each +contributor rediscover the conventions case by case. + +## Backpressure + +Closely tied to the concurrency model is **backpressure** — how the system avoids accepting more work +than it can handle, particularly with respect to memory. The memory side is described in +[PIP-442 "Existing Broker Memory Management"](pip/pip-442.md#existing-broker-memory-management). Broader +backpressure (beyond memory) is not yet documented and would benefit from being defined alongside the +concurrency model. + +## Build infrastructure + +Apache Pulsar uses a **Gradle** build (migrated from Maven via PIP-463; some older tooling and docs +elsewhere still reference Maven). The wrapper `./gradlew` requires **JDK 21 or 25** (bytecode targets +Java 17). See [`CONTRIBUTING.md` → Building](CONTRIBUTING.md#building) for the build and lint commands. + +- `settings.gradle.kts` — all modules, organized in dependency tiers (Tier 0 has no internal deps, + higher tiers build on lower ones). +- `build-logic/conventions/` — convention plugins (`pulsar.java-conventions`, + `pulsar.code-quality-conventions`, `pulsar.shadow-conventions`, etc.) applied by modules. Shared + compile/test/dependency config lives here — edit it here rather than duplicating across modules. +- `gradle/libs.versions.toml` — version catalog (single source of truth for dependency versions; + referenced as `libs.*` in build scripts). +- `pulsar-dependencies` — enforced platform (BOM) pinning all dependency versions; applied to every + module. + +The build enables both the **configuration cache** (`org.gradle.configuration-cache=true`) and +**configure-on-demand** (`org.gradle.configureondemand=true`). + +### Module name vs. directory name gotcha + +Several Gradle project paths do **not** match their directory because the Maven artifactId is +preserved. Most importantly: + +- Directory `pulsar-client/` → project **`:pulsar-client-original`** +- Directory `pulsar-client-admin/` → project **`:pulsar-client-admin-original`** +- Directory `pulsar-functions/localrun/` → project `:pulsar-functions:pulsar-functions-local-runner-original` + +Always use the Gradle project path (left of any `--tests`), e.g. `./gradlew :pulsar-client-original:test`. +Check `settings.gradle.kts` when a path is ambiguous. + +### Changing the build + +When editing `build-logic/`, `settings.gradle.kts`, a module `build.gradle.kts`, `gradle.properties`, +`gradle/libs.versions.toml`, or the `pulsar-dependencies` platform: + +- **Edit shared config in `build-logic/conventions/`**, not per-module. +- **Versions come from `gradle/libs.versions.toml`** (`libs.*` / `pulsar-dependencies`) — never + hardcode a version in a build script. +- **Keep tasks configuration-cache and configure-on-demand compatible** (both are enabled): no reading + of mutable state at execution time and no `Project` access in task actions — use `Provider` / value + sources, and verify with `--configuration-cache`. Tasks reached by the common flows (`assemble`, + `test`, `integrationTest`, `rat` / `spotlessCheck` / `checkstyle*`, `checkBinaryLicense`, `docker*`) + must be compatible; one-off tooling tasks not part of those flows (e.g. `verifyTestGroups`, ad-hoc + report tasks) may be exempt. +- **Published modules must not depend on internal modules** at compile/runtime scope — the artifact + would be unresolvable from Maven Central. A module is published only when it applies + `pulsar.public-java-library-conventions`. +- **After a dependency change**, run `./gradlew checkBinaryLicense` and update the distribution + `LICENSE`/`NOTICE`; justify any genuinely new dependency (see + [`CODING.md` → Dependencies](CODING.md#dependencies)). +- **Follow the [Gradle best practices](https://docs.gradle.org/current/userguide/best_practices_index.html)** + — AI agents should read the + [AsciiDoc source](https://github.com/gradle/gradle/blob/master/platforms/documentation/docs/src/docs/userguide/best-practices/best_practices_index.adoc), + which is plain text and cheaper to parse than the rendered HTML. + +Before finishing a build change, confirm the affected task and `./gradlew help` run clean with +`--configuration-cache`, and that `assemble` and `rat spotlessCheck checkstyleMain checkstyleTest` pass +(plus `checkBinaryLicense` if a dependency changed). diff --git a/CLAUDE.md b/CLAUDE.md new file mode 120000 index 0000000000000..47dc3e3d863cf --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1 @@ +AGENTS.md \ No newline at end of file diff --git a/CODING.md b/CODING.md new file mode 100644 index 0000000000000..6623e342e75b8 --- /dev/null +++ b/CODING.md @@ -0,0 +1,319 @@ +# Coding guidelines + +Apache Pulsar follows the Sun Java Coding Conventions with additional project-specific rules. The +codebase is performance-critical, asynchronous, and concurrency-sensitive, so code review prioritizes +**correctness, thread safety, performance, maintainability, and backward compatibility**. This file is +the canonical coding reference for human contributors and AI coding agents; see [`AGENTS.md`](AGENTS.md) +for the agent-specific guardrails on top of it. + +## Style + +- **4 spaces** for indentation; **tabs must never be used**. +- Always use **curly braces**, even for single-line `if` statements. +- No `@author` tags in Javadoc. +- Every `TODO` must reference a GitHub issue, e.g. `// TODO: https://github.com/apache/pulsar/issues/XXXX`. +- Checkstyle config: `buildtools/src/main/resources/pulsar/checkstyle.xml`. Lombok is enabled. + +## Logging + +- Prefer **[slog](https://github.com/merlimat/slog)** (`io.github.merlimat.slog`) via Lombok's + **`@CustomLog`** (wired in `lombok.config` to `Logger.get(TYPE)`). **SLF4J is deprecated** for new + code; never use `System.out` / `System.err`. +- **Default new logs to `TRACE`/`DEBUG`, not `INFO`** — Pulsar overuses `INFO` and floods production + logs. Reserve `INFO` for low-frequency lifecycle/state-change events. +- Attach data as **structured attributes** — `log.info().attr("topic", topic).log("Published")` — not + interpolated into the message string. +- For expensive `DEBUG`/`TRACE` values, don't guard with `isDebugEnabled()`/`isTraceEnabled()`; use + slog's lazy form — `log.debug().attr("dump", () -> expensiveDump()).log("...")` or + `log.debug(e -> e.attr("dump", expensiveDump()).log("..."))`. +- Avoid logging on hot paths, and stack traces at `INFO` or lower. +- Use `DEBUG` in a way where it could be enabled in production without causing too many log entries. Use `TRACE` for more detailed information. + +## Asynchronous programming + +Pulsar relies heavily on `CompletableFuture`; prefer it over `ListenableFuture` for new code. + +- **A method returning `CompletableFuture` must not throw synchronously.** Propagate failures through + the returned future — `return CompletableFuture.failedFuture(e);` — including for argument validation + (`if (arg == null) return CompletableFuture.failedFuture(new IllegalArgumentException("arg"));`). + Throwing *inside* a stage (`thenApply`, `thenCompose`, `handle`, `whenComplete`, …) is fine. + + Avoid (escapes synchronously; a caller chaining `.exceptionally(...)` never sees it): + + ```java + CompletableFuture process(String arg) { + if (arg == null) { + throw new IllegalArgumentException("arg"); + } + return doProcessAsync(arg); + } + ``` + + Prefer (report the validation failure through the returned future): + + ```java + CompletableFuture process(String arg) { + if (arg == null) { + return CompletableFuture.failedFuture(new IllegalArgumentException("arg")); + } + return doProcessAsync(arg); + } + ``` +- **Never block on event-loop / async-execution threads** — no `Thread.sleep`, `Future.get()`, + `CompletableFuture.join()`, or blocking IO. An operation that performs IO should return a future. +- **Avoid nested futures** (`CompletableFuture>`); flatten with `thenCompose`. + Prefer **`OrderedExecutor`** for ordered asynchronous work. + + Avoid (`thenApply` on a future-returning function yields `CompletableFuture>`): + + ```java + return firstAsync(arg).thenApply(v -> secondAsync(v)); + ``` + + Prefer (`thenCompose` flattens it to `CompletableFuture`): + + ```java + return firstAsync(arg).thenCompose(v -> secondAsync(v)); + ``` +- **Converting a synchronous-throwing method to a failed future is not mechanical** — some callers rely + on the throw happening *before* the async work starts, so evaluate each call site. Use a shared + `checkArgumentAsync` helper (in `FutureUtil`) to validate without duplicating try/catch. + +- **Limit concurrency and handle backpressure.** Firing many async operations at once can overwhelm the + system. Options: + - **`com.spotify.futures.ConcurrencyReducer`** — caps in-flight futures at a configurable limit (used + in the Admin client to bound concurrent requests per broker). + - **`org.apache.pulsar.common.util.FutureUtil.Sequencer`** — runs async operations sequentially. + - **`org.apache.pulsar.common.semaphore.AsyncSemaphoreImpl`** — a non-blocking semaphore with a + per-operation cost that queues callers instead of failing when the limit is reached. Preferred over + `ConcurrencyReducer` for request-driven cases that need a timeout on permit acquisition. + +## Testing conventions + +Most Pulsar **"unit tests"** (`src/test`, run with `./gradlew ::test`) are actually +**integration-style** — they start a real in-JVM broker (`MockedPulsarServiceBaseTest` / +`pulsarTestContext`) rather than testing a class in isolation. The **container integration tests** +under `tests/` run against a Pulsar Docker image (see +[`CONTRIBUTING.md`](CONTRIBUTING.md#integration-tests)). Ideally code is factored so genuine units *can* +be unit-tested in isolation with light mocking — excessive mocking is a design smell, not the goal — +but much existing code isn't, so integration-style is the pragmatic default. See +[`CONTRIBUTING.md`](CONTRIBUTING.md) for how to *run* tests (groups, `--tests` scoping, retry count). + +- **TestNG + Mockito.** Prefer **AssertJ** assertions (with descriptions) over TestNG asserts; use + **Awaitility** for async conditions instead of `sleep` timing, with timeouts to prevent hangs. + `untilAsserted(...)` retries assertions, `until(...)` waits for a boolean — don't swap them. Verify + async interactions with Mockito `timeout(...)`, not fixed sleeps. +- Every feature or bug fix needs **deterministic** tests for edge and failure cases. A bug-fix test + must **fail on the unpatched code for the real reason** — not because it forces internal state. +- For code not factored for isolation, prefer an integration-style test over mocking a web of + collaborators: inject faults via the test infrastructure (e.g. + `pulsarTestContext.getMockBookKeeper().setReadHandleInterceptor(...)`) and assert on logs with + `TestLogAppender`. It's fine to add a **clean new test class** rather than extend an awkward one. +- **No reflection into private state** (`WhiteboxImpl.getInternalState`/`setInternalState`, + `setAccessible(true)`). Expose a **package-private `@VisibleForTesting`** accessor and put the test in + the same package; flag new reflection in review ([dev@ rationale](https://lists.apache.org/thread/7gr04sqmzyttx4ln6ydtp3qv0xgo1o6m)). +- **New integration-style tests: extend `SharedPulsarBaseTest`.** It shares one `SharedPulsarCluster` + for the test-JVM lifecycle (`admin` / `pulsarClient` are per test class); each method gets its own + namespace. Use `getNamespace()` and `newTopicName()` — never hardcode namespace/topic names, since + the runtime is shared. +- **Close/release what the test allocates.** A **`ByteBuf`/buffer leak** (pooled-allocator detection, + `-Dpulsar.allocator.pooled=true`) is a **real bug** — fix the missing `release()`. A **thread leak + from `ThreadLeakDetectorListener` is unreliable** (high false-positive rate, notably with + `SharedPulsarBaseTest` and when `THREAD_LEAK_DETECTOR_WAIT_MILLIS` is too low — ≈`10000` recommended, + only effective with the Gradle daemon disabled, `--no-daemon`); corroborate before treating it as + real. +- **Validate performance optimizations with a JMH benchmark** under `microbench/`, simulating a + realistic production usage pattern (see `microbench/README.md`). + +## General recommendations + +- **Use the narrowest interface type** for fields, parameters, variables, and returns (`Map`, + `SequencedMap`, `SortedMap`, `Collection`, `List`) rather than a concrete type like `TreeMap`. Keep + the concrete type only where its behaviour is required (e.g. a `TreeMap` for key-ordered iteration), + still exposed through the interface. +- **Minimize method and constructor parameters.** For a constructor with many parameters, + use a **builder** — the project uses Lombok `@Builder` for most internal classes, and it works on a + `record` too. Consider refactoring by moving related methods to a separate class when it's a better fit. +- **Don't return generic tuples.** Instead of `org.apache.commons.lang3.tuple.Pair` (or a similar + tuple type), define a small, purpose-named **Java `record`** inline in the class that declares the + method, with the **same visibility as the method** (`public`, package-private, or `private`). + + Avoid (positional and untyped; call sites read `getLeft()` / `getRight()`): + + ```java + private Pair minMax(Collection values) { ... } + ``` + + Prefer (a purpose-named record with the same visibility as the method): + + ```java + private record MinMax(int min, int max) {} + private MinMax minMax(Collection values) { ... } + ``` +- **Prefer record keys over concatenated strings.** For a composite `Map` key, use a small `record` + instead of concatenating a `String` (e.g. `a + ":" + b`) — correct `equals`/`hashCode`, type-safe, + no delimiter/escaping bugs. + + Avoid (delimiter collisions when a value contains `:`; no type safety): + + ```java + Map map = new HashMap<>(); + map.get(a + ":" + b); + ``` + + Prefer (a small record key with correct `equals`/`hashCode`): + + ```java + record Key(String a, String b) {} + Map map = new HashMap<>(); + map.get(new Key(a, b)); + ``` +- **Don't use `@Builder` on public client-API classes** (harder to maintain backwards compatibility) — hand-write the builder. +- **Name methods for intent.** A method's name should reveal what it does. Query methods read like + queries (`shouldSkipChunk`, not `skipChunk`); methods that mutate state or perform an action are + named for that action. **Reserve the `get` prefix for pure queries** — using it for a method that + mutates state, or otherwise does more than return a value is strongly discouraged. + +## Dependencies + +Prefer existing dependencies over new libraries. Pulsar commonly uses Apache Commons / Guava +(utilities), **FastUtil** (type-specific collections), **JCTools** (concurrent structures), +**RoaringBitmap** (compressed bitsets), **Caffeine** (caching), **Jackson** (JSON), Prometheus / +**OpenTelemetry** (metrics), and **Netty** (networking and buffers). + +A new dependency must be justified (why existing ones are insufficient) and must update the +bundled-dependency `LICENSE`/`NOTICE` — verify with `./gradlew checkBinaryLicense`. + +## Backward compatibility + +Pulsar maintains strong compatibility guarantees. Changes must not break public APIs, client +compatibility, wire-protocol compatibility, or serialized/metadata formats — servers must work with +both older and newer clients. Flag any change that may break compatibility. + +**Plugin / SPI extension points are public API.** Many interfaces are selected by a `*ClassName` +configuration setting — e.g. `LoadManager`, `LedgerOffloaderFactory`, `AuthorizationProvider` / +`AuthenticationProvider`, `EntryFilter`, `TopicFactory`, `BrokerInterceptor`, dispatcher / +delayed-delivery-tracker factories, `CustomCommand` — and third parties ship implementations. Changing +such an interface, or a `protected` member of an extensible class (`PulsarWebResource`, +`PersistentTopic`, `Producer`), breaks them: it generally needs a PIP and must not land in +maintenance-branch backports. + +**Design interface changes for backward compatibility.** When you add a method to such an interface, +prefer a `default` implementation that delegates to an existing method, so older third-party +implementations keep working unchanged. If no sensible delegation exists, add a separate +capability-query method (e.g. `boolean supportsX()`) the broker checks at runtime, so it can support +older implementations gracefully instead of depending on the new method. + +**Don't leak third-party types through public/plugin interfaces.** Exposing Netty or AsyncHttpClient +classes breaks consumers of the **shaded** client (shaded vs. unshaded classes differ) and couples +callers to the implementation — provide a Pulsar-owned abstraction. Changing a documented behaviour or +guarantee (e.g. PIP-68 exclusive-producer guarantees, default rate-limiter behaviour) needs a PIP and a +dev@ discussion, not just a code change. + +**Introduce changes behind a backward-compatible default.** Make new/changed behaviour opt-in via +configuration rather than silently changing existing deployments. Behaviour that risks data loss (e.g. +skipping unrecoverable data) must be gated behind an explicit flag (such as `autoSkipNonRecoverableData`), +defaulting to the safe/old behaviour. + +## Resource and memory management + +- Always close resources (streams, connections, executors, buffers) — prefer try-with-resources. +- On internal networking/messaging paths, prefer **Netty `ByteBuf`** over `ByteBuffer` unless an + external API requires it; release ref-counted buffers you allocate. +- **Don't hand-optimize allocation away.** Pulsar runs on **ZGC** (very low collection overhead), so + the extra short-lived allocations from favouring immutable objects (see *Concurrency* below) are + cheap. Older code pools objects with Netty's `Recycler`; this is **no longer recommended for new + code** — under ZGC the `Recycler` often *costs* more CPU than it saves. Don't add new `Recycler` + usage. See [PIP-443](pip/pip-443.md). + +## Performance + +- **Back optimizations with evidence** — a JMH benchmark (see *Testing conventions*) or a profile, not + intuition — measured on JIT-warmed code (see *Reproducing concurrency / memory-visibility bugs*). +- **On hot paths** (dispatch, IO, per-message): avoid `String.format` (build strings directly), + `Enum.values()` (match explicitly), and unnecessary allocation/locking; prefer lock-free or + single-writer designs. +- **Don't add overhead to an already-overloaded system.** Avoid doing work then discarding it (e.g. + reading entries only to drop them before dispatch) — extra work under load causes cascading failures; + acquire/estimate up front and reconcile afterwards. +- **Bound in-memory caches** (size or byte limit + eviction) and de-duplicate repeated `String`s + (cluster/tenant/namespace ids) with `org.apache.pulsar.common.util.StringInterner`. + +## Configuration + +When adding configuration options: use clear, descriptive names; provide sensible defaults; update the +default configuration files; and document the option. + +## Code review checklist + +When reviewing a PR, verify: + +- Java coding conventions followed; logging follows the guidelines above (slog, levels, structured + attributes). +- Thread-safety risks; no blocking in async paths; correct `CompletableFuture` usage. +- No unnecessary dependencies; LICENSE/NOTICE updated when dependencies change. +- Backward compatibility preserved. +- Tests exist and are appropriate; reflection into private state is flagged with a `@VisibleForTesting` + accessor suggested instead. +- The **PR description explains the change** — at minimum **Motivation (why?)** and **Modifications + (what/how?)**, matching `.github/PULL_REQUEST_TEMPLATE.md`; a title alone isn't sufficient. + +Focus feedback on correctness, reliability, and maintainability. + +## Concurrency + +- Public classes should be **thread-safe**; annotate non-thread-safe ones with `@NotThreadSafe`. +- Protect shared mutable state; prefer fine-grained synchronization; mutate on the intended thread. + Prefer the **single-writer principle** (a given piece of state mutated by only one thread) to avoid + concurrent mutation entirely. +- **Minimize work while holding a lock.** Capture needed state into locals inside the synchronized + block, then run callbacks, listeners, and IO *outside* it — never call out to listener/callback code + while holding a lock (this has fixed real deadlocks and contention). +- Give threads **meaningful names**. When creating thread pools, prefer Netty's + **`io.netty.util.concurrent.DefaultThreadFactory`** — it produces **`FastThreadLocalThread`** + instances (lower overhead `FastThreadLocal` lookups, which matter on Netty paths like the pooled + `ByteBuf` allocator) and assigns prefixed thread names. + +Pulsar has no documented, project-wide concurrency model yet; see +[`ARCHITECTURE.md` → Concurrency model](ARCHITECTURE.md#concurrency-model-a-known-gap) for the +conventions that *should* govern threads, thread pools, and event loops. + +### The Java Memory Model is what makes concurrent code correct + +Several hard-to-investigate Pulsar bugs have come from misconceptions about Java synchronization: + +- **A `synchronized` method or block is not, on its own, thread-safe.** It provides its + visibility/ordering guarantees only when the **same monitor/lock guards both the reads and the + writes** of the shared state. +- On 64-bit JVMs a field's value is **never corrupted** — a read returns some value that was actually + written. What breaks is **visibility**: without a happens-before relationship, threads can observe + different values, or never see an update. Establish happens-before with `synchronized`, `volatile`, + `final`, or `java.util.concurrent` constructs. +- **A field accessed by more than one thread needs explicit visibility** — make it `volatile` (or + guard every read *and* write with the same lock). `volatile` gives single-field visibility but does + **not** make compound updates (read-modify-write, check-then-act) atomic — use `java.util.concurrent` + atomics/locks for those. +- Visibility is per-field, so a mutable object can be observed **partially updated**. +- The only way to be reliably correct is to **conform to the Java Memory Model**. **Benign data races** + are sometimes acceptable, and some Pulsar code relies on this by design — but only as a deliberate, + documented choice. +- **Prefer immutable objects.** An object is **immutable** when all fields are `final` *and* every + nested instance is itself immutable (a `record` is the common case; immutability must hold for the + whole reachable graph). It is **effectively immutable** when never modified after construction but + with non-`final` fields. Publication differs: an **immutable** object benefits from the JMM's + final-field **safe initialization** (visible even when published via a data race) and needs **no** + safe publication; an **effectively immutable** one must be shared via **safe publication** (a `final` + or `volatile` field, or a `java.util.concurrent` construct such as `ConcurrentHashMap`). See + [Safe initialization](https://shipilev.net/blog/2014/safe-public-construction/#_safe_initialization). + +### Reproducing concurrency / memory-visibility bugs + +These bugs are timing- and platform-dependent and easily masked, so a clean run is weak evidence a fix +is correct: + +- Interpreted and JIT-compiled code behave differently. Reproductions often need several **warm-up + rounds with a short pause** so the (tiered, asynchronous) JIT kicks in; a short test may never + trigger compilation. JVM flags can force earlier compilation, and the exercised paths affect what + gets compiled. +- Some races surface only on specific **hardware/OS** — classically **multi-socket / multi-NUMA** + machines, whose weaker cross-socket memory ordering exposes races a single socket never shows. diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index db83924c47c8a..db70206a1f71b 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -19,6 +19,216 @@ --> -## Contributing to Apache Pulsar +# Contributing to Apache Pulsar -We would love for you to contribute to Apache Pulsar and make it even better! Please check the [Apache Pulsar Contributing Guide](https://pulsar.apache.org/contribute/) before starting to work on the project. +We would love for you to contribute to Apache Pulsar and make it even better! The **authoritative** +contributor guide is the [Apache Pulsar Contributing Guide](https://pulsar.apache.org/contribute/) — +please read it before starting. This file is a quick, in-repo reference for the local development +workflow (build, test, PR, CI). For the big-picture module map and the Gradle build infrastructure see +[`ARCHITECTURE.md`](ARCHITECTURE.md); for code style see [`CODING.md`](CODING.md). + +## Building + +**JDK 21 or 25** is required to build `master` (bytecode targets Java 17; `-PskipJavaVersionCheck` +bypasses the check); `zip` is also needed. Use the bundled wrapper `./gradlew` (Linux/macOS) or +`gradlew.bat` (Windows) — no separate Gradle install. See the +[build-tooling setup guide](https://pulsar.apache.org/contribute/setup-buildtools/) and the +[IDE setup guide](https://pulsar.apache.org/contribute/setup-ide/). + +```bash +# Compile and assemble everything (or a single module) +./gradlew assemble +./gradlew :pulsar-broker:assemble + +# Lint / verify (license headers, formatting, checkstyle) — run before pushing +./gradlew rat spotlessCheck checkstyleMain checkstyleTest +./gradlew spotlessApply # auto-fix license headers/formatting + +# Verify bundled-dependency LICENSE/NOTICE coverage (after changing a runtime dependency) +./gradlew checkBinaryLicense + +# Start a standalone Pulsar service (broker + bookie + metadata in one JVM) +bin/pulsar standalone + +# Build docker images apachepulsar/pulsar(-all):latest +./gradlew docker # or docker-all +``` + +For the Gradle build infrastructure and how to change build files (convention plugins, version +catalog, configuration-cache rules), see +[`ARCHITECTURE.md` → Build infrastructure](ARCHITECTURE.md#build-infrastructure). + +## Running tests + +Most of these per-module "unit tests" are actually **integration-style** — they start a real in-JVM +broker (`MockedPulsarServiceBaseTest` / `pulsarTestContext`) rather than testing a class in isolation. +The **container-based integration tests** that run against a Pulsar Docker image are separate; see +[Integration tests](#integration-tests) below. + +```bash +# Always scope test runs with --tests — running a whole module's test task is slow. +# Run a single test class +./gradlew :pulsar-client-original:test --tests "ConsumerBuilderImplTest" +# Run a single test method +./gradlew :pulsar-client-original:test --tests "ConsumerBuilderImplTest." +# Run all tests in a specific package +./gradlew :pulsar-broker:test --tests "org.apache.pulsar.broker.admin.*" +``` + +> Note the [module-name-vs-directory gotcha](ARCHITECTURE.md#module-name-vs-directory-name-gotcha): +> directory `pulsar-client/` is the Gradle project `:pulsar-client-original`. + +### Test groups (TestNG) + +Tests use **TestNG** and are tagged with `@Test(groups = "...")`. By default the build **excludes the +`quarantine` and `flaky` groups** (`excludedTestGroups` default = `quarantine,flaky`), so to run a +single test that lives in one of those groups you must clear the exclusion: + +```bash +# Run a specific test that is in the flaky/quarantine group (otherwise excluded by default) +./gradlew :pulsar-broker:test -PexcludedTestGroups='' --tests "" +``` + +CI selects whole groups with `-PtestGroups=` and `-PexcludedTestGroups=` (e.g. +`broker,broker-admin`); locally prefer `--tests` to scope to specific classes instead of running an +entire group. CI splits `pulsar-broker` tests into groups (see +`pulsar-build/run_unit_group_gradle.sh` and `gradle/verify-test-groups.gradle.kts`). Tests with no +group are treated as `other` at runtime. `./gradlew verifyTestGroups` reports group assignments and +flags tests not covered by any CI group. + +Other test-related properties: `-PtestJavaVersion=17` (run tests on a different JDK toolchain), +`-PtestRetryCount=N`, `-PtestFailFast=true|false`, `-PprotobufVersion=4.31.1` (protobuf v4 +compatibility tests). + +Failed tests are retried once by default (`testRetryCount=1`; `0` when running inside the IDE). When +running tests locally, prefer **`-PtestRetryCount=0`** to catch failures (including flakiness) early +instead of having retries mask them. + +### Integration tests + +Integration tests live in `tests/` (see `tests/README.md`). They use +[Testcontainers](https://www.testcontainers.org/) to bring up Pulsar services in Docker, so **Docker +must be installed and running**. Build the test image first, then run the tests. + +The full integration suite is heavy and slow. **In local development, always run individual +integration tests** rather than the whole suite — pass `--tests` to select a class (TestNG then +discovers it directly from the classpath): + +```bash +./gradlew :tests:latest-version-image:dockerBuild # build the docker test image +./gradlew :tests:integration:integrationTest --tests "org.apache.pulsar.tests.integration." +``` + +To run the **entire** integration test set, use **Personal CI** (below) rather than running it +locally. (`integrationTest` also accepts `-PtestGroups` / `-PexcludedTestGroups` and +`-PintegrationTestSuiteFile=.xml` to pick a specific TestNG suite.) + +### Running the full CI pipeline (Personal CI) + +The full test suite is large and slow to run locally. While iterating on a change, run only the +narrowly-scoped tests relevant to the change (a single test class or package, see above) rather than +a module's entire test task. To validate a larger change against the **full** CI pipeline, do not run +everything locally — use **Personal CI**, which runs Pulsar's CI workflows in the contributor's own +GitHub fork. + +If Personal CI is not yet set up, follow the +[Personal CI documentation](https://pulsar.apache.org/contribute/personal-ci/) to enable it on your +fork. Once it is set up, the loop is: + +1. Keep the local `master` up-to-date with `apache/pulsar` and rebase the feature branch on it. +2. Push the feature branch to the **fork** to trigger CI runs there. CI runs against the PR opened in + your own fork (it is normal to have a PR open in the fork *and* a PR for the same branch open in + `apache/pulsar` at the same time). +3. Monitor CI status on the fork and fix failures. +4. Open the PR to `apache/pulsar` only after the fork's CI is green. + +Once the PR to `apache/pulsar` has been opened, stop rebasing as part of this loop: step 1's rebase +no longer applies — bring in upstream changes by merging instead (see [Pull requests](#pull-requests)). + +### Retrying CI after flaky-test failures + +Pulsar has a large number of flaky tests, so GitHub Actions jobs on a PR sometimes fail for reasons +unrelated to the change. When a failure appears to be flakiness rather than a regression caused by the +PR, comment **`/pulsarbot rerun`** on the PR to re-run the failed jobs of the workflow run. It only +takes effect once the workflow run has **completed** (all jobs finished), so wait for the run to +finish before commenting. + +For a PR from a fork, a project maintainer must **approve** the workflow runs before they execute, and +approval is required again whenever new changes are pushed. This adds latency to each run and makes it +slow to tell flaky failures apart from genuine ones. Setting up **Personal CI** (above) sidesteps this +— the full pipeline runs in your own fork without maintainer approval — so it is especially useful when +a PR has legitimate test failures that you need to iterate on. + +If a flaky test that is unrelated to your change blocks your PR, you can move it to the `flaky` group +or disable it within your PR to unblock merging — and report it (search the +[existing flaky-test issues](https://github.com/apache/pulsar/issues?q=is%3Aissue+state%3Aopen+flaky) +first). Don't push empty "trigger CI" commits to force a rerun; use `/pulsarbot rerun` instead. + +## Pull requests + +PRs must follow `.github/PULL_REQUEST_TEMPLATE.md`. PR titles follow the +`[][] ` convention (e.g. `[fix][broker] ...`, +`[improve][build] ...`) — refer to `.github/workflows/ci-semantic-pull-request.yml` for the valid +`[type]` and `[scope]` prefixes, which are enforced by CI. The `` should be in +imperative form, like a good commit message's subject line, and **should not reference issue numbers** +(put `Fixes #N` / `Closes #N` in the description instead). + +**Keep each PR focused on one change.** Don't bundle unrelated drive-by refactoring (for example, +de-duplicating code you happened to touch) into a feature or bug-fix PR — it widens the review surface +and the risk; note such improvements as follow-ups instead. Likewise, **don't reformat unrelated files +or lines that aren't part of your change** (whitespace, import reordering, re-wrapping) — drive-by +formatting hides the real change in review and pollutes `git blame`. **Discuss large refactorings on +dev@pulsar.apache.org before investing effort**: there are many similar code duplications across +Pulsar, and a PR for each creates more maintainer burden than value. AI agents make it easy to generate +large volumes of refactoring PRs, and the project pushes back on these. Every change needs a real, +identifiable contributor who takes responsibility for it; unattributed AI-agent-style contributions — +especially larger ones from anonymous profiles or from people not actually using Pulsar — are typically +rejected. + +**Describe the change.** The PR description must cover, at minimum, the **Motivation (why?)** and the +**Modifications (what / how?)** — these map to the corresponding sections of the PR template. A title +alone, or a description that only restates the title, is not sufficient. Link the related issue with +`Fixes #N` (or equivalently `Closes #N`) for an issue the PR resolves, `Main Issue: #N` for one task +of a larger issue, or `PIP: #N` for a proposal. + +**Never rebase a PR branch once the PR is opened in `apache/pulsar`.** Rebasing rewrites history and +disrupts reviewers (it invalidates review comments and makes incremental diffs unreadable). To bring +in upstream changes, instead fetch from the `apache/pulsar` remote and **merge** its `master` into the +PR branch: + +```bash +git fetch # e.g. `upstream` or `apache` +git merge /master +``` + +(Rebasing onto an updated `master` is fine *before* the PR is opened — see the Personal CI loop above +— but not after.) + +### Branches and backports + +Target `master` first. Once the change is merged, **project maintainers handle backporting** a bug or +security fix to the supported maintenance branches (`branch-4.2`, `branch-4.0`, …) when +the bug is also present there, per the +[release/support policy](https://pulsar.apache.org/contribute/release-policy/). New **features** are +**not** added to LTS / maintenance branches without a dev@pulsar.apache.org discussion (and usually a +PIP), to avoid regressions in stable releases. + +Backporting is done by **cherry-picking commits in their original merge order**, which avoids +unnecessary merge conflicts; sometimes a dependent change must be cherry-picked before the fix itself. +AI tools are effective at resolving the merge conflicts that arise during a backport. + +## Reporting security vulnerabilities + +See [`SECURITY.md`](SECURITY.md) (latest at ) and +. In short: report a +suspected vulnerability **privately** (never in a public issue, PR, or commit), and never reveal the +security nature of a change in public until it is announced. An **already-public** CVE that you only +want to check Pulsar's exposure to is *not* a private disclosure — search the CVE id in apache/pulsar +PRs/issues first, then ask via a GitHub issue or dev@pulsar.apache.org. + +## AI coding agents + +If you use an AI coding assistant (Claude Code, Copilot, Cursor, Gemini, Codex, Aider, …), see +[`AGENTS.md`](AGENTS.md) for the agent-facing guidance — a routing index into this guide, +[`ARCHITECTURE.md`](ARCHITECTURE.md), [`CODING.md`](CODING.md), and [`SECURITY.md`](SECURITY.md), plus +the guardrails that apply specifically to AI-made changes. diff --git a/README.md b/README.md index b729bf8135a46..dae2d2fe26327 100644 --- a/README.md +++ b/README.md @@ -95,7 +95,6 @@ components in the Pulsar ecosystem, including connectors, adapters, and other la - [Pulsar Connectors](https://github.com/apache/pulsar-connectors) (bundled as [pulsar-io](pulsar-io)) - [Pulsar Translation](https://github.com/apache/pulsar-translation) -- [Pulsar SQL (Pulsar Presto Connector)](https://github.com/apache/pulsar-presto) (bundled as [pulsar-sql](pulsar-sql)) - [Ruby Client](https://github.com/apache/pulsar-client-ruby) ## Pulsar Runtime Java Version Recommendation @@ -163,6 +162,10 @@ Docker image Java runtime: 17 ## Build Pulsar +The quick start below covers the essentials; see [`CONTRIBUTING.md` → Building](CONTRIBUTING.md#building) +for the full build and lint commands, and [`ARCHITECTURE.md` → Build infrastructure](ARCHITECTURE.md#build-infrastructure) +for the Gradle build infrastructure and how to change build files. + ### Requirements - JDK @@ -188,53 +191,16 @@ There is also a guide for [setting up the tooling for building Pulsar](https://p ### Build -Compile and assemble: - -```bash -./gradlew assemble -``` - -Check source code license headers and formatting: - -```bash -./gradlew rat spotlessCheck checkstyleMain checkstyleTest -``` - -Check that bundled dependencies are properly recorded in the binary distribution `LICENSE` and `NOTICE` files. Run this after adding, removing, or upgrading a runtime dependency to confirm the corresponding entry has been added to (or removed from) the LICENSE file. The task builds the binary distribution tarballs as needed: - -```bash -./gradlew checkBinaryLicense -``` - -Compile and assemble individual module: - -```bash -./gradlew :pulsar-broker:assemble -``` - -Run Unit Tests: - -```bash -./gradlew test -``` - -Run Individual Unit Test: - ```bash -./gradlew :pulsar-client-original:test --tests "ConsumerBuilderImplTest" +./gradlew assemble # compile and assemble +./gradlew :pulsar-client-original:test --tests "ConsumerBuilderImplTest" # run a single test +bin/pulsar standalone # run a standalone service ``` -Run Selected Test packages: - -```bash -./gradlew :pulsar-broker:test --tests "org.apache.pulsar.*" -``` - -Start standalone Pulsar service: - -```bash -$ bin/pulsar standalone -``` +For the full build, lint, test, and PR workflow — test groups, integration tests, Personal CI, and PR +conventions — see [`CONTRIBUTING.md`](CONTRIBUTING.md). For the module map and the Gradle build +infrastructure see [`ARCHITECTURE.md`](ARCHITECTURE.md), and for coding conventions see +[`CODING.md`](CODING.md). AI coding agents should start from [`AGENTS.md`](AGENTS.md). Check https://pulsar.apache.org for documentation and examples. @@ -301,13 +267,15 @@ You can self-register at https://communityinviter.com/apps/apache-pulsar/apache- ## Security Policy -If you find a security issue with Pulsar then please [read the security policy](https://pulsar.apache.org/security/#security-policy). It is critical to avoid public disclosure. +See the [Pulsar security policy](https://github.com/apache/pulsar/security/policy) for the full policy. In short: **never disclose a suspected vulnerability publicly** (issue, PR, or discussion) before a fix is released, and a human must verify and take responsibility for a report — autonomous agents must not file security reports on their own. ### Reporting a security vulnerability -To report a vulnerability for Pulsar, contact the [Apache Security Team](https://www.apache.org/security/). When reporting a vulnerability to [security@apache.org](mailto:security@apache.org), you can copy your email to [private@pulsar.apache.org](mailto:private@pulsar.apache.org) to send your report to the Apache Pulsar Project Management Committee. This is a private mailing list. +Report privately by email to the [Apache Security Team](https://www.apache.org/security/) at [security@apache.org](mailto:security@apache.org); you may also copy the Apache Pulsar PMC's private list, [private@pulsar.apache.org](mailto:private@pulsar.apache.org). See the [Pulsar security policy](https://github.com/apache/pulsar/security/policy) for more detail. + +### Checking exposure to an already-public CVE -https://github.com/apache/pulsar/security/policy contains more details. +An already-public CVE is **not** a private disclosure. Search PRs and issues by the CVE id at (including closed ones), then ask via a GitHub issue or the dev@pulsar.apache.org mailing list if needed. Pulsar's supported versions are listed at ; upgrade to a supported version to receive security updates. See the [Pulsar security policy](https://github.com/apache/pulsar/security/policy). ## License diff --git a/SECURITY.md b/SECURITY.md index acd8c0140b014..946f9dcaa3991 100644 --- a/SECURITY.md +++ b/SECURITY.md @@ -1,3 +1,102 @@ # Security Policy -See also https://pulsar.apache.org/security/. +The authoritative policy is at ; the Apache Software Foundation's +general process is at . The latest version of this file is maintained +at — in a fork, check there rather than a +possibly-stale local copy. The summary below is what contributors (and any AI tooling acting on their +behalf) must follow. + +## Reporting a vulnerability + +Do **not** open a public GitHub issue, pull request, or discussion for a suspected vulnerability — +that defeats coordinated disclosure. + +Report it privately by email to the Apache Security Team at **security@apache.org** (the ASF's central +security address). When reporting to **security@apache.org**, you can copy your email to the Apache +Pulsar PMC's private list, **private@pulsar.apache.org**, to send your report to the Project Management +Committee as well. + +**A human must verify and take responsibility for every security report.** Deciding that some +behaviour is actually a vulnerability requires judgement against Pulsar's threat model (see +*[Security model and scope](#security-model-and-scope)* below), and the security team is staffed by +volunteers — a stream of unverified or AI-hallucinated "vulnerabilities" wastes their time and buries +real issues. AI tooling may help *analyse* a suspected problem, but a human contributor must +independently verify it and own the report. **Autonomous agents must not file security reports or open +security issues on their own**, and a tool's confident-sounding output is not, by itself, evidence of a +vulnerability. + +See for more details. + +## Disclosure hygiene for contributors + +The **project team commits the fix**, coordinated through the ASF security vulnerability handling +process +([apache.org/security/committers.html#possible](https://apache.org/security/committers.html#possible)). +The team may commit the fix to the public repository **before** the release, using a neutral commit +message that does not state its security nature; in severe cases the commit and release are made in a +private repository and the fix is made public only at release time. + +As a contributor, do **not** push, commit publicly, or open a PR for a fix to a non-public security +issue yourself — **including in a public fork**, since a public-fork commit or PR is itself a +disclosure. When reporting, you may include a suggested fix patch privately in your report to +`private@pulsar.apache.org` — never in a public commit or PR. + +The neutrality rules below are for **whoever commits the fix — i.e. the project team**. Until the +vulnerability has been publicly announced, the commit message and PR title/body must **not** reveal its +security nature, even when the fix touches security-adjacent code (authentication/authorization, +deserialization, TLS, networking). Describe the behaviour change neutrally; a commit or PR that +advertises "fixes the CVE", "security fix", or "patches the vulnerability" discloses the issue before +it is announced and defeats coordinated disclosure. + +The same discretion applies to **everyone** — and identically to any AI tooling acting on a +contributor's behalf — in public GitHub issues, discussions, and review comments until the +announcement. + +**Already-public CVEs in dependencies are an exception.** The rules above concern *non-public* Pulsar +vulnerabilities. A PR or commit that upgrades a dependency to address an **already publicly disclosed** +CVE in that dependency does **not** follow them — the CVE is already public. Name its id directly in +the PR title and/or description (use the description when there are several CVE ids). + +## Checking exposure to an already-public CVE + +For a CVE that is **already public** (for example, in a dependency) and you want to check Pulsar's +exposure or whether it is already fixed, this is **not** a private disclosure — the right venue is a +GitHub issue on apache/pulsar or a question on the **dev@pulsar.apache.org** mailing list. + +Before asking, search PRs and issues with the CVE id at +(also check issues and **closed** PRs/issues) — the fix may already be merged or the question already +answered. + +## Security model and scope + +Pulsar's security model is not formally/explicitly defined. Two long-standing design assumptions +matter when deciding whether something is actually a vulnerability: + +**Pulsar Functions and connectors execute fully trusted code.** The function instance runtime exists +to run user-provided code — *remote code execution is its core purpose, not a flaw.* (There have been +some reports claiming that "the function instance runtime allows running user-provided code and results +in an RCE"; this is expected, by-design behaviour, not a security issue.) The available execution models also let the code modify its environment: + +- The **thread** and **process** runtimes can read or modify any files and state accessible to the + process they run in. +- The **Kubernetes** runtime, on its own, does not restrict access to resources in the Kubernetes + cluster. The project provides hooks for custom Kubernetes-runtime *hardening*, but such hardening is + **not** part of the project. + +Therefore, Pulsar Functions and connectors must only run code that is **fully trusted**. + +**Clusters rely on network-perimeter security.** Pulsar is designed to be deployed behind a trusted +network perimeter where only trusted users can reach the cluster. The project does **not** implement +explicit controls against malicious **denial-of-service (DoS)** attacks. Rate limiting exists to mitigate +*unintentional* DoS — e.g. improper configuration or thundering-herd effects — but it is **not** a +defense against a deliberate DoS by an attacker. + +Reports that amount to "a trusted function can run code / modify its environment" or "a +perimeter-trusted client can cause a denial-of-service" generally fall **outside** Pulsar's threat +model. When in doubt, ask through the reporting channels above rather than assuming. + +## Supported versions + +Pulsar's supported versions are listed at . +Security fixes are made to supported versions; users should upgrade to a supported version to receive +security updates.