[improve][misc] Add AGENTS.md and split contributor/coding/architecture/security docs, with .agents skills#25871

Open
lhotari wants to merge 27 commits into apache:master from lhotari:lh-AGENTS.md

lhotari (Member) commented May 26, 2026

Motivation

The repository had GitHub Copilot review instructions (.github/copilot-instructions.md) but no
top-level guide for general AI coding agents (Claude Code, Cursor, Gemini, Codex, Aider, …), and no
single, concise contributor reference for the build / test / contribution workflow after the
Maven→Gradle migration (PIP-463).

This PR adds that guidance and — following review feedback on the first iteration — structures it the
way apache/groovy and apache/grails organize theirs:
human-facing split docs at the repository root, a concise AGENTS.md index, and per-task skills under
.agents/skills/ that are loaded on demand so agents don't pull every instruction into context (a
token-economy concern for contributors on metered plans). The coding/contribution conventions were
distilled from recurring guidance in past apache/pulsar PR reviews.

Important

New logging convention — please confirm the direction. The coding guidelines document a
preference for the slog structured-logging library via Lombok
@CustomLog, with SLF4J treated as deprecated for new code, structured attributes + lazy
evaluation instead of isDebugEnabled() guards, and defaulting new logs to TRACE/DEBUG. slog is
already wired into the build; this PR is the first time the preference is written down.

Modifications

Root docs (human-facing, the source of truth) + an AGENTS.md index + .agents/skills/:

  • AGENTS.md — concise router/index: a "Licensing and provenance (read first)" section (ASF
    Generative Tooling guidance: human-in-the-loop
    accountability, provenance/licensing, attribution), a canonical-docs table, a skills table, critical
    rules, and where to ask.
  • ARCHITECTURE.md (new) — module map, the Gradle build infrastructure, the
    module-name-vs-directory gotcha, the pip/ proposals, the (undocumented) concurrency model and
    backpressure.
  • CODING.md (new) — style, data types, async/CompletableFuture, concurrency + Java Memory
    Model, logging (slog), resource/memory management, performance, dependencies, backward compatibility
    (incl. plugin/SPI extension points), testing conventions, and the review checklist.
  • CONTRIBUTING.md — expanded from a website-pointer stub into the local dev workflow: build,
    lint, --tests-scoped runs, test groups, integration tests, Personal CI, PR conventions, scope &
    branches/backports, security reporting.
  • SECURITY.md — reporting, disclosure hygiene, the (informal) security model & threat scope, and
    checking exposure to an already-public CVE.
  • .agents/skills/ (new) — lean, on-demand guardrail skills (pulsar-build, pulsar-tests,
    pulsar-pr-workflow, pulsar-security) that cite the canonical docs rather than restating them,
    plus a README.md index.
  • CLAUDE.md and .github/copilot-instructions.md are now symlinks to AGENTS.md.
  • .github/PULL_REQUEST_TEMPLATE.md: Closes #xyz is accepted alongside Fixes #xyz, and the template notes the
    CI-enforced title prefixes and that the Motivation and Modifications sections are required.
  • README.md — a short note on checking exposure to an already-public CVE.

Note

.github/copilot-instructions.md becoming a symlink to AGENTS.md means Copilot's detailed review
guidance now lives in CODING.md (which AGENTS.md links to). If Copilot doesn't follow the symlink
or traverse to CODING.md, its in-review guidance is thinner than before — worth confirming. (Done
per the review suggestion to symlink the per-tool files to AGENTS.md.)

Verifying this change

  • Make sure that the change passes the CI checks.

This change is a trivial rework / code cleanup without any test coverage. It is documentation-only
(Markdown, plus two symlinks); all changed paths are excluded from RAT / Checkstyle / Spotless.

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

asafm (Contributor) commented May 26, 2026

@lhotari Couple of high level notes first:

  1. From what I know, it's best to have a dedicated folder for agent documentation (e.g. .ai) and place all LLM agent instructions there.
  2. Once you have (1), you can split the context by what is needed: everything related to contributing (setting up the local dev env, rules for PR creation, etc.) goes to CONTRIBUTING.md; architecture and how Pulsar works go to ARCHITECTURE.md; and coding guidelines and best practices go to CODING.md.
  3. Once you have (2), AGENTS.md is effectively an index describing the different LLM doc files you created in (2) and where they are located.
  4. Once you have (3), all that is left is to make the files for other LLMs symlinks to AGENTS.md, so .github/copilot-instructions.md and CLAUDE.md are just symlinks.

lhotari (Member, Author) commented May 26, 2026

@asafm thanks for the suggestions. I guess we can iterate on this in further PRs. I won't have time to make such improvements at this time. Would you like to take over the restructuring after this PR has been merged?

Btw, some repositories provide skills that AI agents use to perform specific tasks in the repository, for example: https://github.com/apache/grails-core/blob/7.0.x/AGENTS.md#available-skills . I guess such a solution could save tokens, since the agent wouldn't always pull all of the information in AGENTS.md and the referenced files into its context.

lhotari (Member, Author) commented May 26, 2026

This seems to be a good example where CONTRIBUTING.md and ARCHITECTURE.md are referenced:
https://github.com/apache/groovy/blob/master/AGENTS.md
There's also a good amount of skills to be used by agents (or humans driving an agent):
https://github.com/apache/groovy/blob/master/AGENTS.md#skills

lhotari (Member, Author) commented May 26, 2026

I guess I could use an agent (Claude Code) to restructure according to a similar approach that is used in apache/groovy or apache/grails.

lhotari added 4 commits May 26, 2026 13:29
…RE/CODING/SECURITY, add .agents/skills

Follow apache/groovy's layout per PR review feedback: AGENTS.md becomes a
concise router/index; the detail moves to human-facing CONTRIBUTING.md
(dev/build/test/PR/CI), ARCHITECTURE.md (modules + build), CODING.md
(conventions + review checklist), and SECURITY.md (reporting, disclosure
hygiene, public-CVE checks). Task-specific guardrails live under
.agents/skills/ (pulsar-build, pulsar-tests, pulsar-pr-workflow,
pulsar-security) and are loaded on demand to keep agent context small.
CLAUDE.md and .github/copilot-instructions.md are now symlinks to AGENTS.md.
…ns, perimeter security, no malicious-DoS protection)
@lhotari lhotari changed the title [improve][misc] Add AGENTS.md and refresh contributor/agent convention docs [improve][misc] Add AGENTS.md and restructure contributor/agent docs with .agents/skills May 26, 2026
lhotari added 6 commits May 26, 2026 13:47
…formance/GC guidance

Expand CODING.md's Concurrency section with the Java Memory Model rules that
have historically tripped up Pulsar code: synchronization needs the same lock
for reads and writes, fields shared across threads need volatile, immutable vs.
effectively-immutable objects and safe publication/initialization, preferring
DefaultThreadFactory/FastThreadLocalThread, and how to reproduce
timing/platform-dependent bugs. Add a ZGC + Netty Recycler note (PIP-443) and a
JMH-benchmark guideline (microbench/). Add a Concurrency-model gap and
Backpressure (PIP-442) section to ARCHITECTURE.md, and point the pulsar-tests
skill at the reproduction guidance.
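A minimal sketch of the Java Memory Model rules listed in this commit, using a hypothetical class (not Pulsar code): a volatile field for state written by one thread and read by others, one lock guarding both reads and writes of a mutable counter, and an immutable list safely published through a final field. The DefaultThreadFactory/FastThreadLocalThread guidance is not shown.

```java
import java.util.List;

// Hypothetical class illustrating Java Memory Model rules; not Pulsar code.
final class ConsumerStateSketch {
    // Written by one thread (close) and read by others (isClosed):
    // volatile guarantees the write becomes visible to readers.
    private volatile boolean closed;

    // Mutable state guarded by 'lock': every read and every write must hold
    // this same lock, otherwise readers may observe stale or torn values.
    private final Object lock = new Object();
    private long pendingPermits;

    // Immutable copy assigned to a final field in the constructor: as long as
    // 'this' does not escape during construction, other threads see it fully
    // initialized without extra synchronization.
    private final List<String> topics;

    ConsumerStateSketch(List<String> topics) {
        this.topics = List.copyOf(topics);
    }

    void close() {
        closed = true;
    }

    boolean isClosed() {
        return closed;
    }

    void addPermits(long n) {
        synchronized (lock) {
            pendingPermits += n;
        }
    }

    long getPendingPermits() {
        synchronized (lock) { // same lock as the writer, not a different one
            return pendingPermits;
        }
    }

    List<String> topics() {
        return topics;
    }
}
```

Reading pendingPermits without the lock (or under a different lock) would compile and often appear to work, which is exactly why such bugs tend to be timing- or platform-dependent.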
lhotari (Member, Author) commented May 26, 2026

@asafm I have addressed your feedback about splitting into multiple files. PTAL

…from PR review feedback

Add recurring, generalizable guidance distilled from past apache/pulsar PR
reviews:

- CODING.md: data-type conventions (records, narrowest interface type, factory
  methods, minimize method/constructor params, builders incl. records, naming);
  async per-call-site evaluation + checkArgumentAsync; concurrency lock-scope;
  backward-compat for plugin/SPI interfaces (default methods, no third-party
  types in public APIs, opt-in behavior changes); Performance section
  (hot-path costs, no overhead under load, bounded caches/StringInterner);
  testing terminology (unit vs container integration tests), SharedPulsarBaseTest
  usage, integration-style vs unit-test design, JMH benchmarks, Awaitility.
- CONTRIBUTING.md: focused PRs / no drive-by refactor or reformatting,
  large-refactor discussion on dev@, branches & backports, /pulsarbot
  rerun-failure-checks and flaky-test handling.
- ARCHITECTURE.md: PIP-number reservation via dev@ thread.
- AGENTS.md: "stay in scope" critical rule.
- Skills (pulsar-build/tests/pr-workflow): matching guardrails.
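To make the data-type conventions in this commit concrete, here is a small sketch built around a hypothetical TopicQuota record (not from the Pulsar code base). It shows a compact constructor for validation, a named factory method, a builder for a record, and a helper that accepts an interface type (Collection) and returns a List rather than concrete implementation classes. It assumes Java 16+ for records and Stream.toList().

```java
import java.util.Collection;
import java.util.List;
import java.util.Objects;

// Hypothetical value type; not from the Pulsar code base.
record TopicQuota(String topic, long maxBytes, int maxProducers) {
    // Compact constructor: validate invariants once, at construction time.
    TopicQuota {
        Objects.requireNonNull(topic, "topic");
        if (maxBytes < 0 || maxProducers < 0) {
            throw new IllegalArgumentException("limits must be non-negative");
        }
    }

    // Named factory method instead of a long positional constructor call.
    static TopicQuota unlimited(String topic) {
        return new TopicQuota(topic, Long.MAX_VALUE, Integer.MAX_VALUE);
    }

    static Builder builder(String topic) {
        return new Builder(topic);
    }

    // A builder keeps call sites readable when a record has many components.
    static final class Builder {
        private final String topic;
        private long maxBytes = Long.MAX_VALUE;
        private int maxProducers = Integer.MAX_VALUE;

        private Builder(String topic) {
            this.topic = topic;
        }

        Builder maxBytes(long value) {
            this.maxBytes = value;
            return this;
        }

        Builder maxProducers(int value) {
            this.maxProducers = value;
            return this;
        }

        TopicQuota build() {
            return new TopicQuota(topic, maxBytes, maxProducers);
        }
    }
}

final class QuotaQueries {
    // Accept the narrowest interface type the method needs (Collection, not
    // ArrayList) and return an interface type (List).
    static List<String> topicsOver(Collection<TopicQuota> quotas, long thresholdBytes) {
        return quotas.stream()
                .filter(q -> q.maxBytes() > thresholdBytes)
                .map(TopicQuota::topic)
                .toList();
    }
}
```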
lhotari added 2 commits May 26, 2026 17:10
Rewrite the four SKILL.md files as a lean, on-demand guardrail layer that cites
the canonical docs (CODING/CONTRIBUTING/ARCHITECTURE/SECURITY) instead of
duplicating their prose, to keep agent context small; trim frontmatter to
name + description. Expand CONTRIBUTING.md backport guidance: maintainers
handle backports, cherry-pick in merge order, dependent changes first, and drop
branch-4.1 from the example.
@lhotari lhotari changed the title [improve][misc] Add AGENTS.md and restructure contributor/agent docs with .agents/skills [improve][misc] Add AGENTS.md and split contributor/coding/architecture/security docs, with .agents skills May 26, 2026
lhotari added 2 commits May 26, 2026 17:23
…ks to the canonical policy

Slim README's Build section to a short quick-start that refers to
CONTRIBUTING/ARCHITECTURE/CODING/AGENTS for detail instead of repeating it, and
make the security section consistent with SECURITY.md. Point README's security
links to https://github.com/apache/pulsar/security/policy, and note in
SECURITY.md, AGENTS.md, CONTRIBUTING.md, and the pulsar-security skill that the
latest SECURITY.md is maintained there (so forks reference the canonical copy).
lhotari added 3 commits May 26, 2026 17:44
…ated pulsarbot command

- Reorder "Disclosure hygiene" so the project-team-commits-the-fix paragraph
  comes first; clarify the commit-message/PR neutrality rules are for whoever
  commits the fix (the project team), and gate them on the vulnerability being
  announced.
- Note that already-public dependency CVEs are an exception: name the CVE id
  directly in the PR title/description.
- Document only `/pulsarbot rerun` (rerun-failure-checks is deprecated): it
  re-runs the failed jobs of a completed workflow run.
@lhotari lhotari requested a review from asafm May 26, 2026 15:22