AST-148815 : Add CLAUDE.md with architecture and contract notes#20
AST-148815 : Add CLAUDE.md with architecture and contract notes#20cx-anurag-dalke wants to merge 2 commits into
Conversation
Captures the Parser/factory dispatch model, per-ecosystem quirks, and the invariants (0-based line numbers, "latest" sentinel, PackageManager strings) that downstream AST-CLI relies on. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Great job! No new security vulnerabilities introduced in this pull request |
cx-atish-jadhav
left a comment
There was a problem hiding this comment.
Thanks for getting this started — the content that's here is accurate and well-grounded in the code (verified the Parser interface signature, ParsersFactory, Package/Location structs, the 9e490aa commit reference, the 60% CI floor, and the Go 1.23 / toolchain 1.24.2 pins all match the repo).
Leaving inline comments for consistency with the JIRA epic template (AST-148815 parent). The epic lists these essential sections: Project Overview, Architecture, Repository Structure, Technology Stack, Development Setup, Coding Standards, Project Rules, Testing Strategy, Known Issues, External Integrations, Deployment, Performance, API/Interfaces, Security & Access, Logging, Debugging Steps. This PR covers Project Overview, Architecture, Project Rules (as "Invariants"), and Testing Strategy well — the inline comments point out the missing / thin sections plus a few small accuracy / clarity nits. Some sections (DB schema, deployment) are genuinely N/A for a parser library; recommend adding a one-liner saying so rather than omitting, so every repo's CLAUDE.md has a predictable shape.
|
|
||
| This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. | ||
|
|
||
| ## Overview |
There was a problem hiding this comment.
Rename to ## Project Overview to match the JIRA epic template. The epic also asks for "Purpose and status" — please add a one-liner on status (active / maintained / experimental / deprecated).
|
|
||
| Go module that parses package manifests from multiple ecosystems (Maven, npm, Python, Go, .NET) and returns each declared dependency along with the **exact line/character range** of its declaration. Consumed by [AST-CLI](https://github.com/Checkmarx/ast-cli) to correlate manifest entries with Checkmarx runtime scans — so the `Locations` field is part of the public contract, not a debugging convenience. | ||
|
|
||
| ## Commands |
There was a problem hiding this comment.
The epic asks for a ## Development Setup section. Consider renaming this and adding: (a) prerequisites (Go ≥ 1.23, git), (b) a note that dependencies are vendored so no go mod download is required, (c) an example invocation against a fixture, e.g. go run ./cmd test/resources/pom.xml. Right now a new contributor has to guess what a valid input path looks like.
| go test -run TestName ./path/... # run a single test by name | ||
| go test ./... -coverprofile cover.out # CI gate: total coverage must be >= 60% | ||
| go build -o manifest-parser ./cmd # build CLI | ||
| go run ./cmd <manifest-file> # run CLI against a manifest |
There was a problem hiding this comment.
Consider showing a truncated sample output (a Package JSON snippet) so readers know what success looks like without opening cmd/main.go.
| go run ./cmd <manifest-file> # run CLI against a manifest | ||
| ``` | ||
|
|
||
| Dependencies are vendored (`vendor/`). Go version is pinned via `go.mod` (1.23 / toolchain 1.24.2). |
There was a problem hiding this comment.
Please promote this into a dedicated ## Technology Stack section (one of the epic's essential sections) listing: Go 1.23 / toolchain 1.24.2, github.com/stretchr/testify v1.8.4, golang.org/x/mod v0.24.0, stdlib encoding/xml + encoding/json. Explicitly state "no database" and "no web framework" so the N/A sections are unambiguous.
|
|
||
| Dependencies are vendored (`vendor/`). Go version is pinned via `go.mod` (1.23 / toolchain 1.24.2). | ||
|
|
||
| ## Architecture |
There was a problem hiding this comment.
Add a ## Repository Structure section (or subsection) with the top-level folder tree: cmd/, pkg/parser/, internal/parsers/{maven,npm,pypi,golang,dotnet}/, test/resources/, vendor/. The epic lists "Repository Structure — Folder organization" as its own essential section.
| Per-ecosystem parsers live under [internal/parsers/](internal/parsers/): | ||
| - `maven/` — parses `pom.xml` with `encoding/xml`, then re-scans the raw text to locate each `<dependency>` block line by line. Resolves `${property}` vars from `<properties>` and falls back to `<dependencyManagement>` for empty/ranged versions. Only **direct** `<dependencies>` are emitted (managed-only deps are intentionally skipped to avoid duplicates — see commit `9e490aa`). | ||
| - `npm/` — parses `package.json` plus, if present as a sibling file, `package-lock.json` (v1 and v2/v3 formats). Ranged specifiers (`^`, `~`, `*`, `>`, `<`) trigger a lookup in the lockfile; `isLockVersionGreater` compares part-by-part numerically to decide whether the lockfile version satisfies the spec. Without a lock match, ranged versions resolve to `"latest"`. | ||
| - `pypi/` — line-oriented scan of `requirements*.txt` / `packages*.txt`. **Only `package==version` is supported** — `pip freeze`, Poetry, and pip-tools output are explicitly out of scope (see README "Known Limitations"). Comments (`#`) and environment markers (`;`) are stripped. |
There was a problem hiding this comment.
The pypi ==-only limitation belongs in a dedicated ## Known Issues / Limitations section per the epic. Please consolidate there: pypi == only (no pip freeze / Poetry / pip-tools), npm ranged versions without a lockfile resolve to "latest", Maven managed-only deps not emitted, etc. Scatter-and-gather hurts discoverability.
| - `golang/` — uses `golang.org/x/mod/modfile` to parse `go.mod`, then uses the parser's line metadata to compute character offsets. | ||
| - `dotnet/` — three separate parsers sharing patterns: `csproj_parser.go` (`.csproj`), `directory_packages_props_parser.go` (central package management), `packages_config_parser.go` (legacy). Versions are read from either a `Version` attribute or a nested `<Version>` element; bracketed ranges become `"latest"`. | ||
|
|
||
| ### Invariants worth preserving |
There was a problem hiding this comment.
The epic names this section "Project Rules — Don'ts and constraints". Consider renaming (or using both: ### Project Rules (Invariants)) for cross-repo consistency — the standardization goal is about predictable headings across repos, not just content.
|
|
||
| ### Invariants worth preserving | ||
|
|
||
| - **`Location` uses 0-based line numbers** in most parsers (Maven, Go, npm, pypi use `lineNum - 1` or a 0-based counter). Downstream AST-CLI depends on this; don't "fix" it to 1-based without coordinating. |
There was a problem hiding this comment.
Two ambiguities in Location worth clarifying: are StartIndex / EndIndex 0-based or 1-based, and are they byte offsets or rune/character offsets? AST-CLI callers need to know to render the annotation correctly for non-ASCII manifests.
|
|
||
| ## Tests & fixtures | ||
|
|
||
| Each parser has a `*_test.go` next to it using `testify`. Shared fixtures live in [test/resources/](test/resources/) (e.g. `pom.xml`, `package.json`, `requirements.txt`, `test_go.mod`, `Bootstrap.csproj`, `Gateway.csproj`, `packages.config`, `Directory.Packages.props`). When adding behaviors, add a fixture here rather than embedding large manifests in test source. |
There was a problem hiding this comment.
Please add: (a) how to view coverage locally (go tool cover -html cover.out), (b) the expected pattern for a new parser (fixture under test/resources/ + *_test.go co-located with parser using testify), (c) any naming convention for fixtures. These answer "how do I add a test?" which is the epic's intent for this section.
|
|
||
| Each parser has a `*_test.go` next to it using `testify`. Shared fixtures live in [test/resources/](test/resources/) (e.g. `pom.xml`, `package.json`, `requirements.txt`, `test_go.mod`, `Bootstrap.csproj`, `Gateway.csproj`, `packages.config`, `Directory.Packages.props`). When adding behaviors, add a fixture here rather than embedding large manifests in test source. | ||
|
|
||
| CI ([.github/workflows/ci.yml](.github/workflows/ci.yml)) enforces a **60% total coverage floor** — adding an untested branch to an already-thin package can push the whole repo below the gate. |
There was a problem hiding this comment.
Missing sections from the epic template — please add, even as one-liners for the N/A ones, so every repo's CLAUDE.md has a predictable shape:
- External Integrations — consumed by AST-CLI;
Locationsfield +PackageManagerstrings (mvn/npm/pypi/go/nuget) are load-bearing downstream. - Deployment — N/A (library; consumed via
go get github.com/Checkmarx/manifest-parser). - Database Schema — N/A.
- Performance Considerations — Maven re-scans raw XML after
encoding/xmlparse (two passes); no streaming; largepom.xmlfiles load fully into memory. - Security & Access — parsers consume untrusted manifest files. Note XXE posture (
encoding/xmldoesn't resolve external entities by default — worth stating explicitly) and whether there's a file-size bound. - Logging — CLI uses
log.Fatalfon error; library returnserrorand does not log. Callers should not expect library log output. - Debugging Steps — how to run one parser against one fixture, how to set
-vfor verbose tests, common failure mode (location off-by-one → check 0-based invariant). - Coding Standards —
gofmt/go vetclean; exported identifiers inpkg/, internal logic ininternal/; parser packages follow<ecosystem>/<ecosystem>_parser.go+<ecosystem>_parser_test.golayout.

Adds a CLAUDE.md at the repo root documenting the Parser interface / ParsersFactory dispatch model and how to wire a new ecosystem.
Captures per-parser quirks (Maven multi-line Locations, npm lockfile fallback, pypi ==-only scope, dotnet variants).
Documents invariants downstream AST-CLI depends on: 0-based line numbers, literal "latest" for unresolvable versions, stable PackageManager strings (mvn/npm/pypi/go/nuget).
Notes the 60% CI coverage floor.