Skip to content

AST-148815 : Add CLAUDE.md with architecture and contract notes#20

Open
cx-anurag-dalke wants to merge 2 commits into
mainfrom
feature/AST-148815
Open

AST-148815 : Add CLAUDE.md with architecture and contract notes#20
cx-anurag-dalke wants to merge 2 commits into
mainfrom
feature/AST-148815

Conversation

@cx-anurag-dalke
Copy link
Copy Markdown
Contributor

Adds a CLAUDE.md at the repo root documenting the Parser interface / ParsersFactory dispatch model and how to wire a new ecosystem.
Captures per-parser quirks (Maven multi-line Locations, npm lockfile fallback, pypi ==-only scope, dotnet variants).
Documents invariants downstream AST-CLI depends on: 0-based line numbers, literal "latest" for unresolvable versions, stable PackageManager strings (mvn/npm/pypi/go/nuget).
Notes the 60% CI coverage floor.

cx-anurag-dalke and others added 2 commits April 20, 2026 21:27
Captures the Parser/factory dispatch model, per-ecosystem quirks,
and the invariants (0-based line numbers, "latest" sentinel,
PackageManager strings) that downstream AST-CLI relies on.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

Logo
Checkmarx One – Scan Summary & Detailsf3b51698-d956-4cc3-99f0-fe01c74be679

Great job! No new security vulnerabilities introduced in this pull request

Copy link
Copy Markdown
Collaborator

@cx-atish-jadhav cx-atish-jadhav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for getting this started — the content that's here is accurate and well-grounded in the code (verified the Parser interface signature, ParsersFactory, Package/Location structs, the 9e490aa commit reference, the 60% CI floor, and the Go 1.23 / toolchain 1.24.2 pins all match the repo).

Leaving inline comments for consistency with the JIRA epic template (AST-148815 parent). The epic lists these essential sections: Project Overview, Architecture, Repository Structure, Technology Stack, Development Setup, Coding Standards, Project Rules, Testing Strategy, Known Issues, External Integrations, Deployment, Performance, API/Interfaces, Security & Access, Logging, Debugging Steps. This PR covers Project Overview, Architecture, Project Rules (as "Invariants"), and Testing Strategy well — the inline comments point out the missing / thin sections plus a few small accuracy / clarity nits. Some sections (DB schema, deployment) are genuinely N/A for a parser library; recommend adding a one-liner saying so rather than omitting, so every repo's CLAUDE.md has a predictable shape.

Comment thread CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Overview
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename to ## Project Overview to match the JIRA epic template. The epic also asks for "Purpose and status" — please add a one-liner on status (active / maintained / experimental / deprecated).

Comment thread CLAUDE.md

Go module that parses package manifests from multiple ecosystems (Maven, npm, Python, Go, .NET) and returns each declared dependency along with the **exact line/character range** of its declaration. Consumed by [AST-CLI](https://github.com/Checkmarx/ast-cli) to correlate manifest entries with Checkmarx runtime scans — so the `Locations` field is part of the public contract, not a debugging convenience.

## Commands
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The epic asks for a ## Development Setup section. Consider renaming this and adding: (a) prerequisites (Go ≥ 1.23, git), (b) a note that dependencies are vendored so no go mod download is required, (c) an example invocation against a fixture, e.g. go run ./cmd test/resources/pom.xml. Right now a new contributor has to guess what a valid input path looks like.

Comment thread CLAUDE.md
go test -run TestName ./path/... # run a single test by name
go test ./... -coverprofile cover.out # CI gate: total coverage must be >= 60%
go build -o manifest-parser ./cmd # build CLI
go run ./cmd <manifest-file> # run CLI against a manifest
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider showing a truncated sample output (a Package JSON snippet) so readers know what success looks like without opening cmd/main.go.

Comment thread CLAUDE.md
go run ./cmd <manifest-file> # run CLI against a manifest
```

Dependencies are vendored (`vendor/`). Go version is pinned via `go.mod` (1.23 / toolchain 1.24.2).
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please promote this into a dedicated ## Technology Stack section (one of the epic's essential sections) listing: Go 1.23 / toolchain 1.24.2, github.com/stretchr/testify v1.8.4, golang.org/x/mod v0.24.0, stdlib encoding/xml + encoding/json. Explicitly state "no database" and "no web framework" so the N/A sections are unambiguous.

Comment thread CLAUDE.md

Dependencies are vendored (`vendor/`). Go version is pinned via `go.mod` (1.23 / toolchain 1.24.2).

## Architecture
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a ## Repository Structure section (or subsection) with the top-level folder tree: cmd/, pkg/parser/, internal/parsers/{maven,npm,pypi,golang,dotnet}/, test/resources/, vendor/. The epic lists "Repository Structure — Folder organization" as its own essential section.

Comment thread CLAUDE.md
Per-ecosystem parsers live under [internal/parsers/](internal/parsers/):
- `maven/` — parses `pom.xml` with `encoding/xml`, then re-scans the raw text to locate each `<dependency>` block line by line. Resolves `${property}` vars from `<properties>` and falls back to `<dependencyManagement>` for empty/ranged versions. Only **direct** `<dependencies>` are emitted (managed-only deps are intentionally skipped to avoid duplicates — see commit `9e490aa`).
- `npm/` — parses `package.json` plus, if present as a sibling file, `package-lock.json` (v1 and v2/v3 formats). Ranged specifiers (`^`, `~`, `*`, `>`, `<`) trigger a lookup in the lockfile; `isLockVersionGreater` compares part-by-part numerically to decide whether the lockfile version satisfies the spec. Without a lock match, ranged versions resolve to `"latest"`.
- `pypi/` — line-oriented scan of `requirements*.txt` / `packages*.txt`. **Only `package==version` is supported** — `pip freeze`, Poetry, and pip-tools output are explicitly out of scope (see README "Known Limitations"). Comments (`#`) and environment markers (`;`) are stripped.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pypi ==-only limitation belongs in a dedicated ## Known Issues / Limitations section per the epic. Please consolidate there: pypi == only (no pip freeze / Poetry / pip-tools), npm ranged versions without a lockfile resolve to "latest", Maven managed-only deps not emitted, etc. Scatter-and-gather hurts discoverability.

Comment thread CLAUDE.md
- `golang/` — uses `golang.org/x/mod/modfile` to parse `go.mod`, then uses the parser's line metadata to compute character offsets.
- `dotnet/` — three separate parsers sharing patterns: `csproj_parser.go` (`.csproj`), `directory_packages_props_parser.go` (central package management), `packages_config_parser.go` (legacy). Versions are read from either a `Version` attribute or a nested `<Version>` element; bracketed ranges become `"latest"`.

### Invariants worth preserving
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The epic names this section "Project Rules — Don'ts and constraints". Consider renaming (or using both: ### Project Rules (Invariants)) for cross-repo consistency — the standardization goal is about predictable headings across repos, not just content.

Comment thread CLAUDE.md

### Invariants worth preserving

- **`Location` uses 0-based line numbers** in most parsers (Maven, Go, npm, pypi use `lineNum - 1` or a 0-based counter). Downstream AST-CLI depends on this; don't "fix" it to 1-based without coordinating.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two ambiguities in Location worth clarifying: are StartIndex / EndIndex 0-based or 1-based, and are they byte offsets or rune/character offsets? AST-CLI callers need to know to render the annotation correctly for non-ASCII manifests.

Comment thread CLAUDE.md

## Tests & fixtures

Each parser has a `*_test.go` next to it using `testify`. Shared fixtures live in [test/resources/](test/resources/) (e.g. `pom.xml`, `package.json`, `requirements.txt`, `test_go.mod`, `Bootstrap.csproj`, `Gateway.csproj`, `packages.config`, `Directory.Packages.props`). When adding behaviors, add a fixture here rather than embedding large manifests in test source.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add: (a) how to view coverage locally (go tool cover -html cover.out), (b) the expected pattern for a new parser (fixture under test/resources/ + *_test.go co-located with parser using testify), (c) any naming convention for fixtures. These answer "how do I add a test?" which is the epic's intent for this section.

Comment thread CLAUDE.md

Each parser has a `*_test.go` next to it using `testify`. Shared fixtures live in [test/resources/](test/resources/) (e.g. `pom.xml`, `package.json`, `requirements.txt`, `test_go.mod`, `Bootstrap.csproj`, `Gateway.csproj`, `packages.config`, `Directory.Packages.props`). When adding behaviors, add a fixture here rather than embedding large manifests in test source.

CI ([.github/workflows/ci.yml](.github/workflows/ci.yml)) enforces a **60% total coverage floor** — adding an untested branch to an already-thin package can push the whole repo below the gate.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing sections from the epic template — please add, even as one-liners for the N/A ones, so every repo's CLAUDE.md has a predictable shape:

  • External Integrations — consumed by AST-CLI; Locations field + PackageManager strings (mvn/npm/pypi/go/nuget) are load-bearing downstream.
  • Deployment — N/A (library; consumed via go get github.com/Checkmarx/manifest-parser).
  • Database Schema — N/A.
  • Performance Considerations — Maven re-scans raw XML after encoding/xml parse (two passes); no streaming; large pom.xml files load fully into memory.
  • Security & Access — parsers consume untrusted manifest files. Note XXE posture (encoding/xml doesn't resolve external entities by default — worth stating explicitly) and whether there's a file-size bound.
  • Logging — CLI uses log.Fatalf on error; library returns error and does not log. Callers should not expect library log output.
  • Debugging Steps — how to run one parser against one fixture, how to set -v for verbose tests, common failure mode (location off-by-one → check 0-based invariant).
  • Coding Standardsgofmt / go vet clean; exported identifiers in pkg/, internal logic in internal/; parser packages follow <ecosystem>/<ecosystem>_parser.go + <ecosystem>_parser_test.go layout.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants