Skip to content

feat: add structural tree parser for conditional-aware spec editing#215

Open
trungams wants to merge 9 commits into
microsoft:mainfrom
trungams:tvuong/feat/spec-visit
Open

feat: add structural tree parser for conditional-aware spec editing#215
trungams wants to merge 9 commits into
microsoft:mainfrom
trungams:tvuong/feat/spec-visit

Conversation

@trungams
Copy link
Copy Markdown
Member

Summary

Replaces the flat Visit-based spec line walker with a structural tree parser for all spec editing operations. This makes conditional-boundary bugs structurally impossible and fixes several long-standing issues (#144, #193, #203).

Closes #214

Changes

New: tree parser (tree.go)

Two-pass parser that builds a block tree from raw spec lines:

  • Pass 1: collects %if/%endif pairs, skipping pairs inside %define/%global macro continuation bodies
  • Pass 2: builds nested block tree, classifying conditionals as wrappers (contain section headers) vs content blocks

Block kinds: rootBlock, sectionBlock, conditionalBlock, textBlock, macroDefBlock. Lossless round-trip via serializeTree.

New: tree API (tree_api.go)

specTree / sectionHandle types wrap the parsed tree. Spec.mutateTree handles parse → mutate → serialize; Spec.inspectTree for read-only access. The public Spec.* API is unchanged for callers.

New: SearchAndReplace rewrite (searchReplaceBlock)

Dedicated recursive walker that visits ALL line types (text, macros, conditional headers, %endif). The old VisitAllLines-based implementation skipped %define/%global and conditional directive lines. Also relaxes the section filter for wrapper else-branch content.

New: macro hoisting (tree_hoist.go)

RemoveSections now automatically hoists %define/%global macros from removed sections when they're referenced by surviving content (#203). Scans for %{name}, %{?name}, %{!?name}, bare %name, and conditional headers like %if 0%{?with_name}.

New: GetTag and PrependLinesToAllSections

  • GetTag(packageName, tag) — read-only tag lookup via inspectTree, replaces VisitTagsPackage usage
  • PrependLinesToAllSections + spec-prepend-all-lines overlay — prepends to every matching section (used by check-skip to disable all %check sections)

New: testdata fixtures + synthetic stress

13 curated .spec fixtures distilling real-world patterns. Two test tiers (lossless round-trip + per-pattern edit-operation regression). Synthetic generator with 64+32 deterministic seeds.

Removed: Visit infrastructure

Visit, VisitTags, VisitTagsPackage, Context, VisitTarget, VisitTargetType, Visitor, parseState, parseSpecLine, newParsedLine, parseLogicalLine — all deleted. The sole external caller (release.go) migrated to GetTag.

Also removed: sectionLineRange, collectSectionRanges, balanceRange, validateNoContentAfter, skipPastConditional, findInsertTagPosition, isBlankOrComment, removeRanges, and related helpers.

Docs

  • RFC 001 added to docs/developer/rfc/
  • Known limitations documented in docs/user/reference/config/overlays.md

Validation

Full render against Azure Linux 4.0 (~7,400 components):

Metric Baseline This PR Delta
Wall time 24m 11s 22m 59s -5.0%
CPU (user) 303m 34s 277m 07s -8.7%
Render failures 0 0

Rendered output: tvuong/4.0/feat/azldev-parsetree

Known limitations

  1. Straddling sections — section header inside %if wrapper with content past %endif is invisible to section-scoped overlays. Workaround: spec-search-replace. Affects 1 spec (gdb).
  2. Macro-generated sections — sections from %ghc_lib_subpackage, %fontpkg, etc. are invisible to the static parser.

Copilot AI review requested due to automatic review settings May 30, 2026 18:01
trungams added 5 commits May 30, 2026 11:02
Adds the design RFC documenting the problem (straddling conditionals
break naive line-range edits), the two evaluated approaches (enriched
structure vs structural tree), and the chosen direction (tree model).

Also documents known limitations in the overlays user guide:
section-scoped operations cannot reach content past a wrapper %endif
when the section header is inside the wrapper (workaround:
spec-search-replace with anchored regex), and macro-generated sections
are invisible to the static parser.
Two-pass parser that builds a block tree from raw spec lines:
- Pass 1 (collectConditionalPairs in edit.go): collects conditional
  pairs, skipping %if/%endif inside %define/%global macro continuation
  bodies. Shell-level continuations still process %if as structural.
- Pass 2 (buildBlockChildren): builds nested block tree, classifying
  conditionals as wrappers (contain section headers) vs content blocks.

Block kinds: rootBlock, sectionBlock, conditionalBlock, textBlock,
macroDefBlock. Preamble content is wrapped in an implicit sectionBlock
with empty Name for uniform access via findSectionBlock.

All tree types and helpers are kept unexported so the parsed
representation remains an internal implementation detail.

Includes serializeTree for lossless round-tripping, query helpers
(findSectionBlock, findAllSectionBlocks, findAllSectionBlocksByPackage),
mutation helpers (removeBlockFromParent), and validateSectionRemoval
for detecting unsafe removal patterns (orphaned content).

Validation checks:
- Preceding-section: wrapper has text AND preceding section is removed.
- Empty-wrapper + adjacent content: all sections removed from wrapper
  AND adjacent content conditional would be orphaned.
Cross-branch asymmetry is intentionally allowed (removing from %if
while %else retains sections is a valid overlay operation).
…tch of edit operations

Add a tree interface in tree_api.go so edit.go callers no longer reach
into the parse tree directly. specTree wraps the parsed *block root
and exposes query helpers (Section, Sections, SectionsByPackage,
RemoveSections); sectionHandle wraps a section block with
append/prepend operations. Spec.mutateTree handles parse -> mutate ->
serialize so call sites become 3-line orchestrations.

Three edit.go operations now go through the interface as a starting
point: AppendLinesToSection, RemoveSection, and RemoveSubpackage.
Remaining Visit-based operations will migrate in follow-up commits.
Move HasSection off the Visit-based scan and onto the tree interface.
Adds Spec.inspectTree (the read-only sibling of mutateTree) and
specTree.HasSection, which walks the parsed tree (including sections
inside conditional wrappers, matching previous Visit semantics).
Add line-level iteration to the tree interface (specTree.VisitAllLines /
sectionHandle.VisitLines + lineHandle) plus a tag-aware insertion helper
(sectionHandle.InsertTag), and use them to rewrite the tag-mutating
operations on Spec.

No behavior change for callers. Removes findInsertTagPosition,
skipPastConditional, and insertTagScanResult.
@trungams trungams force-pushed the tvuong/feat/spec-visit branch from c4e7ac6 to 46a9ae0 Compare May 30, 2026 18:02
trungams added 4 commits May 31, 2026 05:18
Convert the last group of Visit-based mutators to mutateTree + section
or line handles, then remove the Visit infrastructure entirely:

Migrated operations:
- PrependLinesToSection -> sectionHandle.PrependLines.
- AddChangelogEntry -> sectionHandle.PrependLines on %changelog.
- SearchAndReplace -> searchReplaceBlock, a recursive tree walker that
  visits ALL line types (text, macros, conditional headers, %endif) so
  patterns targeting %define/%global or %if lines are found correctly.
  Wrapper else-branch section filter is relaxed for loose content.
- removePatchlistEntriesMatching -> sectionHandle.VisitLines on
  %patchlist.

New API:
- GetTag(packageName, tag) for read-only tag lookup via inspectTree.

Removed (Visit infrastructure):
- Visit, VisitTags, VisitTagsPackage from edit.go.
- Context, VisitTarget, VisitTargetType, Visitor, InsertLinesBefore,
  InsertLinesAfter, RemoveLine, ReplaceLine from spec.go.
- parseState, inContinuation, newParseState, parseSpecLine,
  newParsedLine, parseLogicalLine from spec.go.

Dead code from line-range approach also removed:
- sectionLineRange, collectSectionRanges, balanceRange,
  validateNoContentAfter, isBlankOrComment, removeRanges, etc.

Migrated external caller: release.go GetReleaseValue now uses
GetTag instead of VisitTagsPackage.
Stand up 13 hand-crafted minimized .spec fixtures under
internal/rpm/spec/testdata/specs/ distilling real-world patterns.

testdata_test.go drives fixtures through two tiers:
- Tier 1: lossless round-trip (byte-for-byte identity).
- Tier 3: per-pattern edit-operation regression tests.

Synthetic generator composes primitives into random spec inputs
(64 round-trip + 32 AddTag seeds).

Also includes SearchAndReplace tests for macro definitions,
conditional headers, and wrapper else-branch section filter.
When a %define/%global directive lives inside a %package subpackage
block but the macro it declares is referenced by surviving content,
RemoveSections now automatically hoists it to the root level.

Two regression tests validate both the positive case (macro hoisted)
and the negative case (unreferenced macro dropped cleanly).
…overlay

Add a new PrependLinesToAllSections method that prepends lines to every
section matching a given name and package, unlike PrependLinesToSection
which targets only the first match. This is exposed as the new
spec-prepend-all-lines overlay type.

Update the check-disablement synthesizer to use the new overlay type so
that specs with multiple %check sections (e.g., python-azure-mgmt-media,
python-zmq) get exit 0 prepended to all of them.
@trungams trungams force-pushed the tvuong/feat/spec-visit branch from 46a9ae0 to 56a4c63 Compare May 31, 2026 05:27
@trungams
Copy link
Copy Markdown
Member Author

The gdb render failure in e2e is expected as we'll need to update the overlay

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Structural tree parser for conditional-aware spec editing

1 participant