Skip to content

feat(component query): bulk spec querying with --arch enforcement#204

Open
liunan-ms wants to merge 3 commits into
microsoft:mainfrom
liunan-ms:liunan/specquery
Open

feat(component query): bulk spec querying with --arch enforcement#204
liunan-ms wants to merge 3 commits into
microsoft:mainfrom
liunan-ms:liunan/specquery

Conversation

@liunan-ms
Copy link
Copy Markdown
Contributor

Introduces azldev component query for bulk-querying rendered RPM specs inside a single shared mock chroot, with arch-aware policy enforcement.

CLI

  • New azldev component query command — selects components via the standard component filter, renders results as a table / CSV / Markdown / JSON.
  • New --arch flag (default x86_64, also accepts aarch64) driving rpmspec --target=<arch> and ExclusiveArch/ExcludeArch policy.

Docs

CLI reference under docs/user/reference/cli/ regenerated.

Copilot AI review requested due to automatic review settings May 21, 2026 21:03
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR updates azldev component query to operate on locally rendered specs and to batch rpmspec execution in a shared mock chroot, including reporting binary subpackages.

Changes:

  • Added batched spec querying via an embedded Python helper and shared Go scaffolding for running batch scripts in mock.
  • Extended spec query results to include binary subpackages and updated tests to validate subpackage extraction.
  • Updated CLI help/docs and adjusted component query to require and validate rendered-specs-dir.

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
scenario/internal/projecttest/testspec.go Adds subpackage rendering support in test spec generation.
scenario/component_query_test.go Updates scenario test to render first and validate subpackages in query output.
internal/rpm/specquery.go Exposes rpmspec queryformat constants and adds subpackage parsing helper.
internal/rpm/specquery_test.go Adds unit tests for subpackage-output parsing and SRPM parsing behavior.
internal/app/azldev/core/sources/specquery.go Introduces batch spec query orchestration and input validation.
internal/app/azldev/core/sources/specquery_test.go Adds tests for batched query JSON parsing and validation.
internal/app/azldev/core/sources/query_process.py Implements parallel per-spec rpmspec execution inside mock chroot.
internal/app/azldev/core/sources/mockprocessor.go Refactors shared batch-script runner and adds configurable required package install.
internal/app/azldev/command.go Emits partial results even when a command returns an error, with typed-nil guarding.
internal/app/azldev/cmds/component/query.go Reworks component query to read rendered specs and use batched rpmspec querying.
internal/app/azldev/cmds/component/mockproc.go Centralizes mock required-package presets and mock processor creation.
internal/app/azldev/cmds/component/query_test.go Updates query command tests for rendered-specs-dir validation behavior.
internal/app/azldev/cmds/component/query_internal_test.go Adds internal tests for building batch query inputs and skip behavior.
docs/user/reference/cli/azldev_component_query.md Updates CLI reference to match rendered-spec-based query and new --arch flag.
docs/user/reference/cli/azldev_component.md Updates command list description for component query.

Comment thread internal/app/azldev/command.go
Comment thread internal/app/azldev/core/sources/specquery.go Outdated
Comment thread internal/app/azldev/cmds/component/mockproc.go Outdated
Comment thread scenario/internal/projecttest/testspec.go

[distros.azurelinux.versions.'4.0'.default-component-config]
spec = { type = "upstream", upstream-distro = { name = "fedora", version = "42" } }
spec = { type = "upstream", upstream-distro = { name = "fedora", version = "43" } }
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

java-srpm-macros is not available in f42 as a standalone binary RPM. Bumping the test mock to releasever=43 (and the matching f43-build URL) to unblock the scenario tests.

reubeno and others added 2 commits May 27, 2026 16:43
… enforcement

Add an --arch flag to 'component query' (default x86_64) that drives
rpmspec's --target and enforces ExclusiveArch/ExcludeArch policy via a
probe queryformat wrapped around the srpm query. Specs excluded by the
selected arch surface as ExcludedFromArch entries and are summarized in
a per-run log line.

Harden the per-spec query subprocess:
- 180s rpmspec timeout with a dedicated _RpmspecTimeout exception
- UTF-8 decoding with errors=replace on file IO and subprocess output

Plumb the arch parameter through BatchQuerySpecs and add
ExcludedFromArch to SpecQueryResult / its JSON mapping.

Teach runFuncInternal to render partial results when the inner func
returns both a value and an error, so per-component query failures
still print the successful rows while exiting non-zero.

Add javapackages-common to the mock package list so %javadoc_package
(from macros.fjava) doesn't silently disappear.
Bump scenario AZL 4.0 upstream + mock releasever to Fedora 43;
  java-srpm-macros is missing from f42-build koji.
Regenerate CLI docs.
Copilot AI review requested due to automatic review settings May 27, 2026 16:57
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 5 comments.

Comment thread internal/app/azldev/core/sources/specquery.go Outdated
}

// validateSpecRelPath rejects spec relative paths that could escape the
// specs-dir bind mount or contain control characters.
Comment thread internal/app/azldev/core/sources/specquery.go Outdated
Comment thread scenario/internal/projecttest/testspec.go Outdated
Comment thread internal/app/azldev/core/sources/query_process.py Outdated
@reubeno reubeno requested a review from dmcilvaney May 27, 2026 21:32
@reubeno
Copy link
Copy Markdown
Member

reubeno commented May 27, 2026

@dmcilvaney would you mind reviewing this? You'll be most familiar with the parts that touch the parallel mock processor.


// validateSpecRelPath rejects spec relative paths that could escape the
// specs-dir bind mount or contain control characters.
func validateSpecRelPath(relPath string) error {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should consolidate path validation to one spot, we have some semi common functions in fileutils.ValidateFilename, should augment that if it has holes.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a ValidateRelPath to fileutils (built on top of ValidateFilename per segment) and delegate validateSpecRelPath to it.

// (e.g. some items succeeded, some failed). Render what we have
// before propagating so callers see the partial output and the
// process still exits non-zero.
if results != nil && !isNilValue(results) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you update render to also support this now? I worked around it by returning no error even when things failed. If we can still output the results table even on failure, I think returning an error code does make sense. Might need to update checkRenderErrors() if I recall right.

This might need an associated fix in the main repo's PR checks, but we can do that when we move the alzdev version pin forward.

Copy link
Copy Markdown
Contributor Author

@liunan-ms liunan-ms Jun 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in efd4560

// (e.g. some items succeeded, some failed). Render what we have
// before propagating so callers see the partial output and the
// process still exits non-zero.
if results != nil && !isNilValue(results) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same deal with update, although there I think we just want to call filterDisplayResults() before returning and add results[idx].Error != "" along with the other criteria. Otherwise I think it will dump 7k+ values out on an error.

Copy link
Copy Markdown
Contributor Author

@liunan-ms liunan-ms Jun 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in efd4560

# Apply per-spec rewrites (e.g. ghc.spec) to a scratch copy if needed.
# _sourcedir/_specdir stay pinned to the original spec's directory via
# _rpmspec_args, so sidecar files still resolve correctly.
effective_spec = _maybe_rewrite_spec(spec_path, scratch_dir, name)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this doesn't take the new effective_spec directory to use for spec_dir, is that an issue, or intended? (ie we want a different spec, but to still read everything as-if the spec was in its old spot).

Copy link
Copy Markdown
Contributor Author

@liunan-ms liunan-ms Jun 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in efd4560

name string
version string
release string
buildArch string
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the arch logic is mostly untested end-to-end. Since a lot of it lives in the python side might be easiest to test here via scenario test.

Copy link
Copy Markdown
Contributor Author

@liunan-ms liunan-ms Jun 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in efd4560

content = src.read()

for find, replace in rewrites:
content = content.replace(find, replace)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've had to do the same find/replace pattern when doing similar things, I haven't found a better way either, but we should consider erroring if the string isn't found anymore because the upstream spec changed or got overlays added, might make it easier to fix than a random parse error thats hard to diagnose.

Comment thread internal/rpm/specquery.go
// Subpackages lists the binary package names the spec produces, in the
// order rpmspec reports them. Empty when not queried (e.g., the
// per-component buildenv path that only requests --srpm output).
Subpackages []string
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Are we worried about cases were a package produces no outputs? Maybe make it nil-able to differentiate?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question, I don't think we need to. The batched query path always populates Subpackages (even to an empty slice) on a successful query, so len == 0 already means "queried, spec produces no built binary RPMs." The only caller that leaves it nil is the legacy per-component QuerySpec, which doesn't run the subpackage query at all. Happy to revisit if a real caller needs the distinction.

// (e.g. some items succeeded, some failed). Render what we have
// before propagating so callers see the partial output and the
// process still exits non-zero.
if results != nil && !isNilValue(results) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Maybe update the comments for the various CmdFuncType public functions to indicate this behavior. It may be a bit surprising that a failed command still outputs results.

Copy link
Copy Markdown
Contributor Author

@liunan-ms liunan-ms Jun 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in efd4560

arch,
)
# Insert --builtrpms right after `-q` so it associates with the query.
bin_args.insert(2, "--builtrpms")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could use something like bin_args.index("-q") + 1 to find the -q so its stable if args move around.

Copy link
Copy Markdown
Contributor Author

@liunan-ms liunan-ms Jun 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in efd4560

return _ARCH_ALIASES.get(token, token)


def _is_arch_excluded(arch, exclusive_arch, exclude_arch):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

50/50 if this should live in the golang side somehow, and get passed in as a table. I'm not aware of any other go side features that will care about arch parsing, so I think its fine here, but that might change.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, keeping it here for now since query_process.py is the only consumer. If a Go-side feature ever needs the same canonicalization, the natural refactor is to define the table once in Go (e.g. under internal/utils/qemu) and ship it down via the inputs JSON alongside arch. I'll leave a TODO if you'd like.

- consolidate spec rel-path validation into fileutils.ValidateRelPath
- render/update: surface partial results via runFuncInternal on error;
  widen filterDisplayResults to include errored entries; restore
  --fail-on-error on render as opt-in
- document partial-results-on-error contract on RunFunc* / CmdFuncType
- query_process.py: pin _sourcedir to the original spec dir when
  _maybe_rewrite_spec produces a scratch copy; raise loudly if a
  spec-rewrite find string no longer matches; look up -q by index
  when inserting --builtrpms; add TODO for moving _ARCH_ALIASES to Go
- add scenario test for --arch ExclusiveArch/ExcludeArch handling
  (TestSpec gains WithExclusiveArch / WithExcludeArch)
Copilot AI review requested due to automatic review settings June 2, 2026 00:29
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 21 out of 21 changed files in this pull request and generated 5 comments.

Comment on lines +173 to +176
queryResults, err := mockProcessor.BatchQuerySpecs(
env, env, renderedSpecsDir, scratchDir, archStr,
inputs, env.FS(), env.CPUBoundConcurrency(),
)
Comment thread internal/rpm/specquery.go
Comment on lines +177 to +179
if after, ok := strings.CutPrefix(trimmed, "subpkg="); ok && after != "" {
result = append(result, after)
}
Comment on lines +389 to +392
# Insert --builtrpms right after `-q` so it associates with the query.
# Look up `-q` rather than hard-coding the index so this stays correct if
# _rpmspec_args ever reorders its preamble.
bin_args.insert(bin_args.index("-q") + 1, "--builtrpms")
Comment on lines +144 to 146
if len(inputs) == 0 {
return nil, fmt.Errorf("no components have a rendered spec on disk; skipped %d", skipped)
}
Comment on lines +155 to +159
if results != nil && !isNilValue(results) {
if reportErr := reportResults(env, results); reportErr != nil {
return errors.Join(err, reportErr)
}
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants