
test: add framework label catalog and component/image test mappings #17443

Draft: bhagyapathak wants to merge 1 commit into 4.0 from bhagya/test-metadata



bhagyapathak commented May 26, 2026

Merge Checklist

All boxes should be checked before merging the PR (just tick any boxes which don't apply to this PR)

  • The toolchain has been rebuilt successfully (or no changes were made to it)
  • The toolchain/worker package manifests are up-to-date
  • Any updated packages successfully build (or no packages were changed)
  • Packages depending on static components modified in this PR (Golang, *-static subpackages, etc.) have had their Release tag incremented.
  • Package tests (%check section) have been verified with RUN_CHECK=y for existing SPEC files, or added to new SPEC files
  • All package sources are available
  • cgmanifest files are up-to-date and sorted (./cgmanifest.json, ./toolkit/scripts/toolchain/cgmanifest.json, .github/workflows/cgmanifest.json)
  • LICENSE-MAP files are up-to-date (./LICENSES-AND-NOTICES/SPECS/data/licenses.json, ./LICENSES-AND-NOTICES/SPECS/LICENSES-MAP.md, ./LICENSES-AND-NOTICES/SPECS/LICENSE-EXCEPTIONS.PHOTON)
  • All source files have up-to-date hashes in the *.signatures.json files
  • sudo make go-tidy-all and sudo make go-test-coverage pass
  • Documentation has been updated to match any changes to the build system
  • Ready to merge

Summary

This PR introduces a declarative, in-tree test-selection layer so that CI can decide what to run from data that lives next to the image being tested, rather than from out-of-tree CI YAML or hard-coded scripts.

Change Log

  • Add base/tests/framework-labels.toml as the test-label catalog (tmt/lisa/openqa/pytest bindings, per-label retry/timeout).
  • Wire it up via [images..test-workflows] tier maps in base/images/images.toml and test_labels = [...] on components in base/comps/components.toml (+ kernel, kernel-headers, grub2, systemd overrides).

Does this affect the toolchain?

NO

Test Methodology

./azldev -C ~/azurelinux image labels vm-base pr_validation

lisa_priority0
lisa_smoke

./azldev -C ~/azurelinux image labels vm-base nightly_validation

lisa_ltp
lisa_perf
lisa_priority1
lisa_priority2

reubeno (Member) commented May 29, 2026

Hi @bhagyapathak --

First off, thanks for putting this PR together. The format made it much easier to understand the pieces you're proposing, and it concretely pushed things forward toward identifying the relationship between components/images and tests. You've also introduced a new version of labels as a TOML-local grouping structure that can be used for, say, components to indirectly associate with a heterogeneous collection of tests. I very much like that idea and think it helps us move things forward.

I threw together reubeno@e3f1e95 as a proposed update to your PR. It mostly renames and re-shapes your proposed syntax to align more strongly with the existing patterns and principles in the TOML file: most notably, the concept of "groups" and a desire to abstractly model a "test", independent of what it really maps to natively in the framework it uses. I also want to acknowledge that I have separated out some operational choices and policy that I think either belong elsewhere or need to be layered on later, after we get the basics in place.

Concepts kept (with different naming/syntax)

  • A named, reusable test record. Your *_labels entries become a single [tests.X] namespace. Instead of encoding the framework in the table-name prefix (tmt_, pytest_, …), each test carries an explicit type plus a framework
    subtable ([tests.X.pytest], [tests.X.tmt]). Same idea, one namespace with subtyping.

  • Framework-native filters as structured config. Your pytest_markers / pytest_files / pytest_args survive as structured fields under [tests.X.pytest] (test-paths, extra-args, etc.). The TMT filter/source concept survives under
    [tests.X.tmt].

  • Grouping. Your labels-as-buckets idea becomes a first-class [test-groups.X] — a named bundle of tests, mirroring how [component-groups.X] and [package-groups.X] already work in the repo. This is where the "run this
    whole set" intent lives. I'd also note that the existing component group concept allows another way to identify a set of components that inherit a common set of properties, which could include associated tests too.

  • Associating tests with the things they test. Your test_labels = [...] on components and the per-image label lists both become one uniform form: tests.tests = [{ name = "…" } | { group = "…" }], declared on the image or component. We extend this so that components (not just images) can carry associations and join shared groups, and so that components and images use the same syntax for doing so.

  • Capability gating. requires_capabilities is genuinely declarative, so I think it makes sense to keep. I'd be inclined to rename it to required-capabilities to match the language and casing we use elsewhere in the TOML files. It would also now live on the test definition itself, as a common (non-framework-specific) property.

Properties deferred (and why)

The one structural change worth calling out: I pulled workflow/execution policy out of the test definitions. In the PR, there's workflow / scenario-oriented policy (e.g., pr_validation vs. nightly_validation) as well as operational toggles (e.g., the [frameworks] enable/disable options).

To be clear — I agree that policy is needed. The Control Tower needs to understand (and/or decide) when to run certain tests, whether certain associated tests are appropriate to request in the current workflow, etc. These workflow-sensitive policies should be considered a separate concern from the test describing itself. So the goal for [tests.X] is for a test to declare what it is and how to run it as purely as possible, and for the when/how-often policy to be layered on separately, in meta-policy for qualification / promotion scenarios.

Small additions

  • kind = ["functional", "performance"] — a closed enum on a test that allows the test to advertise the role it plays.
  • long-running = true — a hint for tests that may run for hours. I think a pure estimated number could get messy over time, and anticipate that for that level of precision we'll want to leverage historical execution data. But for now, this is
    a placeholder to indicate that certain tests are not ones that can finish in, say, < 30 mins. I think we'll need to refine and decide what this really means.
