Feat/breaking change rework ranking by Breee · Pull Request #58 · corewire/drop

Breee · 2026-06-28T11:47:27Z

This pull request introduces significant enhancements to the DiscoveryPolicy API and the development environment, focusing on supporting richer query and ranking capabilities, as well as improving E2E infrastructure.

The main changes include a comprehensive refactor of DeepCopy methods to support new types, the addition of Loki (log aggregation) resources for E2E and local development, and the introduction of a feature spec for a future DiscoveryPolicy UI.

This implements our plans in #55.

… (Issue 1: CRD types)

Implements the first issue of the three-stage discovery pipeline redesign documented in docs/decisions/13-discovery-signals-ranking.md.

## Breaking Changes

- Removed: spec.sources[], DiscoverySource, PrometheusSource (API type), RegistrySource, status.sourceCount
- Added: spec.queries[], spec.signals[], spec.ranking (new three-stage pipeline)
- DiscoveredImage: removed score/source fields, added rank/finalScore/selected/signals/ranking breakdown fields

## New API Types

Query stage:
- DiscoveryQuery (prometheus | loki)
- DiscoveryPrometheusQuery, DiscoveryLokiQuery, LokiParser

Signal stage (4 types):
- aggregate, timeWeightedAggregate, windowAggregate, eventPullTime

Ranking stage (3 strategies):
- signal, weightedSum (minMax normalized), modelExposure (cold-node exposure)

Status:
- QueryResult[], SignalResult[]: per-query/signal observability
- Rich DiscoveredImage with signals[] and ranking breakdown

## Other Changes

- Regenerated deepcopy and CRD manifests
- Stubbed controller: sets Ready=False/NotImplemented until Issues 2-10 land
- Removed internal/discovery/registry.go (registry source retired)
- Removed test/e2e/discovery-aggregation/ and discovery-registry/ (retired)
- Updated all e2e tests to new schema, assert NotImplemented condition
- Rewrote docs/content/docs/discovery.md with full pipeline explanation
- Regenerated AI docs (knowledge.yaml, llms.txt, llms-full.txt)

Closes #55
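To make the relationship between the three stages concrete, here is a minimal Go sketch of how a spec with queries, signals, and a ranking could be cross-checked. The type and strategy names in the lookup tables come from the commit message above; the `Spec`, `Query`, `Signal`, and `Ranking` structs, their fields, and the `validate` function are hypothetical simplifications and not the actual CRD types.

```go
package main

import "fmt"

// Type and strategy names as listed in the commit message.
var (
	queryTypes        = map[string]bool{"prometheus": true, "loki": true}
	signalTypes       = map[string]bool{"aggregate": true, "timeWeightedAggregate": true, "windowAggregate": true, "eventPullTime": true}
	rankingStrategies = map[string]bool{"signal": true, "weightedSum": true, "modelExposure": true}
)

// Hypothetical, heavily simplified stand-ins for the real API types.
type Query struct {
	Name string
	Type string
}

type Signal struct {
	Name  string
	Type  string
	Query string // name of the query this signal is derived from (assumed field)
}

type Ranking struct {
	Strategy string
}

type Spec struct {
	Queries []Query
	Signals []Signal
	Ranking Ranking
}

// validate checks that every stage uses a known type and that each
// signal refers to a query that is actually declared.
func validate(s Spec) error {
	declared := map[string]bool{}
	for _, q := range s.Queries {
		if !queryTypes[q.Type] {
			return fmt.Errorf("query %q: unknown type %q", q.Name, q.Type)
		}
		declared[q.Name] = true
	}
	for _, sig := range s.Signals {
		if !signalTypes[sig.Type] {
			return fmt.Errorf("signal %q: unknown type %q", sig.Name, sig.Type)
		}
		if !declared[sig.Query] {
			return fmt.Errorf("signal %q: unknown query %q", sig.Name, sig.Query)
		}
	}
	if !rankingStrategies[s.Ranking.Strategy] {
		return fmt.Errorf("ranking: unknown strategy %q", s.Ranking.Strategy)
	}
	return nil
}

func main() {
	spec := Spec{
		Queries: []Query{{Name: "pulls", Type: "loki"}},
		Signals: []Signal{{Name: "pull-time", Type: "eventPullTime", Query: "pulls"}},
		Ranking: Ranking{Strategy: "signal"},
	}
	fmt.Println("valid spec error:", validate(spec))

	spec.Signals[0].Query = "missing"
	fmt.Println("broken spec error:", validate(spec))
}
```

In the real API this kind of check would more likely live in CRD schema validation or a webhook; the sketch only illustrates that signals depend on queries and the ranking depends on signals.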

…gistry datasource

- Add DiscoveryQueryTypeRegistry + DiscoveryRegistryQuery to API types
- Restore internal/discovery/registry.go and registry_test.go
- Add internal/discovery/engine.go: full 3-stage pipeline execution (query → signal → ranking)
  - Prometheus instant/range, registry queries
  - aggregate, timeWeightedAggregate, windowAggregate signals
  - signal, weightedSum, modelExposure ranking strategies
- Add internal/discovery/engine_test.go: tests for all pipeline stages
- Add FetchRaw() to PrometheusSource for timestamp-preserving data access
- Replace controller stub (NotImplemented) with real pipeline execution
- Update e2e tests: assert real behavior (Synced/DNSError) instead of NotImplemented
- Add discovery-registry e2e test suite
- Regenerate deepcopy and CRD manifests

All unit tests pass, linter clean (0 issues).
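As an illustration of the weightedSum strategy with min-max normalization, the following Go sketch rescales each signal to [0, 1] and sums the weighted results per image. It is not the code in internal/discovery/engine.go: the function names, the map-based data layout, the signal names `pulls` and `size`, and the choice to map a constant signal to 0 are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// Ranked is one image with its combined score (hypothetical type).
type Ranked struct {
	Image string
	Score float64
}

// minMax rescales values to [0, 1]. When all values are equal the
// signal carries no ranking information, so every entry maps to 0
// (an assumption of this sketch).
func minMax(values map[string]float64) map[string]float64 {
	out := make(map[string]float64, len(values))
	first := true
	var lo, hi float64
	for _, v := range values {
		if first {
			lo, hi, first = v, v, false
			continue
		}
		if v < lo {
			lo = v
		}
		if v > hi {
			hi = v
		}
	}
	for k, v := range values {
		if hi == lo {
			out[k] = 0
			continue
		}
		out[k] = (v - lo) / (hi - lo)
	}
	return out
}

// weightedSum combines normalized signals into one score per image.
// signals maps signal name -> image -> raw value.
func weightedSum(signals map[string]map[string]float64, weights map[string]float64) []Ranked {
	scores := map[string]float64{}
	for name, values := range signals {
		for image, v := range minMax(values) {
			scores[image] += weights[name] * v
		}
	}
	ranked := make([]Ranked, 0, len(scores))
	for image, s := range scores {
		ranked = append(ranked, Ranked{Image: image, Score: s})
	}
	// Highest score first; ties broken by image name for stable output.
	sort.Slice(ranked, func(i, j int) bool {
		if ranked[i].Score != ranked[j].Score {
			return ranked[i].Score > ranked[j].Score
		}
		return ranked[i].Image < ranked[j].Image
	})
	return ranked
}

func main() {
	signals := map[string]map[string]float64{
		"pulls": {"a": 10, "b": 30, "c": 20},
		"size":  {"a": 100, "b": 100, "c": 300},
	}
	weights := map[string]float64{"pulls": 0.5, "size": 0.5}
	for i, r := range weightedSum(signals, weights) {
		fmt.Printf("rank %d: %s (score %.2f)\n", i+1, r.Image, r.Score)
	}
}
```

Normalizing before weighting keeps a signal with a large numeric range (such as bytes) from dominating one with a small range (such as pull counts).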

…nal derivation

…discovery

Deploy a single-binary Loki into the e2e-infra namespace and seed it with kubelet-style image-pull event log lines (Pulling/Pulled/Failed/already present) so DiscoveryPolicy loki queries with the kubernetesEvents parser and the eventPullTime signal can be exercised end-to-end. Wired into hack/e2e-infra/setup.sh and the Tiltfile alongside the existing Prometheus and registry infrastructure.

Add a DiscoveryPolicy e2e suite that runs a Loki range query with the kubernetesEvents parser and derives p50 cold-pull-time and failure-count eventPullTime signals from the seeded image-pull events, asserting the pipeline reports Ready=Synced and discovers the expected images. Also refresh the e2e README scenario table (discovery, discovery-loki, discovery-registry).
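The two derived signals can be pictured with a small Go sketch that pairs Pulling and Pulled events per image, takes the median duration as p50, and counts failures. This is only an illustration under simplifying assumptions: the `PullEvent` and `PullStats` types, the `Kind` labels, pairing events per image rather than per pod or node, and the nearest-rank median are all hypothetical and may differ from how the eventPullTime signal is actually computed.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// PullEvent is a hypothetical, already-parsed image-pull event.
type PullEvent struct {
	At    time.Time
	Image string
	Kind  string // "Pulling", "Pulled", "Failed" or "AlreadyPresent" (illustrative labels)
}

// PullStats holds the two signals derived per image.
type PullStats struct {
	P50      time.Duration // median cold-pull duration (nearest rank)
	Failures int           // number of failed pulls
}

func derivePullStats(events []PullEvent) map[string]PullStats {
	// Work on a time-ordered copy so the caller's slice is untouched.
	ordered := append([]PullEvent(nil), events...)
	sort.SliceStable(ordered, func(i, j int) bool { return ordered[i].At.Before(ordered[j].At) })

	started := map[string]time.Time{}
	durations := map[string][]time.Duration{}
	stats := map[string]PullStats{}
	for _, e := range ordered {
		switch e.Kind {
		case "Pulling":
			started[e.Image] = e.At
		case "Pulled":
			if s, ok := started[e.Image]; ok {
				durations[e.Image] = append(durations[e.Image], e.At.Sub(s))
				delete(started, e.Image)
			}
		case "Failed":
			st := stats[e.Image]
			st.Failures++
			stats[e.Image] = st
			delete(started, e.Image)
		}
		// "AlreadyPresent" means no cold pull happened, so it is ignored.
	}
	for image, ds := range durations {
		sort.Slice(ds, func(i, j int) bool { return ds[i] < ds[j] })
		st := stats[image]
		st.P50 = ds[(len(ds)-1)/2] // nearest-rank median
		stats[image] = st
	}
	return stats
}

// sampleEvents returns made-up events for two made-up images.
func sampleEvents() []PullEvent {
	t0 := time.Date(2026, 6, 27, 12, 0, 0, 0, time.UTC)
	at := func(sec int) time.Time { return t0.Add(time.Duration(sec) * time.Second) }
	return []PullEvent{
		{at(0), "example/app:v1", "Pulling"}, {at(2), "example/app:v1", "Pulled"},
		{at(10), "example/app:v1", "Pulling"}, {at(16), "example/app:v1", "Pulled"},
		{at(20), "example/app:v1", "Pulling"}, {at(24), "example/app:v1", "Pulled"},
		{at(30), "example/app:v1", "AlreadyPresent"},
		{at(0), "example/db:v1", "Pulling"}, {at(1), "example/db:v1", "Failed"},
	}
}

func main() {
	for image, st := range derivePullStats(sampleEvents()) {
		fmt.Printf("%s p50=%s failures=%d\n", image, st.P50, st.Failures)
	}
}
```

Skipping "already present" events matters here: they represent warm starts, and including them would drag the cold-pull median toward zero.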

The kubelet readiness probe against Loki's /ready was flaky during ring stabilization (the probe's 1s timeout was exceeded and /ready returns 503 until the ingester settles), leaving the deployment stuck as not-available. The existing Prometheus and registry manifests use no readiness probe; the seed job already polls /ready before pushing and consumers retry, so gate readiness the same way for consistency and reliability.
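The "poll /ready before pushing" behaviour described above can be sketched in Go as a loop that retries until the endpoint answers 200 or a deadline passes. The actual seed job is not shown in this PR text and may well be a shell script; `waitReady`, `demo`, the retry interval, and the stand-in test server that returns 503 twice are assumptions made purely for illustration.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"time"
)

// waitReady polls url until it answers 200 OK or ctx expires.
// It returns the number of attempts made.
func waitReady(ctx context.Context, client *http.Client, url string, interval time.Duration) (int, error) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	attempts := 0
	for {
		attempts++
		req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
		if err != nil {
			return attempts, err
		}
		resp, err := client.Do(req)
		if err == nil {
			resp.Body.Close()
			if resp.StatusCode == http.StatusOK {
				return attempts, nil
			}
		}
		// Not ready yet (503 or connection error): wait, then retry.
		select {
		case <-ctx.Done():
			return attempts, ctx.Err()
		case <-ticker.C:
		}
	}
}

// demo runs waitReady against a stand-in server that reports 503
// for the first two requests, mimicking a service that is still settling.
func demo() (int, error) {
	var calls int32
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if atomic.AddInt32(&calls, 1) < 3 {
			w.WriteHeader(http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	}))
	defer srv.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	return waitReady(ctx, srv.Client(), srv.URL+"/ready", 10*time.Millisecond)
}

func main() {
	attempts, err := demo()
	fmt.Println("attempts:", attempts, "err:", err)
}
```

A client-side loop like this tolerates a slow start without marking anything failed, which is the property the commit message relies on when it drops the probe.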

Also assert test/tools:v1 (the third seeded image) appears in status.discoveredImages so the assertions cover the full seed dataset.

The readiness probe was dropped in the previous commit because the 1s timeout was too short for ring stabilization. Without any probe, kubectl wait --for=condition=available succeeds as soon as the container starts (before Loki's HTTP server accepts requests), so the seed job could run against a not-yet-ready Loki. Re-add the probe with a longer 5s timeout and 15s initial delay, giving Loki up to ~105s to pass before the Deployment is marked Available and the setup.sh seed step begins.

Also:
- Remove stale 02-assert-notimplemented.yaml (controller no longer returns NotImplemented; file was unused by any chainsaw-test.yaml)
- Fix test/e2e/README.md: wrong make target, wrong scenario names, missing scenarios (cachedimageset-discovery, discovery-failure)
- Update Makefile e2e-infra comment and CI step name to include Loki

Copilot AI and others added 30 commits June 27, 2026 11:19

Initial plan

6705ccb

feat(discovery): implement Loki query execution and eventPullTime signal derivation

a00b969

test(e2e): assert all three seeded images are discovered from Loki

769a38d


Apply remaining changes

0c52eff

fix(devenv): fix tiltfile.

19db893

feat(devenv): devsample up to date with current featureset

260dfbe

feat(crds): slim the crd status to not pollute etcd

480a13b

feat(tests): rework tests

ecf9f63

uidoc

35195ad

doocs

30cce4a

strats

5ca7d74

refactor fields

ec0006d

update samples

d4b68d8

update tests

7a53ed0

update make

c6e77ef

gen

43e1155

ignore

bc3981d

dashboard

83193d3

alloy conform events

d20ae80

refactor loki parsing

7e40ea1

tests

83c13c6

infra

7c79325

docs

71afb4b

docs

0730a49

Breee added 5 commits June 29, 2026 12:14

tilt

22ee5e3

registry discovery / docs

e8154c8

loki pull

4d5fcc6

twsts, gen

33ae6dc

tests

5b07808

Breee wants to merge 35 commits into main from feat/breaking-change-rework-ranking

2 participants
