Feat/breaking change rework ranking #58
Open
Breee wants to merge 35 commits into
… (Issue 1: CRD types)

Implements the first issue of the three-stage discovery pipeline redesign documented in docs/decisions/13-discovery-signals-ranking.md.

## Breaking Changes

- Removed: spec.sources[], DiscoverySource, PrometheusSource (API type), RegistrySource, status.sourceCount
- Added: spec.queries[], spec.signals[], spec.ranking (new three-stage pipeline)
- DiscoveredImage: removed score/source fields, added rank/finalScore/selected/signals/ranking breakdown fields

## New API Types

Query stage:
- DiscoveryQuery (prometheus | loki)
- DiscoveryPrometheusQuery, DiscoveryLokiQuery, LokiParser

Signal stage (4 types):
- aggregate, timeWeightedAggregate, windowAggregate, eventPullTime

Ranking stage (3 strategies):
- signal, weightedSum (minMax normalized), modelExposure (cold-node exposure)

Status:
- QueryResult[], SignalResult[]: per-query/signal observability
- Rich DiscoveredImage with signals[] and ranking breakdown

## Other Changes

- Regenerated deepcopy and CRD manifests
- Stubbed controller: sets Ready=False/NotImplemented until Issues 2-10 land
- Removed internal/discovery/registry.go (registry source retired)
- Removed test/e2e/discovery-aggregation/ and discovery-registry/ (retired)
- Updated all e2e tests to new schema, assert NotImplemented condition
- Rewrote docs/content/docs/discovery.md with full pipeline explanation
- Regenerated AI docs (knowledge.yaml, llms.txt, llms-full.txt)

Closes #55
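For readers unfamiliar with the weightedSum strategy named above, here is a minimal Go sketch of a min-max normalized weighted sum. It is not the code from this PR: the function names, the signal names, the weights, and the choice to map a constant signal to zeros are all assumptions made for illustration.

```go
package main

import "fmt"

// minMaxNormalize rescales values into [0, 1].
// Assumption: when every value is equal, all outputs are 0; the real
// engine may handle this edge case differently.
func minMaxNormalize(values []float64) []float64 {
	out := make([]float64, len(values))
	if len(values) == 0 {
		return out
	}
	lo, hi := values[0], values[0]
	for _, v := range values[1:] {
		if v < lo {
			lo = v
		}
		if v > hi {
			hi = v
		}
	}
	if hi == lo {
		return out
	}
	for i, v := range values {
		out[i] = (v - lo) / (hi - lo)
	}
	return out
}

// weightedSum returns one score per image. signals[name][i] is the raw
// value of the named signal for image i, and weights[name] is its weight.
// Both maps are hypothetical stand-ins for the CRD's signal and ranking data.
func weightedSum(signals map[string][]float64, weights map[string]float64, n int) []float64 {
	scores := make([]float64, n)
	for name, raw := range signals {
		norm := minMaxNormalize(raw)
		w := weights[name]
		for i := 0; i < n && i < len(norm); i++ {
			scores[i] += w * norm[i]
		}
	}
	return scores
}

func main() {
	// Hypothetical signal values for three images.
	signals := map[string][]float64{
		"pullCount":       {10, 30, 20},
		"pullTimeSeconds": {4, 2, 8},
	}
	weights := map[string]float64{"pullCount": 0.75, "pullTimeSeconds": 0.25}
	fmt.Println(weightedSum(signals, weights, 3))
}
```

Normalizing each signal before weighting keeps a signal with a large numeric range (for example a pull count) from dominating one with a small range (for example seconds), which is presumably why the strategy is described as minMax normalized.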
…gistry datasource

- Add DiscoveryQueryTypeRegistry + DiscoveryRegistryQuery to API types
- Restore internal/discovery/registry.go and registry_test.go
- Add internal/discovery/engine.go: full 3-stage pipeline execution (query → signal → ranking)
  - Prometheus instant/range, registry queries
  - aggregate, timeWeightedAggregate, windowAggregate signals
  - signal, weightedSum, modelExposure ranking strategies
- Add internal/discovery/engine_test.go: tests for all pipeline stages
- Add FetchRaw() to PrometheusSource for timestamp-preserving data access
- Replace controller stub (NotImplemented) with real pipeline execution
- Update e2e tests: assert real behavior (Synced/DNSError) instead of NotImplemented
- Add discovery-registry e2e test suite
- Regenerate deepcopy and CRD manifests

All unit tests pass, linter clean (0 issues).
…discovery Deploy a single-binary Loki into the e2e-infra namespace and seed it with kubelet-style image-pull event log lines (Pulling/Pulled/Failed/already present) so DiscoveryPolicy loki queries with the kubernetesEvents parser and the eventPullTime signal can be exercised end-to-end. Wired into hack/e2e-infra/setup.sh and the Tiltfile alongside the existing Prometheus and registry infrastructure.
Add a DiscoveryPolicy e2e suite that runs a Loki range query with the kubernetesEvents parser and derives p50 cold-pull-time and failure-count eventPullTime signals from the seeded image-pull events, asserting the pipeline reports Ready=Synced and discovers the expected images. Also refresh the e2e README scenario table (discovery, discovery-loki, discovery-registry).
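To make the two signals in this suite concrete, the following Go sketch derives a p50 pull time and a failure count from a list of image-pull events. It is an illustration only: the pullEvent and pullStats types, the rule of pairing each Pulling with the next Pulled for the same image, and the nearest-rank median are assumptions, not the eventPullTime implementation in this PR.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"time"
)

// pullEvent is a hypothetical, simplified view of one parsed kubelet event.
type pullEvent struct {
	Image  string
	Reason string // "Pulling", "Pulled", or "Failed"
	At     time.Time
}

// pullStats holds the two illustrative signal values for one image.
type pullStats struct {
	P50      time.Duration
	Failures int
}

// p50 returns the nearest-rank median of ds, or 0 when ds is empty.
func p50(ds []time.Duration) time.Duration {
	if len(ds) == 0 {
		return 0
	}
	sorted := append([]time.Duration(nil), ds...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	rank := int(math.Ceil(0.5 * float64(len(sorted))))
	return sorted[rank-1]
}

// summarize pairs each "Pulling" with the next "Pulled" for the same image
// and counts "Failed" events. Events are assumed to be sorted by time.
func summarize(events []pullEvent) map[string]pullStats {
	started := map[string]time.Time{}
	durations := map[string][]time.Duration{}
	failures := map[string]int{}
	for _, e := range events {
		switch e.Reason {
		case "Pulling":
			started[e.Image] = e.At
		case "Pulled":
			if t0, ok := started[e.Image]; ok {
				durations[e.Image] = append(durations[e.Image], e.At.Sub(t0))
				delete(started, e.Image)
			}
		case "Failed":
			failures[e.Image]++
			delete(started, e.Image)
		}
	}
	out := map[string]pullStats{}
	for img, ds := range durations {
		out[img] = pullStats{P50: p50(ds), Failures: failures[img]}
	}
	for img, n := range failures {
		if _, ok := out[img]; !ok {
			out[img] = pullStats{Failures: n}
		}
	}
	return out
}

func main() {
	t0 := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	at := func(s int) time.Time { return t0.Add(time.Duration(s) * time.Second) }
	// Hypothetical events for a single made-up image.
	events := []pullEvent{
		{"example/app:v1", "Pulling", at(0)},
		{"example/app:v1", "Pulled", at(12)},
		{"example/app:v1", "Pulling", at(60)},
		{"example/app:v1", "Failed", at(65)},
		{"example/app:v1", "Pulling", at(120)},
		{"example/app:v1", "Pulled", at(128)},
	}
	fmt.Println(summarize(events)["example/app:v1"])
}
```

In this sketch an "already present" event is ignored because it carries no pull duration; whether the real signal treats it that way is not stated in this PR.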
The kubelet readiness probe against Loki's /ready was flaky during ring stabilization (the probe's 1s timeout was exceeded and /ready returns 503 until the ingester settles), leaving the deployment stuck as not-available. The existing Prometheus and registry manifests use no readiness probe; the seed job already polls /ready before pushing and consumers retry, so gate readiness the same way for consistency and reliability.
Also assert test/tools:v1 (the third seeded image) appears in status.discoveredImages so the assertions cover the full seed dataset.
The readiness probe was dropped in the previous commit because the 1s timeout was too short for ring stabilization. Without any probe, kubectl wait --for=condition=available succeeds as soon as the container starts (before Loki's HTTP server accepts requests), so the seed job could run against a not-yet-ready Loki. Re-add the probe with a longer 5s timeout and 15s initial delay, giving Loki up to ~105s to pass before the Deployment is marked Available and the setup.sh seed step begins.

Also:
- Remove stale 02-assert-notimplemented.yaml (controller no longer returns NotImplemented; file was unused by any chainsaw-test.yaml)
- Fix test/e2e/README.md: wrong make target, wrong scenario names, missing scenarios (cachedimageset-discovery, discovery-failure)
- Update Makefile e2e-infra comment and CI step name to include Loki
This pull request introduces significant enhancements to the DiscoveryPolicy API and the development environment, focusing on supporting richer query and ranking capabilities, as well as improving E2E infrastructure.
The main changes include a comprehensive refactor of DeepCopy methods to support new types, the addition of Loki (log aggregation) resources for E2E and local development, and the introduction of a feature spec for a future DiscoveryPolicy UI.
This is for our plans in #55.