fix: enforce policy-based access control on artifact downloads#7009
fix: enforce policy-based access control on artifact downloads#7009ycombinator wants to merge 12 commits into
Conversation
|
This pull request does not have a backport label. Could you fix it @ycombinator? 🙏
|
This comment has been minimized.
This comment has been minimized.
| if !ok { | ||
| return nil, nil | ||
| } | ||
| data, err := json.Marshal(raw) |
There was a problem hiding this comment.
Marshalling to then unmarshal feels a bit weird, there's no way to just unmarshal this? Maybe an intermediate type that has artifact_manifest as json.RawMessage instead of an any?
There was a problem hiding this comment.
Removed the marshal-then-unmarshal; agree, it was awkward. Just went with directly accessing map fields, with type assertions for safety: 535b14f
2e1103d to
535b14f
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
michel-laterman
left a comment
There was a problem hiding this comment.
lgtm, just have nitpicks
|
|
||
| func (tester *ClientAPITester) TestArtifact() { | ||
| ctx, cancel := context.WithTimeout(context.Background(), 3*time.Minute) | ||
| ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute) |
There was a problem hiding this comment.
(nitpick) we should use tester.T().Context() instead of background
|
|
||
| func (tester *ClientAPITester20230601) TestArtifact() { | ||
| ctx, cancel := context.WithTimeout(context.Background(), 3*time.Minute) | ||
| ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute) |
There was a problem hiding this comment.
(nitpick) we should use tester.T().Context() instead of background
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
The artifact download endpoint (/api/fleet/artifacts/{id}/{sha256})
previously only validated the agent's API key but never checked whether
the requested artifact belonged to the agent's assigned policy. This
allowed an agent enrolled under one policy to download artifacts from
a different policy if it knew the artifact ID and SHA256 hash.
Add authorizeArtifact implementation that fetches the agent's policy
from the in-memory policy monitor cache and verifies the requested
artifact appears in the policy's artifact_manifest before serving it.
Returns 403 Forbidden if the artifact is not in the agent's policy.
Resolves: elastic/security#8396
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Defines model.ArtifactManifest and model.ManifestEntry structs so policyHasArtifact no longer navigates untyped map[string]any chains. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
schema.go is code-generated and gets overwritten by mage generate. Moving ArtifactManifest and ManifestEntry to ext.go keeps them stable. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ArtifactManifest and ManifestEntry are not ES document types and only exist to support parsing within handleArtifacts.go, so they belong there rather than in the model package. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Agents enrolled under dummy-policy cannot download Elastic Defend artifacts because that policy has no artifact_manifest. Enroll the test agent under security-policy (which has the Elastic Defend integration) instead. Add FleetPolicyHasArtifact scaffold helper that polls .fleet-policies until the policy document references the artifact, ensuring fleet-server's policy monitor cache is up-to-date before the download attempt. Also retry the download on 403 to tolerate any remaining cache propagation lag. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…checks PolicyData.Inputs is already decoded as []map[string]any, so marshaling back to JSON just to unmarshal again is unnecessary. Use type assertions directly on the decoded map in both policyHasArtifact (production) and policyInputHasArtifact (e2e scaffold), removing the artifactManifest and manifestEntry types along with parseArtifactManifest. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ntext The ES global checkpoint monitor can hold up to 4 minutes before the policy cache refreshes. On slow CI the setup steps (FleetHasArtifacts + FleetPolicyHasArtifact) could exhaust the 3-minute budget, leaving the retry loop to fail with a misleading "context deadline exceeded" from the HTTP call. Raise the budget to 5 minutes and add an explicit ctx.Err() check at the top of the retry loop so expiry surfaces a clear message. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
cefb353 to
ea82e65
Compare
This comment has been minimized.
This comment has been minimized.
TL;DRThe Buildkite E2E step failed due to the entire Remediation
Investigation detailsRoot CauseThe failure is a test timeout at the suite/package level:
Because the captured log is truncated to goroutine dump tail + package FAIL, it does not include the specific test name that hung. Evidence
Verification
Follow-upI could not deduplicate against prior detective comments on this PR because PR read endpoints were blocked by integrity policy in this runtime. If this exact timeout diagnosis was already posted, please treat this as confirmation. Related flaky-test references (not an exact match to this log due missing test name): Note 🔒 Integrity filter blocked 3 itemsThe following items were blocked because they don't meet the GitHub integrity level.
To allow these resources, lower tools:
github:
min-integrity: approved # merged | approved | unapproved | noneWhat is this? | From workflow: PR Buildkite Detective Give us feedback! React with 🚀 if perfect, 👍 if helpful, 👎 if not. |
| m.mut.Unlock() | ||
|
|
||
| if !ok { | ||
| return nil, fmt.Errorf("policy %s not found", policyID) |
There was a problem hiding this comment.
Should we return ErrUnauthorizedArtifact here? As it is now, it would return 500 (default in error.go) but I think 403 is more appropriate. Returning 404 would leak whether the policy ID exists.
| } | ||
| } | ||
|
|
||
| func policyInputHasArtifact(input map[string]any, identifier, decodedSha256 string) bool { |
There was a problem hiding this comment.
nit: this can be re-used from policyHasArtifact in handleArtifacts.go if we define it as:
func policyHasArtifact(pd *model.PolicyData, id, sha2 string) bool {
for _, input := range pd.Inputs {
if inputHasArtifact(input, id, sha2) {
return true
}
}
return false
}
func inputHasArtifact(input map[string]any, id, sha2 string) bool {
...
}| if !ok { | ||
| return false | ||
| } | ||
| sha, _ := entry["decoded_sha256"].(string) |
There was a problem hiding this comment.
nit: we can read decoded_sha256 from https://github.com/elastic/fleet-server/blob/main/internal/pkg/dl/constants.go#L68
It can be worth to also add artifact_manifest and artifacts to this constants.go file.
What is the problem this PR solves?
The artifact download endpoint (
/api/fleet/artifacts/{id}/{sha256}) only validates the agent's API key but never checks whether the requested artifact belongs to the agent's assigned policy. This means an agent enrolled under one policy can download artifacts belonging to a different policy if it knows the artifact ID and SHA256 hash. For example, an agent enrolled under a policy with no integrations can retrieve Elastic Defend trust lists, exception lists, and other security artifacts from another policy.How does this PR solve the problem?
Implements the
authorizeArtifact()function (previously a no-op that returnednil) to enforce policy-based access control:GetPolicy(ctx, policyID)method to thepolicy.Monitorinterface that returns the cached policy for a given ID (reloads from ES on cache miss).authorizeArtifact, fetches the agent's policy via the monitor usingagent.AgentPolicyIDand verifies that the requested artifact (identifier+decoded_sha256) appears in the policy'sinputs[].artifact_manifest.artifacts.ErrUnauthorizedArtifact) if the artifact is not listed in the agent's assigned policy.How to test this PR locally
Design Checklist
Checklist
./changelog/fragmentsusing the changelog tool