feat: Integrate Azure Offensive Skills and Dynamic AI Grounding#137
Conversation
…adata standardization
There was a problem hiding this comment.
Requesting changes for now.
The overall direction is strong. Adding Azure-focused offensive skills to the RAG pipeline, standardizing KeyVault metadata, and introducing maintenance scripts for grounding are all useful improvements.
I re-checked the review branch against upstream/dev. The PR currently changes 32 files, with the main integration risk concentrated in the RAG grounding flow, vectorstore rebuild behavior, and test coverage.
Main issue: dynamic grounding in ai/loader.py uses raw substring matching. Because rule_mapping.json includes short tokens like vm, ade, shor, https, and iam, unrelated skills can accidentally match unrelated categories. I confirmed this locally: offensive-idor matches PostQuantum because “shor” appears inside “shortest”, and offensive-waf-bypass/offensive-sqli can pull Compute through short tokens like vm/ade. This can inject irrelevant scanner rules into the RAG context.
Suggested remediation: replace raw substring matching with boundary-aware matching, frontmatter/tag-based matching, or explicit keyword-to-rule-ID / per-skill rule mappings. At minimum, use regex word boundaries or tokenized matching so short keywords do not match inside unrelated words.
Second issue: the vectorstore rebuild flow deletes the existing collection before adding the new chunks. If embedding fails midway, the existing working vectorstore may be lost or partially rebuilt.
Suggested remediation: build into a temporary collection first and only swap to the new collection after a successful build, or otherwise preserve the previous collection until the rebuild completes.
Third issue: the PR says grounding was verified via scripts/test_skill_loader.py, but that script is not present. Also, tests/test_ai_hallucination_guard.py is not enforced by CI because the workflow only runs pytest tests/test_rules_*.py, and the hallucination tests skip unless ai/vectorstore exists.
Suggested remediation: add the missing verification script or remove the claim, and wire deterministic grounding tests into CI. A good approach would be to mock retrieve() or use a small fixture vectorstore so the tests run reliably without requiring a local generated ai/vectorstore directory. Keep LLM/API-key tests optional, but make the grounding logic itself testable in CI.
Small docs issue: scripts/README.md documents scripts/import_azure_skills.py, but that file is not present.
Suggested remediation: add the script or update the docs to only reference scripts included in this PR.
One thing I checked: the source_meta API response change does not look obviously breaking at first glance because the frontend ChatMessage component reads src.id and treats src.resource as optional. Still, since this changes the AI source response shape, please confirm the AI chat/source badge UI was tested.
Validation I ran:
- python3 scripts/audit_ai_grounding.py passed
- python3 -m py_compile passed for the changed AI/script/test files
- .venv/bin/pytest tests/test_rules_keyvault.py tests/test_ai_hallucination_guard.py -q returned 3 passed, 3 skipped
Once grounding matching is made precise, vectorstore rebuild is safer, and the AI integrity tests are actually enforced in CI, this should be in good shape.
TFT444
left a comment
There was a problem hiding this comment.
Hey @ritiksah141,
I reviewed this PR. CI is green and the direction is solid. parthrohit22 already left a detailed review on June 11, and I agree with all three blocking concerns.
The grounding substring matching in ai/loader.py is too broad right now, short tokens like vm, shor, and ade are matching inside unrelated words which can inject wrong skills into the RAG context.
The vectorstore rebuild is also risky because it deletes the existing collection before the new one is ready, so a failed embed midway leaves the system with no working vectorstore. And the testing claims reference scripts/test_skill_loader.py which is not in the PR, and the hallucination guard tests are not wired into CI so they are not actually enforced.
Please address these three things, thenre-request review and we can move it forward.
|
@parthrohit22 , @TFT444 PR is ready for review
|
TFT444
left a comment
There was a problem hiding this comment.
Hey @ritiksah141 ,
I
checked the updates. All three blocking concerns are properly addressed. Word boundary matching is correct, the atomic temp collection swap is the right approach for the vectorstore rebuild, and wiring the hallucination guard into CI with a mocked case is exactly what was needed. Docs are cleaned up too. CI is green on the latest commit.
Approving this, ready for merge.
parthrohit22
left a comment
There was a problem hiding this comment.
I am approving this PR for merge as all the requested issues are acknowledged and solved. great work.
What does this PR do?
Integrates 6 new Azure-specific offensive security skills into the RAG pipeline with a dynamic grounding engine that automatically links methodologies to automated scanner rules.
Type of change
Rule details
Testing
Related issue
Closes # (Referencing the feature request issue created earlier)
Checklist
Summary of Key Changes: