Skip to content

feat: Integrate Azure Offensive Skills and Dynamic AI Grounding#137

Merged
Vishnu2707 merged 6 commits into
openshield-org:devfrom
ritiksah141:feat/integrate-azure-offensive-skill
Jun 21, 2026
Merged

feat: Integrate Azure Offensive Skills and Dynamic AI Grounding#137
Vishnu2707 merged 6 commits into
openshield-org:devfrom
ritiksah141:feat/integrate-azure-offensive-skill

Conversation

@ritiksah141

Copy link
Copy Markdown
Collaborator

What does this PR do?
Integrates 6 new Azure-specific offensive security skills into the RAG pipeline with a dynamic grounding engine that automatically links methodologies to automated scanner rules.

Type of change

  • New scan rule (Metadata updates for KV rules)
  • Documentation (Updated AI README and added scripts documentation)
  • API endpoint (Updated AI service layer for better grounding)
  • Documentation (Offensive Skills)

Rule details

  • Rule IDs: AZ-KV-001 through AZ-KV-005 (updated metadata)
  • Category: Key Vault, Identity, Network, Database, Storage
  • Frameworks mapped: CIS 8.x, NIST CSF, ISO 27001, SOC 2

Testing

  • Dynamic Grounding: Verified ai/loader.py correctly injects rules into skills via scripts/test_skill_loader.py (verified logic).
  • AI Integrity: tests/test_ai_hallucination_guard.py passes, ensuring the AI refuses non-Azure content and made-up rule IDs.
  • Metadata Consistency: Verified all Key Vault rules now include location and resource_group in metadata.
  • Mock Accuracy: Updated MockAzureClient to support diagnostic settings and certificates for Key Vault unit tests.

Related issue
Closes # (Referencing the feature request issue created earlier)

Checklist

  • My code follows the rule template in CONTRIBUTING.md
  • I have updated rule categories for consistency (e.g., KeyVault instead of Key Vault)
  • I added a hallucination guard regression test
  • My branch name follows the convention: feat/integrate-azure-offensive-skill

Summary of Key Changes:

  1. Dynamic Grounding: Implemented a keyword-based injection system in ai/loader.py that connects offensive methodologies to automated detections.
  2. Azure Purity: Imported and scrubbed 6 offensive skills to ensure 100% Azure focus (no AWS/GCP leakage).
  3. Audit Scripts: Added audit_ai_grounding.py and generate_rule_keywords.py to help maintain the AI's "Brain" as new rules are added.
  4. Key Vault Hardening: Standardized metadata across all Key Vault scanner rules to improve regional reporting accuracy.

@ritiksah141 ritiksah141 self-assigned this Jun 8, 2026

@parthrohit22 parthrohit22 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requesting changes for now.

The overall direction is strong. Adding Azure-focused offensive skills to the RAG pipeline, standardizing KeyVault metadata, and introducing maintenance scripts for grounding are all useful improvements.

I re-checked the review branch against upstream/dev. The PR currently changes 32 files, with the main integration risk concentrated in the RAG grounding flow, vectorstore rebuild behavior, and test coverage.

Main issue: dynamic grounding in ai/loader.py uses raw substring matching. Because rule_mapping.json includes short tokens like vm, ade, shor, https, and iam, unrelated skills can accidentally match unrelated categories. I confirmed this locally: offensive-idor matches PostQuantum because “shor” appears inside “shortest”, and offensive-waf-bypass/offensive-sqli can pull Compute through short tokens like vm/ade. This can inject irrelevant scanner rules into the RAG context.

Suggested remediation: replace raw substring matching with boundary-aware matching, frontmatter/tag-based matching, or explicit keyword-to-rule-ID / per-skill rule mappings. At minimum, use regex word boundaries or tokenized matching so short keywords do not match inside unrelated words.

Second issue: the vectorstore rebuild flow deletes the existing collection before adding the new chunks. If embedding fails midway, the existing working vectorstore may be lost or partially rebuilt.

Suggested remediation: build into a temporary collection first and only swap to the new collection after a successful build, or otherwise preserve the previous collection until the rebuild completes.

Third issue: the PR says grounding was verified via scripts/test_skill_loader.py, but that script is not present. Also, tests/test_ai_hallucination_guard.py is not enforced by CI because the workflow only runs pytest tests/test_rules_*.py, and the hallucination tests skip unless ai/vectorstore exists.

Suggested remediation: add the missing verification script or remove the claim, and wire deterministic grounding tests into CI. A good approach would be to mock retrieve() or use a small fixture vectorstore so the tests run reliably without requiring a local generated ai/vectorstore directory. Keep LLM/API-key tests optional, but make the grounding logic itself testable in CI.

Small docs issue: scripts/README.md documents scripts/import_azure_skills.py, but that file is not present.

Suggested remediation: add the script or update the docs to only reference scripts included in this PR.

One thing I checked: the source_meta API response change does not look obviously breaking at first glance because the frontend ChatMessage component reads src.id and treats src.resource as optional. Still, since this changes the AI source response shape, please confirm the AI chat/source badge UI was tested.

Validation I ran:

  • python3 scripts/audit_ai_grounding.py passed
  • python3 -m py_compile passed for the changed AI/script/test files
  • .venv/bin/pytest tests/test_rules_keyvault.py tests/test_ai_hallucination_guard.py -q returned 3 passed, 3 skipped

Once grounding matching is made precise, vectorstore rebuild is safer, and the AI integrity tests are actually enforced in CI, this should be in good shape.

@TFT444 TFT444 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @ritiksah141,

I reviewed this PR. CI is green and the direction is solid. parthrohit22 already left a detailed review on June 11, and I agree with all three blocking concerns.

The grounding substring matching in ai/loader.py is too broad right now, short tokens like vm, shor, and ade are matching inside unrelated words which can inject wrong skills into the RAG context.

The vectorstore rebuild is also risky because it deletes the existing collection before the new one is ready, so a failed embed midway leaves the system with no working vectorstore. And the testing claims reference scripts/test_skill_loader.py which is not in the PR, and the hallucination guard tests are not wired into CI so they are not actually enforced.

Please address these three things, thenre-request review and we can move it forward.

@ritiksah141

Copy link
Copy Markdown
Collaborator Author

@parthrohit22 , @TFT444 PR is ready for review
Here is the summary of the applied fixes:

  1. Boundary-Aware Grounding Matching (ai/loader.py)

    • Fix: Replaced raw substring matching (keyword in content) with regex word boundaries (\bkeyword\b).
    • Reason: Short keywords like vm, iam, or shor were matching inside unrelated words (e.g., devmac, parliament, or shortest). This caused the RAG engine to inject
      irrelevant scanner rules into the AI context, degrading response quality and increasing token noise. Word boundaries ensure tokens match only as standalone semantic
      concepts.
  2. Atomic Vectorstore Rebuild (ai/embed.py)

  • Fix: Implemented a "Temporary Collection + Atomic Swap" flow. Data is embedded into openshield_temp, and the main openshield collection is only replaced after a
    successful build.
  • Reason: The previous implementation deleted the production collection before embedding new data. If the embedding process failed midway (e.g., network timeout or
    OOM), the system was left with a corrupted or empty knowledge base. The new flow ensures the existing vectorstore remains active until the new one is verified.
  1. Deterministic CI Enforcement (.github/workflows/ci.yml)
  • Fix: Expanded the CI pytest command to include tests/test_ai_hallucination_guard.py. Added a mocked test case to verify grounding logic without requiring external
    AI APIs.
  • Reason: Critical AI integrity guards were previously excluded from CI, meaning regressions in grounding precision or Azure "purity" could go unnoticed. Wiring these
    into CI makes AI safety a mandatory merge requirement.
  1. Source Metadata Enrichment (ai/retriever.py)
  • Fix: Added the resource field (mapping to rule_name or control_name) to the source_meta API response.
  • Reason: The frontend Chat UI uses this field to render descriptive badges for AI sources. Without it, the UI only displayed Rule IDs (e.g., AZ-KV-002), making it
    harder for users to understand the context of the AI's grounding without manual lookup.
  1. Documentation and Script Alignment (scripts/README.md)
  • Fix: Removed documentation for import_azure_skills.py and other missing verification scripts.
  • Reason: Maintaining documentation for scripts not included in the PR creates technical debt and confuses contributors. The documentation now strictly reflects the
    actual file system state.

TFT444
TFT444 previously approved these changes Jun 14, 2026

@TFT444 TFT444 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @ritiksah141 ,
I
checked the updates. All three blocking concerns are properly addressed. Word boundary matching is correct, the atomic temp collection swap is the right approach for the vectorstore rebuild, and wiring the hallucination guard into CI with a mocked case is exactly what was needed. Docs are cleaned up too. CI is green on the latest commit.

Approving this, ready for merge.

parthrohit22
parthrohit22 previously approved these changes Jun 14, 2026

@parthrohit22 parthrohit22 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am approving this PR for merge as all the requested issues are acknowledged and solved. great work.

@Vishnu2707 Vishnu2707 dismissed stale reviews from parthrohit22 and TFT444 via 45c1299 June 21, 2026 23:45
@Vishnu2707 Vishnu2707 merged commit e0512bc into openshield-org:dev Jun 21, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants