added RFC on how to create a living knowledge base of owasp things #734

Open
northdpole wants to merge 1 commit into main from owasp-graph

Conversation

@northdpole
Collaborator

No description provided.

@PRAteek-singHWY
Contributor

PRAteek-singHWY commented Feb 1, 2026

@northdpole
Thanks a lot for sharing this sir, this is extremely helpful and very well structured.

I've gone through the RFC and it gives a clear architectural and experimental framework to build the proposal around. I'll spend some time digesting it in detail and start aligning my work proposal with this design and the pre-code experiments outlined here.

@PRAteek-singHWY
Contributor

@northdpole

Thanks for putting this together Sir, the experimental framework is really clear.

I’m particularly interested in Module C (The Librarian) and want to start with the suggested pre-code experiments before proposing any concrete design or implementation.

The negation problem stands out — I’ve worked on gap analysis features before (#716) and have seen how basic similarity metrics can struggle with logical inversions in requirements (e.g., “Use X” vs “Do NOT use X”).

Plan:
I’ll start with the ASVS re-classification experiment:

  • Extract 50 ASVS requirements and strip metadata
  • Baseline: vector search with cosine similarity
  • Comparison: cross-encoder re-ranking (ms-marco-MiniLM-L-6-v2)
  • Target: >20% accuracy improvement on negative requirements
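
A rough sketch of the evaluation loop this plan implies (all names are illustrative; the real runs would plug cosine similarity over embeddings into `score` for the baseline and cross-encoder scores for the comparison):

```python
def top1_accuracy(queries, candidates, gold, score):
    """Fraction of queries whose highest-scoring candidate is the gold match."""
    hits = sum(
        1 for q in queries
        if max(candidates, key=lambda c: score(q, c)) == gold[q]
    )
    return hits / len(queries)

# Toy lexical scorer that shows the negation failure mode:
# "Use X" and "Do NOT use X" share most of their tokens.
def overlap(a, b):
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)
```

Scoring the baseline and the re-ranked run with the same `top1_accuracy` keeps the >20% comparison apples-to-apples.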

If the experiment is successful, I’m also interested in exploring hybrid search (vector + BM25), especially for cases like CVE identifiers where pure vector search often underperforms.
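
For reference, a self-contained sketch of what that hybrid could look like (a minimal BM25 scorer plus reciprocal-rank fusion of the two result lists; the parameters and whitespace tokenization are placeholders, not a tuned design):

```python
import math
from collections import Counter

def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Minimal BM25 over pre-tokenized documents."""
    n = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / n
    df = Counter(t for d in docs_tokens for t in set(d))
    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        s = 0.0
        for t in query_tokens:
            if t not in tf:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

def rrf(rankings, k=60):
    """Reciprocal-rank fusion of several ranked lists of doc indices."""
    fused = Counter()
    for ranking in rankings:
        for rank, doc in enumerate(ranking):
            fused[doc] += 1.0 / (k + rank + 1)
    return [doc for doc, _ in fused.most_common()]
```

Exact-token matches like CVE identifiers get full BM25 credit even when the embedding model has never seen the ID, which is exactly the case where pure vector search struggles.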

I'll take this up step by step.

I’ll share experiment results and observations before proposing any implementation.

I’m using AI tools (similar to Cursor/Windsurf) and have read Section 3.

Thank you.

@manshusainishab
Contributor

Hi @northdpole ,

Thanks for putting together this RFC — the structure, pre-code experiments, and CI-first mindset are exactly the kind of system I enjoy working on.

I’d like to formally express my interest in owning Module B: Noise / Relevance Filter as my primary contribution, and I’m also happy to assist with adjacent modules where needed.

Why Module B

The framing of Module B as a cheap, high-signal gate before expensive downstream processing resonates strongly with me. Getting this layer right feels critical to the quality, cost, and trustworthiness of the entire pipeline, especially given the planned regression dataset and CI enforcement.

Proposed Plan of Action (Aligned with the RFC)
I plan to follow the RFC strictly and start with experiments before any production code:

  1. Human Benchmark (Pre-Code Experiment)
    Manually label a sample of historical commits as either:
    Security Knowledge
    Noise (formatting, admin, linting, meta updates)
    This dataset will be versioned and reusable as an early “golden slice.”

  2. Prompt Iteration & Evaluation
    Start with a simple binary JSON output prompt:
    “Is this content introducing or modifying security-relevant knowledge?”
    Evaluate against the human benchmark.
    Iterate until accuracy consistently exceeds 97%, with special attention to known failure modes (e.g., Code of Conduct updates, formatting-only diffs).

  3. Regex + LLM Cost Control
    Design the regex filter to aggressively eliminate obvious noise first (lockfiles, CSS, tests, config).
    Ensure the LLM is only invoked on borderline or content-heavy diffs.
    Document false positives / negatives clearly for future contributors.

  4. CI & Dataset Readiness
    Structure outputs so they can plug cleanly into the planned golden_dataset.json.
    Ensure behavior is deterministic and testable for CI regression checks.
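
To make the regex stage concrete, here is a hypothetical path-based gate (the pattern list is a guess; the real list should be derived from the labeled commit benchmark):

```python
import re

# Assumed noise patterns -- lockfiles, stylesheets, CI config, tests.
# These are illustrative, not the final filter.
NOISE_PATTERNS = [
    r"(^|/)package-lock\.json$",
    r"(^|/)yarn\.lock$",
    r"\.(css|scss)$",
    r"(^|/)\.github/",
    r"(^|/)tests?/",
]
NOISE_RE = [re.compile(p) for p in NOISE_PATTERNS]

def needs_llm(changed_paths):
    """True only if at least one changed file falls outside the noise list,
    so the (expensive) LLM classifier runs on borderline commits only."""
    return any(not any(rx.search(p) for rx in NOISE_RE) for p in changed_paths)
```

Being deterministic and pure, a gate like this is trivially testable in CI, which matches the regression-check goal above.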

Cross-Module Contributions

While Module B would be my ownership area, I can also help with:

  • Module A: defining shared interfaces and assumptions between diff harvesting and filtering.
  • CI / Evaluation: contributing test cases and failure examples derived from Module B experiments.

I’ve read and understood Section 3 (Agent-Ready CI & AI-generated PR constraints) and I’m comfortable working within those boundaries.

Looking forward to collaborating — this project feels like a rare opportunity to build something both technically rigorous and genuinely useful.

Best,
Manshu

@PRAteek-singHWY
Contributor

PRAteek-singHWY commented Feb 10, 2026

@northdpole Module C update (pre‑code experiment complete)

I ran the RFC‑required 50‑item ASVS experiment and also a 100‑item stability check to reduce variance (the negative subset is small, so a larger sample gives a more stable signal).

Results (negative top‑1):

  • 50‑item: 0.625 → 1.0
  • 100‑item: 0.6667 → 1.0

This passes the RFC success criteria (>20% improvement on negative requirements).
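
As a sanity check on the criterion (assuming “>20%” means improvement relative to the baseline; the absolute gain of 37.5 points passes either reading):

```python
# Negative top-1 accuracy from the 50-item run reported above.
baseline_top1 = 0.625   # vector search, cosine similarity
reranked_top1 = 1.0     # after cross-encoder re-ranking

relative_gain = (reranked_top1 - baseline_top1) / baseline_top1  # 0.60, i.e. 60%
absolute_gain = reranked_top1 - baseline_top1                    # 0.375
```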

Design doc (pipeline + CI plan):

https://gist.github.com/PRAteek-singHWY/7b35f0edbd9b8354257f3f5366951dab

Hybrid search (BM25 + vector) is listed as a bonus. I have not implemented it yet; I plan to explore it after the pre‑code experiment and design are approved.

Next steps per RFC (please confirm):

  1. Finalize design + interfaces
  2. Build golden_dataset.json + evaluation harness (CI regression)
  3. Implement Module C retrieval + re‑rank + update detection
  4. Tune threshold against the golden dataset

@robvanderveer
Collaborator

Awesome, but requires some redesigning I think. Let's find out together.

  1. Start the description of the proposed solution with the functionality promise:
    We can unlock all of OWASP content as one resource in a structured way using the new technologies that have come available with AI. People will be able to get comprehensive answers to their questions and lookup queries.

  2. It seems we’re scraping everything, but that means we’ll also be scraping multiple versions, as some projects have different folders for different versions, some of which have not been published yet. I think that will lead to too much noise. A better option is to let repos have a robots.txt with the scraping folders listed and some optional metadata, like what we should call it.

  3. The module that fetches changes is trying to solve a problem that everybody has, and that must already have been solved. We shouldn’t reinvent that wheel. LlamaIndex and LangChain have solutions for this. It’s just a matter of presenting the entire new files again and letting that tech do the diffs, instead of looking at the GitHub diffs. The latter sounds more efficient, but we shouldn’t try to build a smart diff handler for chunking and embedding.
    A quick search found validatedpatterns-sandbox/vector-embedder. I don’t know if it does diffs, but it does GitHub.
    By the way, the purpose of the module doesn’t really become clear. I seem to be missing a module that does the chunking and embedding calculation.

  4. We definitely should put the early designs of parsing links to OpenCRE into the librarian module: if a source section has a link to OpenCRE, that’s the link.

  5. We also should put the early designs of defining delineation of sections into the chunking module: the source specifying patterns to search for that delineate chunks.

Let’s book time next week and work an hour on this together. Slack me options please, if you’re open.

@PRAteek-singHWY
Contributor

PRAteek-singHWY commented Feb 11, 2026

Hey @robvanderveer

Thanks for the detailed feedback. I updated the Module C design to align with your points.

Key changes:

  • Starts with the functionality promise.
  • Clarifies boundaries: Librarian now focuses on mapping/semantics only.
  • Adds link-first logic: if a source section has an OpenCRE link, that mapping is authoritative.
  • Moves chunk delineation and embedding ownership upstream (separate chunking module).
  • Assumes framework-based ingestion/change handling (LlamaIndex/LangChain style), not custom smart diff parsing in Librarian.
  • Keeps cross-encoder negation handling and CI regression gates.

Updated design:
https://gist.github.com/PRAteek-singHWY/7b35f0edbd9b8354257f3f5366951dab

Also happy to sync live for 1 hour sometime next week; I will share timing options on Slack.

@shreyakash24
Contributor

Hi @northdpole,
I would like to work on Module A. I have completed its pre-code experiment to validate the technical feasibility of extracting high-signal security knowledge from the OWASP ecosystem.

Experiment Results & Quality Metrics:

  • 73.43% Token Compression: The pipeline successfully removed bulk infrastructure noise (CI/CD YAML, lockfiles, etc.). This represents a ~73% reduction in LLM operational costs by ensuring only semantic content is processed.

  • High Semantic Density (14.41 Chunks/k-token): The system isolates a high-density stream of actionable security knowledge chunks.

  • Precision & Integrity: Critical security documentation passed the filters, while infrastructure-only files were accurately rejected.
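
For reproducibility, the two headline numbers reduce to simple ratios (the function names are mine, not from the pipeline):

```python
def token_compression(tokens_in, tokens_out):
    """Fraction of input tokens removed before any LLM call."""
    return 1 - tokens_out / tokens_in

def chunk_density(num_chunks, tokens_out):
    """Retained knowledge chunks per 1,000 surviving tokens."""
    return num_chunks * 1000 / tokens_out
```

Since LLM API pricing is per input token, the compression fraction is also the fraction saved on ingestion cost, which is where the ~73% figure comes from.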

Shall I continue to write a detailed proposal regarding this?

@manshusainishab
Contributor

manshusainishab commented Feb 21, 2026

Hi @northdpole ,

I’ve been thinking about a lightweight “Noise / Relevance Filter” (Module B). As your idea suggests, the first step is a cheap regex-based filter to discard obvious non-knowledge changes (formatting, lockfiles, minor docs), followed by a small LLM classifier to determine whether a commit actually adds meaningful security knowledge.

As the plan suggests, I’d validate this with a benchmark on ~100 historical commits to measure precision before proposing full integration.

Additionally, I’d like your thoughts on optionally adding a CodeRabbit AI layer to generate a structured diff summary before sending context to the LLM. Since CodeRabbit is free for open-source projects, it could provide higher-quality summaries and improve classification accuracy by giving the LLM better semantic context.

Would you be open to this direction, or prefer a simpler initial baseline first?

@PRAteek-singHWY
Contributor

PRAteek-singHWY commented Feb 22, 2026

Hey team @northdpole, @robvanderveer, and @Pa04rth 👋

Following up on our recent architectural discussions, I’ve spent the last 10 days deeply analyzing the end-to-end pipeline for Project OIE (#734). As conveyed to Spyros, I have 6-7 months of extended bandwidth due to my internship term and lighter academic pressure, so my goal for this GSoC period is to take ownership of creating a complete, production-ready flow across the ecosystem, under the guidance of all my mentors.

As Rob accurately stated: "We can unlock all of OWASP content as one resource in a structured way using the new technologies that have come available with AI."

To ensure complete clarity and alignment before the proposal deadline, I have physically mapped out the architectural blueprints and tool stacks for the entire project.

How the modules connect in one line:

The Upstream Ingestion Module provides clean, framework-delineated text chunks; the Librarian (Module C) intelligently maps those chunks while natively solving logical negations; and the Dashboard (Module D) acts as a high-speed human-review gate to ensure the OpenCRE graph is never corrupted.

I have broken down my blueprints into 4 detailed documents (with flow diagrams and tool selections):

🎯 1. System Goals & Architecture Flow

Mapping the Functionality Promise and visualizing exactly how the data flows from GitHub, through the three modules, to the Master Database.
📄 System_Goals_&_Architecture_Flow.pdf

📦 2. The Upstream Data Prep (Ingestion & Chunking)

Addressing Rob's feedback: Implementing robots.txt noise filtering, and delegating git-diff/state tracking to established frameworks (LlamaIndex / vector-embedder) so we don't reinvent the wheel. (3 Components explained)
📄 The_Upstream_Data_Prep_(Ingestion_&_Chunking).pdf

🧠 3. Module C: The Librarian (Semantic Intelligence)

Focusing strictly on mapping: Implementing Link-First authoritative overrides, and utilizing my successful Pre-Code Experiment (Cross-Encoders) to solve the "Negation Problem" with 100% accuracy. (2 Components explained)
📄 Module_C-The_Librarian(Semantic_Intelligence).pdf

📊 4. Module D: The Dashboard (Human-in-the-Loop)

Building a “Tinder-speed” review UI with keyboard bindings to allow maintainers to clear <0.8-confidence review queues in minutes, while logging rejections for future ML training. (3 Components explained)
📄 Module_D-The_Dashboard(Human_in_the_loop).pdf

I would love your feedback on these blueprints to ensure my final proposal hits the exact mark you envision for this living knowledge base!

@@ -0,0 +1,262 @@
# RFC: The OpenCRE Scraper & Indexer (Project OIE)
Collaborator


Change name to OWASP Agent. Position it as promise first: the why, not the how. So not: 'scraper and indexer'

Don't rely just on vectors. Use Hybrid Search (Vector + Keyword/BM25).
Why: Vectors are bad at exact keyword matches (e.g., specific CVE IDs).

### Module D: HITL & Logging
Collaborator


Please make the workflow more clear. thanks

Contributor

@PRAteek-singHWY PRAteek-singHWY Feb 24, 2026


Thank you @robvanderveer that makes a lot of sense.
I’ll rename this to OWASP Agent and adjust the introduction to focus first on the problem and the promise it delivers, before going into the implementation details.

I’ll also rework the workflow section to make the end-to-end flow clearer and more explicit, especially around module responsibilities and how data moves between ingestion, hybrid retrieval, semantic reasoning, human validation, and the master database.
I’ll iterate on the document accordingly.

@manshusainishab
Contributor

Hi @northdpole,

I wanted to share a quick update on the Noise/Relevance Filter prototype.

I’ve extracted 100 randomly sampled historical commits and manually labeled them (80 noise / 20 security knowledge) to create a gold benchmark dataset. I then implemented a batch-based LLM classifier (Gemini) with rate limiting and evaluated it against this dataset.

Current results after prompt calibration:

  • Accuracy: 87%
  • Precision: 64%
  • Recall: 80%

I have significantly reduced false positives through stricter “new security concept” criteria, but there’s still room to improve precision further before proposing integration.
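
For what it’s worth, on a 100-commit benchmark with 20 positives those three numbers pin down a single confusion matrix, which is a useful cross-check:

```python
# Recall 0.80 over 20 positives gives TP = 16, FN = 4;
# precision 0.64 then forces FP = 9, and TN = 80 - 9 = 71.
tp, fp, fn, tn = 16, 9, 4, 71

precision = tp / (tp + fp)                  # 0.64
recall = tp / (tp + fn)                     # 0.80
accuracy = (tp + tn) / (tp + fp + fn + tn)  # 0.87
```

So pushing precision higher means eliminating most of those ~9 false positives without losing the 16 true positives.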

I’ve temporarily paused experimentation due to API quota limits, but I’ll continue refining the prompt and evaluation loop to push precision higher while keeping recall stable.

Would you prefer prioritizing higher precision (fewer false positives) even at the cost of some recall?

I’d also like feedback on adding a CodeRabbit AI layer so the LLM can get a better understanding of the changes and the codebase.

This is the repo I have created, if you are interested:
https://github.com/manshusainishab/OpenCRE_test_project
