
Code-agent hallucination pipeline: grounding, 50% balance, expert-verified test set #39

Open
adaamko wants to merge 18 commits into main from code-agent-solutions



adaamko commented Jun 15, 2026


Reworks the code-hallucination data into a request-grounded coding-agent pipeline and ships the resulting dataset (published as the lettucedetect-code-agent source of KRLabsOrg/lettucedetect-code-hallucination).

What changed

  • Generation (generate_code_agent_hallucinations.py): gold-answer modes (function/fragment/edit), request-grounded injection (wrong_implementation / unrequested_change / fabricated_api), length cap, --repos/--hall-ids-file selection, continuous-concurrency runner.
  • Grounding (code_hallucination/answer_grounding.py): four-tier repository grounding + Context7 third-party signatures so answer references aren't mistaken for fabrications; transient-fetch fix so rate-limit windows aren't cached as permanent misses. reground_code_agent_contexts.py repairs grounding in place.
  • Taxonomy: map_label rejects unknown labels at the boundary; build_hf_dataset.py validates every span category.
  • Tooling: check_context_quality.py audit, convert_clean_samples_to_hallucinated.py (raises the hallucinated-class share), train_span_detector.py (fast Hugging Face Trainer path), Streamlit viewer.
  • Docs: index/phases/configuration + provenance.md (construction → audit → repair → 50% balance → test verification).
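The taxonomy change (unknown labels rejected at the boundary, every span category validated at merge) can be illustrated with a small sketch. The label map, `UnknownLabelError`, `inject_or_reject` and `validate_span_categories` below are hypothetical stand-ins; only the behavior (raise instead of passing unknown labels through, and turn that into a `validation:unknown_native_label` rejection) comes from this PR.

```python
# Hypothetical sketch: the real taxonomy.py map and signatures are not shown in this PR.
CODE_AGENT_LABELS = {
    # native injection label -> taxonomy category (identity mapping assumed)
    "wrong_implementation": "wrong_implementation",
    "unrequested_change": "unrequested_change",
    "fabricated_api": "fabricated_api",
}
LABEL_MAPS = {"code_agent": CODE_AGENT_LABELS}


class UnknownLabelError(ValueError):
    """Raised when a source or native label has no taxonomy mapping."""


def map_label(source: str, native_label: str) -> str:
    """Map a native label to the taxonomy, failing loudly on anything unknown."""
    label_map = LABEL_MAPS.get(source)
    if label_map is None:
        raise UnknownLabelError(f"unknown source {source!r}")
    if native_label not in label_map:
        raise UnknownLabelError(f"unknown native label {native_label!r} for {source!r}")
    return label_map[native_label]


def inject_or_reject(source: str, native_label: str):
    """Return (category, None) on success or (None, rejection_reason) on failure."""
    try:
        return map_label(source, native_label), None
    except UnknownLabelError:
        return None, "validation:unknown_native_label"


def validate_span_categories(spans: list) -> None:
    """Merge-time check: every span's category must be a known taxonomy category."""
    allowed = {c for m in LABEL_MAPS.values() for c in m.values()}
    for span in spans:
        if span["category"] not in allowed:
            raise UnknownLabelError(f"span has unknown category {span['category']!r}")
```

Failing at the mapping step means a bad label surfaces as a rejected sample during generation, instead of leaking into the published dataset as an unrecognized span category.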

Test-set verification

Every sample in the code test split is reviewed individually (2,038 → 2,015 retained, 50.3% hallucinated): a full first-pass review (92.9% accepted), blind second-pass adjudication, and evidence arbitration against the true pre-fix sources. Repairs: 235 spans tightened, 23 spans dropped, 5 samples reclassified as clean, and 23 samples removed. Train/validation remain machine-generated with automated gates.

adaamko added 9 commits June 7, 2026 16:55
…line

Model the real task: a coding agent is given a developer request plus
repository context and produces a coherent solution (explanation + code),
then realistic, request-grounded mistakes are injected with exact spans.

- agent_solution.py: generate a correct assistant solution; "files" and
  "edit" answer styles for output variety.
- generate_code_agent_hallucinations.py: per-instance solution -> inject
  intent mistakes (wrong_implementation / unrequested_change) or a
  structural fabrication (fabricated_api, grounded by an absence check);
  SWE-bench official split, docs/dependency context when present.
- taxonomy.py: code_agent label map.
- injection.py: more lenient span location (first occurrence, whitespace
  fallback, no-op guard); unmapped edit types fall back to a per-mode default.
- pipeline.py: prep-only (load, fetch source, rewrite requests).
- Remove the patch-derived modules superseded by this path.
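The span-location rules listed for injection.py (first occurrence, whitespace fallback, no-op guard) can be sketched as follows. The function name `apply_edit` and its return shape are hypothetical; the real injection.py may structure this differently.

```python
import re
from typing import Optional, Tuple


def apply_edit(
    answer: str, original: str, replacement: str
) -> Optional[Tuple[str, Tuple[int, int]]]:
    """Replace `original` with `replacement` inside `answer` (hypothetical helper).

    Returns (new_answer, (start, end)), where the span covers the injected text,
    or None when the edit cannot be located or would change nothing.
    """
    if not original.strip():
        return None  # nothing to anchor the edit to
    start = answer.find(original)  # exact match: first occurrence only
    if start != -1:
        end = start + len(original)
    else:
        # Whitespace fallback: same tokens, any run of whitespace between them.
        pattern = r"\s+".join(re.escape(tok) for tok in original.split())
        match = re.search(pattern, answer)
        if match is None:
            return None
        start, end = match.span()
    new_answer = answer[:start] + replacement + answer[end:]
    if new_answer == answer:  # no-op guard: the injected "mistake" changed nothing
        return None
    return new_answer, (start, start + len(replacement))
```

For example, `apply_edit("x = foo(1)\n", "foo(1)", "bar(1)")` yields `("x = bar(1)\n", (4, 10))`, and an edit whose text differs from the answer only in line wrapping still lands through the whitespace fallback.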
…nt generation

- answer_grounding: four-tier reference grounding (modified functions, changed
  files, answer imports, and modules the changed files import for cross-module
  self.method calls) plus exact-method Context7 signatures for third-party APIs.
- generator: gold answer styles (function/fragment/edit) with a length cap,
  request-grounded intent injection + structural fabrication with absence guard,
  trivial-answer filter, --repos/--exclude-repos selection.
- runner: continuous-concurrency scheduling (semaphore) replacing barriered waves.
- source_fetcher: optional GITHUB_TOKEN auth for raw fetches.
- check_context_quality: dataset grounding-coverage + label-quality audit.
- code_hallucination_viewer: Streamlit viewer with category-highlighted spans.
- docs: update code-hallucination and generation docs to the current pipeline.
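The runner change (continuous-concurrency scheduling with a semaphore instead of barriered waves) can be sketched with asyncio. With waves, each batch waits for its slowest item before the next batch starts; with a semaphore, a new item starts as soon as any slot frees up. The `run_all` name, the worker interface and the default limit are assumptions for illustration.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def run_all(
    items: Iterable[T],
    worker: Callable[[T], Awaitable[R]],
    max_concurrency: int = 8,
) -> List[R]:
    """Run worker(item) for every item with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(item: T) -> R:
        async with semaphore:  # the slot is released the moment this item finishes
            return await worker(item)

    # gather returns results in input order, regardless of completion order
    return await asyncio.gather(*(guarded(item) for item in items))


async def demo() -> tuple:
    """Toy run that records the peak number of concurrently active workers."""
    active = 0
    peak = 0

    async def worker(x: int) -> int:
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.001 * (x % 3))  # uneven latency, like real model calls
        active -= 1
        return x * 2

    results = await run_all(range(10), worker, max_concurrency=3)
    return results, peak


if __name__ == "__main__":
    print(asyncio.run(demo()))
```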
…ient fetch failures

- taxonomy.map_label raises on unknown source/native labels instead of passing
  them through; menu injection turns that into a validation:unknown_native_label
  rejection so bad labels never reach generated data.
- build_hf_dataset validates every span category against the taxonomy at merge.
- fetch_file_from_github distinguishes definitive misses (None) from transient
  failures (TransientFetchError on timeout/429/5xx); the grounding fetch cache
  no longer turns a rate-limit window into permanent misses, and retries once.
- answer_grounding's audit metric now ignores names inside comments and strings, and a fuller set of builtins.
- reground_code_agent_contexts.py: resolves a sample answer's ungrounded
  references at the base commit and prepends the missing Referenced
  definitions block to context and prompt, leaving answers and spans
  untouched. Labeled spans are blanked first so injected text is never
  grounded.
- remaining_ungrounded treats names the answer itself imports as evidenced
  by the import statement (stdlib/third-party modules cannot be grounded
  from the repo; repo-internal imports are grounded by resolve_definitions).
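The transient-fetch fix amounts to caching only definitive outcomes. A minimal sketch, assuming a fetch function that returns file text, returns None for a definitive miss, and raises `TransientFetchError` on timeouts, 429 or 5xx. The `FetchCache` class and the fetch signature are hypothetical; only the None / TransientFetchError split and the single retry come from this PR.

```python
from typing import Callable, Dict, Optional


class TransientFetchError(Exception):
    """Timeout, HTTP 429 or 5xx: the file may exist, so the result must not be cached."""


class FetchCache:
    """Hypothetical grounding fetch cache that never caches transient failures."""

    def __init__(self, fetch: Callable[[str], Optional[str]]):
        self._fetch = fetch
        self._cache: Dict[str, Optional[str]] = {}

    def get(self, path: str) -> Optional[str]:
        if path in self._cache:
            return self._cache[path]
        for _attempt in range(2):  # initial attempt plus one retry
            try:
                result = self._fetch(path)
            except TransientFetchError:
                continue
            self._cache[path] = result  # file content or a definitive miss (None)
            return result
        return None  # transient on both attempts: report a miss but do NOT cache it
```

With this split, a rate-limit window only affects the samples processed during it; a later lookup of the same path hits the network again instead of reusing a poisoned cache entry.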
Lets a run target an exact instance set (e.g. converting existing clean
samples to hallucinated ones to raise the class balance) instead of
sampling --ratio of all instances, which is not stable across ratios.
Raises a source's hallucination rate to a target by re-running the
source's own injection prompts on a seeded selection of clean samples.
QC failures keep the clean sample, so conversion never loses data; each
sample stays single-class (no clean/hallucinated twins).
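Because a converted sample flips from clean to hallucinated without creating a twin, the dataset size N stays fixed, and reaching a target rate t from H hallucinated samples needs about ceil(t * N) - H conversions. A sketch of the seeded selection, assuming string sample IDs; `plan_conversions` is hypothetical. Since QC failures keep the sample clean, the realized rate can land below the target.

```python
import math
import random
from typing import List, Sequence


def plan_conversions(
    clean_ids: Sequence[str], n_hallucinated: int, target_rate: float, seed: int = 0
) -> List[str]:
    """Pick which clean samples to re-run through the injection prompts (hypothetical).

    Conversion flips a sample's class in place, so the total stays fixed.
    """
    total = len(clean_ids) + n_hallucinated
    # round away floating-point noise before taking the ceiling
    needed = math.ceil(round(target_rate * total, 9)) - n_hallucinated
    # clamp: already at target (0), or not enough clean samples to convert
    needed = max(0, min(needed, len(clean_ids)))
    rng = random.Random(seed)  # seeded: the same inputs always pick the same samples
    # sort first so the pick does not depend on input order
    return sorted(rng.sample(sorted(clean_ids), needed))
```

For 70 clean and 30 hallucinated samples with a 0.5 target, this selects 20 clean IDs, and the selection is reproducible for a given seed.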
Three-tier review of the code test split (full first pass, blind
adjudication, evidence arbitration against pre-fix sources) with the
applied repairs and final counts, in provenance.md.
train_span_detector.py: one command for Hub or local v2 data, with a
tokenize-once Arrow dataset, bf16, dynamic padding, step-based eval,
best checkpoint by hallucinated-token F1, optional prompt windowing
for 4k encoders, trust-remote-code for EuroBERT. Label semantics
verified token-identical to HallucinationDataset. index.md training
section documents the new path.
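Selecting the best checkpoint by hallucinated-token F1 means scoring only the positive (hallucinated) class over answer tokens. A sketch, assuming label 1 = hallucinated, 0 = supported, and -100 for tokens excluded from scoring (the usual Hugging Face convention, assumed here rather than taken from this PR); `hallucinated_token_f1` is a hypothetical name.

```python
from typing import Dict, Sequence


def hallucinated_token_f1(
    predictions: Sequence[Sequence[int]],
    labels: Sequence[Sequence[int]],
    ignore_index: int = -100,
) -> Dict[str, float]:
    """Token-level precision, recall and F1 for the hallucinated class (label 1)."""
    tp = fp = fn = 0
    for pred_seq, label_seq in zip(predictions, labels):
        for pred, gold in zip(pred_seq, label_seq):
            if gold == ignore_index:  # prompt/context tokens and padding are not scored
                continue
            if pred == 1 and gold == 1:
                tp += 1
            elif pred == 1:  # predicted hallucinated, gold supported
                fp += 1
            elif gold == 1:  # missed a hallucinated token
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

In a Trainer setup, a `compute_metrics` function would argmax the logits per token, call this, and return the F1 under a key that `TrainingArguments(metric_for_best_model=..., load_best_model_at_end=True)` points at.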
adaamko self-assigned this Jun 15, 2026