Code-agent hallucination pipeline: grounding, 50% balance, expert-verified test set by adaamko · Pull Request #39 · KRLabsOrg/LettuceDetect

adaamko · 2026-06-15T09:32:35Z

Reworks the code-hallucination data into a request-grounded coding-agent pipeline and ships the resulting dataset (published as the lettucedetect-code-agent source of KRLabsOrg/lettucedetect-code-hallucination).

What changed

Generation (generate_code_agent_hallucinations.py): gold-answer modes (function/fragment/edit), request-grounded injection (wrong_implementation / unrequested_change / fabricated_api), length cap, --repos/--hall-ids-file selection, continuous-concurrency runner.
Grounding (code_hallucination/answer_grounding.py): four-tier repository grounding + Context7 third-party signatures so answer references aren't mistaken for fabrications; transient-fetch fix so rate-limit windows aren't cached as permanent misses. reground_code_agent_contexts.py repairs grounding in place.
Taxonomy: map_label rejects unknown labels at the boundary; build_hf_dataset.py validates every span category.
Tooling: check_context_quality.py audit, convert_clean_samples_to_hallucinated.py (class-balance raise), train_span_detector.py (fast HF-Trainer path), Streamlit viewer.
Docs: index/phases/configuration + provenance.md (construction → audit → repair → 50% balance → test verification).
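
The taxonomy item above (rejecting unknown labels at the boundary and validating every span category at merge) could look roughly like the sketch below. The label map contents, the exception class, the function signature, and the span schema are assumptions made for illustration; the real definitions live in taxonomy.py and build_hf_dataset.py and are not shown in this PR text.

```python
# Hypothetical label map: the three injection categories named in this PR,
# mapped to themselves. The real map in taxonomy.py may differ.
CODE_AGENT_LABELS = {
    "wrong_implementation": "wrong_implementation",
    "unrequested_change": "unrequested_change",
    "fabricated_api": "fabricated_api",
}


class UnknownLabelError(ValueError):
    """Raised when a label is not part of the taxonomy (hypothetical class)."""


def map_label(native_label: str) -> str:
    """Stand-in for map_label: fail loudly instead of passing unknowns through."""
    try:
        return CODE_AGENT_LABELS[native_label]
    except KeyError:
        raise UnknownLabelError(f"unknown native label: {native_label!r}") from None


def validate_spans(spans: list[dict]) -> None:
    """Check every span category before samples are merged into the dataset."""
    for span in spans:
        # Assumed span schema: each span carries a "category" key.
        map_label(span["category"])
```

Raising at the boundary means a mislabeled span stops the run where it is produced instead of surfacing later as an unexpected category in the published data.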

Test-set verification

The code test split was reviewed sample by sample (2,038 → 2,015 retained, 50.3% hallucinated) in three tiers: a full first-pass review (92.9% accept), blind second-pass adjudication, and evidence arbitration against the true pre-fix sources. Applied repairs: 235 spans tightened, 23 dropped, 5 reclassified clean, 23 removed. Train/validation remain machine-generated with automated gates.

…line

Model the real task: a coding agent is given a developer request plus repository context and produces a coherent solution (explanation + code), then realistic, request-grounded mistakes are injected with exact spans.

- agent_solution.py: generate a correct assistant solution; "files" and "edit" answer styles for output variety.
- generate_code_agent_hallucinations.py: per-instance solution -> inject intent mistakes (wrong_implementation / unrequested_change) or a structural fabrication (fabricated_api, grounded by an absence check); SWE-bench official split, docs/dependency context when present.
- taxonomy.py: code_agent label map.
- injection.py: looser span location (first occurrence, whitespace fallback, no-op guard); unmapped edit types default per mode.
- pipeline.py: prep-only (load, fetch source, rewrite requests).
- Remove the patch-derived modules superseded by this path.
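
The span-location behavior described for injection.py (first occurrence, whitespace fallback, no-op guard) can be illustrated with a minimal sketch. The function names and return shapes below are hypothetical and are not taken from the repository.

```python
import re
from typing import Optional


def locate_span(answer: str, snippet: str) -> Optional[tuple[int, int]]:
    """Return (start, end) of snippet in answer, or None if it cannot be found."""
    if not snippet.strip():
        return None
    # 1) Exact match: take the first occurrence.
    start = answer.find(snippet)
    if start != -1:
        return start, start + len(snippet)
    # 2) Whitespace fallback: any whitespace run in the snippet may match
    #    any whitespace run in the answer.
    pattern = r"\s+".join(re.escape(token) for token in snippet.split())
    match = re.search(pattern, answer)
    return (match.start(), match.end()) if match else None


def apply_edit(
    answer: str, original: str, replacement: str
) -> Optional[tuple[str, tuple[int, int]]]:
    """Swap original for replacement; return the new answer and the injected span."""
    located = locate_span(answer, original)
    if located is None:
        return None
    start, end = located
    new_answer = answer[:start] + replacement + answer[end:]
    # No-op guard: an edit that leaves the answer unchanged is not a hallucination.
    if new_answer == answer:
        return None
    return new_answer, (start, start + len(replacement))
```

For example, `apply_edit("return x + 1", "return  x + 1", "return x - 1")` still locates the target despite the doubled space and returns the edited answer together with the character span of the injected text.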

…nt generation

- answer_grounding: four-tier reference grounding (modified functions, changed files, answer imports, and modules the changed files import for cross-module self.method calls) plus exact-method Context7 signatures for third-party APIs.
- generator: gold answer styles (function/fragment/edit) with a length cap, request-grounded intent injection + structural fabrication with absence guard, trivial-answer filter, --repos/--exclude-repos selection.
- runner: continuous-concurrency scheduling (semaphore) replacing barriered waves.
- source_fetcher: optional GITHUB_TOKEN auth for raw fetches.
- check_context_quality: dataset grounding-coverage + label-quality audit.
- code_hallucination_viewer: Streamlit viewer with category-highlighted spans.
- docs: update code-hallucination and generation docs to the current pipeline.
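
The runner change above (a semaphore instead of barriered waves) can be sketched with asyncio. This is an illustration of the scheduling idea only; `run_continuous`, `worker`, and `max_concurrency` are hypothetical names, and the actual runner may be structured differently.

```python
import asyncio


async def run_continuous(items, worker, max_concurrency: int = 8):
    """Run worker(item) for every item with at most max_concurrency in flight.

    With barriered waves, a batch of N tasks must fully finish before the next
    batch starts, so one slow task idles the other N-1 slots. With a semaphore,
    a new task starts as soon as any slot frees up.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(item):
        async with semaphore:
            return await worker(item)

    # gather returns results in input order, regardless of completion order.
    return await asyncio.gather(*(guarded(item) for item in items))
```

A call such as `asyncio.run(run_continuous(instances, generate_one, max_concurrency=16))` would keep up to 16 generations in flight, where `instances` and `generate_one` are placeholders for the caller's own data and coroutine.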

…ient fetch failures

- taxonomy.map_label raises on unknown source/native labels instead of passing them through; menu injection turns that into a validation:unknown_native_label rejection so bad labels never reach generated data.
- build_hf_dataset validates every span category against the taxonomy at merge.
- fetch_file_from_github distinguishes definitive misses (None) from transient failures (TransientFetchError on timeout/429/5xx); the grounding fetch cache no longer turns a rate-limit window into permanent misses, and retries once.
- answer_grounding audit metric ignores comments/strings and a fuller builtin set.
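
The fetch-cache fix above rests on one rule: cache definitive answers (content or a confirmed miss), never transient failures. A minimal sketch of that rule follows. `TransientFetchError` is named in the commit message, but its definition here, the `GroundingFetchCache` class, and the injected `fetch` callable are stand-ins written for illustration.

```python
from typing import Callable, Optional


class TransientFetchError(Exception):
    """Timeout, HTTP 429, or 5xx: says nothing about whether the file exists."""


class GroundingFetchCache:
    """Hypothetical cache wrapper around a fetch function returning content or None."""

    def __init__(self, fetch: Callable[[str], Optional[str]], retries: int = 1):
        self._fetch = fetch
        self._retries = retries
        self._cache: dict[str, Optional[str]] = {}

    def get(self, path: str) -> Optional[str]:
        if path in self._cache:
            return self._cache[path]
        for attempt in range(self._retries + 1):
            try:
                content = self._fetch(path)
            except TransientFetchError:
                if attempt == self._retries:
                    # Give up for this call, but do NOT cache: a later call
                    # outside the rate-limit window may succeed.
                    return None
                continue
            # Definitive result (content, or None for a real miss): safe to cache.
            self._cache[path] = content
            return content
        return None
```

The design choice is that a `None` stored in the cache always means "the file does not exist", so a rate-limit window can delay grounding but cannot poison it.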

- reground_code_agent_contexts.py: resolves a sample answer's ungrounded references at the base commit and prepends the missing Referenced definitions block to context and prompt, leaving answers and spans untouched. Labeled spans are blanked first so injected text is never grounded.
- remaining_ungrounded treats names the answer itself imports as evidenced by the import statement (stdlib/third-party modules cannot be grounded from the repo; repo-internal imports are grounded by resolve_definitions).
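
The "labeled spans are blanked first" step can be sketched as follows. The span schema (`start`/`end` character offsets, end exclusive) and the function name are assumptions; the point is only that injected text is masked before reference extraction, so a fabricated API never receives a grounding definition.

```python
def blank_spans(answer: str, spans: list[dict]) -> str:
    """Replace labeled span characters with spaces, keeping offsets and newlines.

    Assumed span schema: {"start": int, "end": int}, end exclusive.
    """
    chars = list(answer)
    for span in spans:
        for i in range(span["start"], min(span["end"], len(chars))):
            if chars[i] != "\n":  # keep line structure intact
                chars[i] = " "
    return "".join(chars)
```

Because the blanked string has the same length and line breaks as the original, any offsets computed on it remain valid against the untouched answer.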

convert_clean_samples_to_hallucinated.py raises a source's hallucination rate to a target by re-running the source's own injection prompts on a seeded selection of clean samples. QC failures keep the clean sample, so conversion never loses data; each sample stays single-class (no clean/hallucinated twins).
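
The balance-raising logic described above can be sketched in two steps: pick a seeded subset of clean samples large enough to reach the target rate, then convert each one, falling back to the clean original when quality control fails. The sample schema (`id`, `hallucinated`) and the `inject` callable are assumptions for illustration.

```python
import math
import random


def select_for_conversion(samples, target_rate: float, seed: int = 0):
    """Pick clean sample ids to convert so the hallucinated share reaches target_rate."""
    hallucinated = sum(1 for s in samples if s["hallucinated"])
    needed = max(0, math.ceil(target_rate * len(samples)) - hallucinated)
    # Sort first so the selection depends only on the seed, not on input order.
    clean_ids = sorted(s["id"] for s in samples if not s["hallucinated"])
    rng = random.Random(seed)
    return rng.sample(clean_ids, min(needed, len(clean_ids)))


def convert(samples, chosen_ids, inject):
    """Replace chosen samples in place; inject(sample) returns None on QC failure."""
    chosen = set(chosen_ids)
    out = []
    for sample in samples:
        if sample["id"] in chosen:
            converted = inject(sample)
            # QC failure keeps the clean sample, so the dataset never shrinks,
            # and a sample is never present in both clean and hallucinated form.
            out.append(converted if converted is not None else sample)
        else:
            out.append(sample)
    return out
```

Since failed conversions stay clean, the achieved rate can land slightly below the target; a real run would presumably either over-select or iterate to close the gap.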

train_span_detector.py: one command for hub or local v2 data, with a tokenize-once Arrow dataset, bf16, dynamic padding, step-based eval, best checkpoint by hallucinated-token F1, optional prompt windowing for 4k encoders, and trust-remote-code for EuroBERT. Label semantics verified token-identical to HallucinationDataset. The index.md training section documents the new path.
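
"Best checkpoint by hallucinated-token F1" implies a token-level metric computed over the positive class only. A minimal pure-Python sketch of such a metric is shown below. It assumes label 1 means hallucinated, label 0 means supported, and -100 marks tokens excluded from the loss (the usual ignore index for prompt and padding positions); the function name is hypothetical.

```python
def hallucinated_token_f1(predictions, labels, ignore_index: int = -100) -> float:
    """F1 over tokens labeled hallucinated (assumed class id 1).

    predictions and labels are parallel lists of per-token class ids.
    Tokens whose label equals ignore_index are skipped entirely.
    """
    tp = fp = fn = 0
    for pred_seq, label_seq in zip(predictions, labels):
        for pred, label in zip(pred_seq, label_seq):
            if label == ignore_index:
                continue
            if pred == 1 and label == 1:
                tp += 1
            elif pred == 1:
                fp += 1
            elif label == 1:
                fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

With the Hugging Face Trainer, a value like this is typically returned from `compute_metrics` and selected through the `metric_for_best_model` training argument; how train_span_detector.py wires it up is not shown in this PR text.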

adaamko added 9 commits June 7, 2026 16:55

Add --hall-ids-file for explicit hallucination-target selection

9b65ee9

Lets a run target an exact instance set (e.g. converting existing clean samples to hallucinated ones to raise the class balance) instead of sampling --ratio of all instances, which is not stable across ratios.

Document the dataset's construction, audit, and repair provenance

d9401f2

Document the test-set verification protocol

8d66dea

Three-tier review of the code test split (full first pass, blind adjudication, evidence arbitration against pre-fix sources) with the applied repairs and final counts, in provenance.md.

adaamko self-assigned this Jun 15, 2026

adaamko added 9 commits June 15, 2026 13:06

Allow merging multiple hub datasets in span-detector training

31ea5ad

Add --limit to span trainer for smoke tests

830ea3d

Drop group_by_length (not a TrainingArguments kwarg in transformers 5.x)

2516814

Add per-source/per-language span-model eval harness

e5102d8

Tokenize once on rank 0 under DDP (main_process_first); use all 4 GPUs

3dd1af2

Use longest_first truncation (only_first crashes on answers > max_length)

5be55b5

Resume only if a checkpoint exists (fresh start no longer errors)

01f7930

Add span-text SFT data builder for the generative detector

3991c78

Single-pass span-model eval (predict once, ~4x faster)

890d651

adaamko wants to merge 18 commits into main from code-agent-solutions
