Code-agent hallucination pipeline: grounding, 50% balance, expert-verified test set #39
Open
adaamko wants to merge 18 commits into
…line

Model the real task: a coding agent is given a developer request plus repository context and produces a coherent solution (explanation + code), then realistic, request-grounded mistakes are injected with exact spans.

- agent_solution.py: generate a correct assistant solution; "files" and "edit" answer styles for output variety.
- generate_code_agent_hallucinations.py: per-instance solution -> inject intent mistakes (wrong_implementation / unrequested_change) or a structural fabrication (fabricated_api, grounded by an absence check); SWE-bench official split, docs/dependency context when present.
- taxonomy.py: code_agent label map.
- injection.py: looser span location (first occurrence, whitespace fallback, no-op guard); unmapped edit types default per mode.
- pipeline.py: prep-only (load, fetch source, rewrite requests).
- Remove the patch-derived modules superseded by this path.
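The looser span location described for injection.py could look roughly like the sketch below. It is an illustration only: `locate_span` and `apply_edit` are hypothetical names, and the exact fallback and guard rules in the PR may differ.

```python
import re
from typing import Optional, Tuple


def locate_span(answer: str, snippet: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the first occurrence of snippet, or None."""
    if not snippet.strip():
        return None
    # First try an exact match and take the first occurrence.
    start = answer.find(snippet)
    if start != -1:
        return start, start + len(snippet)
    # Whitespace fallback: allow any whitespace run between the tokens.
    pattern = r"\s+".join(re.escape(token) for token in snippet.split())
    match = re.search(pattern, answer)
    return (match.start(), match.end()) if match else None


def apply_edit(answer: str, original: str, replacement: str):
    """Apply one injected edit; return (new_answer, span) or None if rejected."""
    # No-op guard (assumed meaning): an edit that changes nothing is rejected.
    if original == replacement:
        return None
    located = locate_span(answer, original)
    if located is None:
        return None
    start, end = located
    new_answer = answer[:start] + replacement + answer[end:]
    if new_answer == answer:
        return None
    # The labeled span covers exactly the injected replacement text.
    return new_answer, (start, start + len(replacement))
```

Returning the span of the replacement rather than of the original keeps the label offsets aligned with the text that actually ends up in the hallucinated answer.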
…nt generation

- answer_grounding: four-tier reference grounding (modified functions, changed files, answer imports, and modules the changed files import for cross-module self.method calls) plus exact-method Context7 signatures for third-party APIs.
- generator: gold answer styles (function/fragment/edit) with a length cap, request-grounded intent injection + structural fabrication with absence guard, trivial-answer filter, --repos/--exclude-repos selection.
- runner: continuous-concurrency scheduling (semaphore) replacing barriered waves.
- source_fetcher: optional GITHUB_TOKEN auth for raw fetches.
- check_context_quality: dataset grounding-coverage + label-quality audit.
- code_hallucination_viewer: Streamlit viewer with category-highlighted spans.
- docs: update code-hallucination and generation docs to the current pipeline.
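To illustrate the runner change, here is a minimal asyncio sketch of semaphore-based continuous concurrency. The names `run_all` and `worker` are hypothetical and not taken from the PR; the point is only that a new task starts as soon as any slot frees up, instead of waiting for a whole wave to finish.

```python
import asyncio


async def run_all(items, worker, max_concurrency=4):
    """Run worker(item) for every item with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(item):
        # A slot is released the moment one task finishes, so a slow item
        # never holds back the items queued behind it.
        async with semaphore:
            return await worker(item)

    # gather returns results in the same order as the input items.
    return await asyncio.gather(*(guarded(item) for item in items))
```

With barriered waves, each batch takes as long as its slowest item; with the semaphore, throughput is bounded only by the concurrency limit.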
…ient fetch failures

- taxonomy.map_label raises on unknown source/native labels instead of passing them through; menu injection turns that into a validation:unknown_native_label rejection so bad labels never reach generated data.
- build_hf_dataset validates every span category against the taxonomy at merge.
- fetch_file_from_github distinguishes definitive misses (None) from transient failures (TransientFetchError on timeout/429/5xx); the grounding fetch cache no longer turns a rate-limit window into permanent misses, and retries once.
- answer_grounding audit metric ignores comments/strings and a fuller builtin set.
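The miss-versus-transient distinction can be sketched as follows. `TransientFetchError` is named in the commit message; `classify_status` and `GroundingFetchCache` are hypothetical names for illustration, and the fetch function is injected so the sketch runs without network access.

```python
from typing import Callable, Dict, Optional


class TransientFetchError(Exception):
    """Timeout, HTTP 429, or 5xx: a later retry may succeed."""


def classify_status(status: int, body: str) -> Optional[str]:
    """Map an HTTP status to content, a definitive miss, or a transient error."""
    if status == 200:
        return body
    if status == 429 or 500 <= status < 600:
        raise TransientFetchError(f"HTTP {status}")
    # Anything else (for example 404) counts as a definitive miss in this sketch.
    return None


class GroundingFetchCache:
    """Caches hits and definitive misses, but never transient failures."""

    def __init__(self, fetch: Callable[[str], Optional[str]], retries: int = 1):
        self._fetch = fetch
        self._retries = retries
        self._cache: Dict[str, Optional[str]] = {}

    def get(self, path: str) -> Optional[str]:
        if path in self._cache:
            return self._cache[path]
        for attempt in range(self._retries + 1):
            try:
                result = self._fetch(path)
            except TransientFetchError:
                if attempt == self._retries:
                    # Give up for now, but leave the cache untouched so a
                    # later call can try again after the rate-limit window.
                    return None
                continue
            self._cache[path] = result
            return result
        return None
```

Caching only definitive outcomes is what keeps a temporary rate-limit window from being recorded as a permanent absence.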
- reground_code_agent_contexts.py: resolves a sample answer's ungrounded references at the base commit and prepends the missing Referenced definitions block to context and prompt, leaving answers and spans untouched. Labeled spans are blanked first so injected text is never grounded.
- remaining_ungrounded treats names the answer itself imports as evidenced by the import statement (stdlib/third-party modules cannot be grounded from the repo; repo-internal imports are grounded by resolve_definitions).
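One way to blank labeled spans before reference extraction is sketched below. `blank_spans` is a hypothetical helper, and the assumption that spans are (start, end) character offsets into the answer is mine, not confirmed by the PR.

```python
from typing import Iterable, Tuple


def blank_spans(text: str, spans: Iterable[Tuple[int, int]]) -> str:
    """Replace labeled spans with spaces, preserving length and line breaks."""
    chars = list(text)
    for start, end in spans:
        for i in range(max(start, 0), min(end, len(chars))):
            # Keep newlines so line numbers and offsets stay valid.
            if chars[i] != "\n":
                chars[i] = " "
    return "".join(chars)
```

Because the blanked text has the same length as the original, any names found afterwards still map back to the original offsets, and injected identifiers such as a fabricated API never enter the set of references to ground.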
Lets a run target an exact instance set (e.g. converting existing clean samples to hallucinated ones to raise the class balance) instead of sampling --ratio of all instances, which is not stable across ratios.
Raises a source's hallucination rate to a target by re-running the source's own injection prompts on a seeded selection of clean samples. QC failures keep the clean sample, so conversion never loses data; each sample stays single-class (no clean/hallucinated twins).
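The class-balance arithmetic can be sketched as below. The function names and the default seed are hypothetical; this only shows how many clean samples must be converted to reach a target rate and how a seeded selection stays reproducible.

```python
import math
import random
from typing import Iterable, List


def conversions_needed(n_total: int, n_hallucinated: int, target_rate: float) -> int:
    """Smallest k such that (n_hallucinated + k) / n_total >= target_rate."""
    # Rounding first guards against float noise pushing ceil up by one.
    needed = math.ceil(round(target_rate * n_total - n_hallucinated, 9))
    # Cannot convert fewer than zero or more than the clean samples available.
    return max(0, min(needed, n_total - n_hallucinated))


def select_for_conversion(clean_ids: Iterable[str], k: int, seed: int = 0) -> List[str]:
    """Pick k clean sample ids deterministically for a given seed."""
    # Sorting makes the choice independent of the input order.
    return random.Random(seed).sample(sorted(clean_ids), k)
```

Since QC failures keep the clean sample, the achieved rate can land below the target; a run would need to select a few extra ids or iterate if the target must be met exactly.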
Three-tier review of the code test split (full first pass, blind adjudication, evidence arbitration against pre-fix sources) with the applied repairs and final counts, in provenance.md.
train_span_detector.py: one command for hub or local v2 data — tokenize-once arrow dataset, bf16, dynamic padding, step-based eval, best checkpoint by hallucinated-token F1, optional prompt windowing for 4k encoders, trust-remote-code for EuroBERT. Label semantics verified token-identical to HallucinationDataset. index.md training section documents the new path.
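For reference, a token-level F1 on the hallucinated class could be computed as in this sketch. It assumes label 1 marks a hallucinated token, 0 a supported token, and -100 an ignored position (the usual Hugging Face convention); the actual metric code in train_span_detector.py may differ.

```python
from typing import Sequence


def hallucinated_token_f1(
    predictions: Sequence[int],
    labels: Sequence[int],
    ignore_index: int = -100,
) -> float:
    """F1 of the positive (hallucinated) class over non-ignored tokens."""
    tp = fp = fn = 0
    for pred, gold in zip(predictions, labels):
        if gold == ignore_index:
            continue  # prompt or padding tokens carry no label
        if pred == 1 and gold == 1:
            tp += 1
        elif pred == 1 and gold == 0:
            fp += 1
        elif pred == 0 and gold == 1:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Selecting the best checkpoint on this metric rather than on overall accuracy avoids rewarding a model that simply predicts every token as supported.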
Reworks the code-hallucination data into a request-grounded coding-agent pipeline and ships the resulting dataset (published as the lettucedetect-code-agent source of KRLabsOrg/lettucedetect-code-hallucination).

What changed

- generate_code_agent_hallucinations.py: gold-answer modes (function/fragment/edit), request-grounded injection (wrong_implementation/unrequested_change/fabricated_api), length cap, --repos/--hall-ids-file selection, continuous-concurrency runner.
- code_hallucination/answer_grounding.py: four-tier repository grounding + Context7 third-party signatures so answer references aren't mistaken for fabrications; transient-fetch fix so rate-limit windows aren't cached as permanent misses.
- reground_code_agent_contexts.py repairs grounding in place.
- map_label rejects unknown labels at the boundary; build_hf_dataset.py validates every span category.
- check_context_quality.py audit, convert_clean_samples_to_hallucinated.py (class-balance raise), train_span_detector.py (fast HF-Trainer path), Streamlit viewer.
- provenance.md (construction → audit → repair → 50% balance → test verification).

Test-set verification
Every sample in the code test split was individually reviewed (2,038 → 2,015 retained, 50.3% hallucinated) in three passes: a full first-pass review (92.9% accept), blind second-pass adjudication, and evidence arbitration against the true pre-fix sources. Repairs: 235 spans tightened, 23 dropped, 5 reclassified clean, 23 removed. The train and validation splits remain machine-generated with automated gates.