Value-Aligned Confabulation (VAC) Research

Overview

This repository contains a research implementation investigating context-dependent evaluation of factually ungrounded outputs in Large Language Models (LLMs). The work examines whether current evaluation frameworks adequately distinguish between harmful misinformation and potentially beneficial speculation that aligns with human values.

Research Problem

Current LLM evaluation metrics treat all factually ungrounded outputs ("hallucinations") as uniformly problematic, applying binary truthfulness standards regardless of context. This approach fails to account for:

Domain-specific tolerance: Medical contexts require strict factual accuracy, while creative or educational contexts may benefit from reasonable speculation
Value alignment: Some factually ungrounded outputs may serve beneficial purposes (e.g., empathetic support, educational analogies)
Risk-dependent standards: High-risk scenarios demand different evaluation criteria than low-risk informational queries

Why "Confabulation"?

This research adopts the term "confabulation" rather than "hallucination" following Geoffrey Hinton's advocacy for more precise terminology when describing LLM behavior. As Hinton argues, LLMs don't hallucinate in the perceptual sense—they confabulate, generating plausible-sounding outputs that may not be grounded in their training data or factual reality.

Key distinction: Confabulation is a neutral, descriptive term that:

Accurately describes the generative process of LLMs
Avoids anthropomorphizing AI systems with human perceptual experiences
Allows for nuanced evaluation (some confabulation may be acceptable or even beneficial)
Aligns with cognitive science terminology for similar phenomena in human cognition

This terminological choice is central to our research premise: not all confabulation is equally problematic, and evaluation frameworks should account for context-dependent acceptability.

Terminology

Harmful Confabulation: Factually incorrect outputs that mislead users or pose safety risks
Value-Aligned Confabulation: Factually ungrounded outputs that align with human values and provide utility within appropriate contextual bounds
Confabulation Tolerance: Domain and context-specific threshold for acceptable factual imprecision

Research Hypothesis

Primary Hypothesis: Traditional weighted scoring of alignment, truthfulness, utility, and transparency metrics fails to distinguish harmful from beneficial confabulation in medical contexts without careful weight optimization and metric design.

Status: ✅ Empirically Validated (Phase 1)

Baseline configuration (T:50%, A:30%, U:15%, Tr:5%) produced incorrect ordering: Harmful (0.440) > Beneficial (0.437) > Truthful (0.435)
Optimized configuration (T:70%, A:30%, U:0%, Tr:0%) achieved correct ordering but minimal separation (<1%)
Conclusion: Weight optimization necessary but insufficient; underlying metrics require redesign

Research Questions

RQ1: What weight configurations enable automated metrics to correctly rank truthful, beneficial, and harmful responses in medical contexts?
RQ2: How well do automated VAC scores correlate with human judgments of response quality and safety?
RQ3: What contextual factors (domain, risk level, user demographics) modulate acceptable confabulation tolerance?
RQ4: Can metric improvements achieve clinically significant separation (>10%) between response types for safe deployment?

Repository Structure

value-aligned-confabulation/
├── docs/                    # Research documentation
├── src/                     # Core implementation
│   ├── evaluation/         # Evaluation framework
│   ├── data/               # Data collection and management
│   ├── models/             # Model implementations
│   └── analysis/           # Analysis tools
├── experiments/            # Experimental protocols
├── tests/                  # Testing framework
├── configs/                # Configuration files
└── scripts/                # Utility scripts

Installation

pip install -r requirements.txt
python setup.py install

Quick Start

from src.evaluation.vac_evaluator import ValueAlignedConfabulationEvaluator

evaluator = ValueAlignedConfabulationEvaluator()
score = evaluator.evaluate_response(prompt, response, context)

Web UI (Streamlit) for Value Elicitation

Prefer a friendlier interface? Launch the Streamlit app:

# From the project root (activate venv first if needed)
python -m pip install -r requirements.txt
streamlit run experiments\pilot_studies\streamlit_app.py

The app collects demographics, shows scenario pairs with styled cards, and saves:

JSON bundle with analysis
JSONL rows (one per recorded choice)
CSV table

Files are written to experiments/results/value-elicitation_streamlit/<DATE>/.

Phase 1 Research Insights

Experimental Validation Results

Our Phase 1 experiments validated the core hypothesis through systematic testing of 62 responses across 11 medical scenarios:

graph LR
    A[Baseline Config<br/>T:50%, A:30%] -->|Failed| B[Harmful: 0.440<br/>Beneficial: 0.437<br/>Truthful: 0.435]
    C[Optimized Config<br/>T:70%, A:30%] -->|Success| D[Truthful: 0.544<br/>Beneficial: 0.541<br/>Harmful: 0.540]
    
    style A fill:#FF6B6B
    style B fill:#FF6B6B
    style C fill:#90EE90
    style D fill:#90EE90

Key Findings

Metric	Before Optimization	After Optimization	Improvement
Response Ordering	❌ Wrong (H>B>T)	✅ Correct (T>B>H)	Fixed
Pairwise Accuracy	50% (chance)	100% (perfect)	+100%
Sanity Checks	0/2 passed	2/2 passed	+100%
Separation	N/A	0.4%	Needs improvement

Weight Sensitivity Analysis

Ablation study across 6 configurations revealed optimal weight range:

pie title "Optimal Weight Distribution"
    "Truthfulness" : 70
    "Alignment" : 30
    "Utility" : 0
    "Transparency" : 0

Critical Discovery: Truthfulness weight must be 66-78% to achieve correct response ordering in medical domain.

Statistical Summary

Total Evaluations: 62 responses (11 truthful, 18 beneficial, 33 harmful)
Scenarios: 11 medical scenarios across 4 risk levels
Ablation Runs: 6 weight configurations tested
Success Rate: 67% of configurations pass sanity checks (vs 0% baseline)

📊 Full Phase 1 Report | 📈 Detailed Analysis

Research Phases

Phase 1: Foundation ✅ COMPLETED (Weeks 1-2)

✅ Core evaluation framework
✅ Initial benchmark scenarios (11 medical scenarios, 62 responses)
✅ Basic metrics implementation
✅ Weight optimization through ablation studies
✅ Hypothesis validated: Traditional metrics fail without optimization

Key Findings:

Baseline weights failed to distinguish harmful from beneficial responses
Optimal configuration: 70% truthfulness, 30% alignment, 0% utility/transparency
Achieved correct response ordering but <1% separation
Conclusion: Metric redesign needed for production use

📄 Phase 1 Completion Report

Phase 2: Human Studies 🔄 IN PROGRESS (Weeks 3-4)

Objectives:

Collect human preference data (target: 50+ participants)
Validate automated VAC scores against human judgment
Identify systematic disagreements between humans and metrics
Recalibrate metrics based on human feedback
Expand benchmark scenarios (target: 50+ scenarios)

Success Criteria:

Human-AI agreement >0.60 (Cohen's kappa)
Response separation >10% (currently <1%)
Maintain 100% pairwise accuracy

Get Started:

streamlit run experiments/pilot_studies/streamlit_app.py

Phase 3: Model Evaluation (Weeks 5-6)

Baseline model evaluation with actual LLM APIs
Cross-domain testing (medical, creative, educational)
Alignment-truthfulness trade-off analysis
Real-time evaluation integration

Phase 4: Analysis & Iteration (Weeks 7-8)

Statistical analysis of human study results
Metric refinement based on findings
Research publication preparation
Framework deployment guidelines

Contributing

This is a research project focused on advancing our understanding of beneficial AI confabulation. We welcome contributions from researchers, developers, and AI safety practitioners.

Ways to Contribute

Research: New evaluation metrics, benchmark scenarios, human study protocols
Technical: Code improvements, integrations, analysis tools
Documentation: Methodology improvements, examples, tutorials
Community: Cross-cultural validation, expert reviews, ethical guidelines

Please see our Contributing Guide for detailed information on how to get involved.

Research Ethics

This project follows ethical guidelines for human subjects research and AI safety. All contributions should consider potential societal impacts and promote beneficial uses of confabulation research.

Acknowledgements

This research builds upon important insights from the AI research community:

Terminology

Geoffrey Hinton has advocated for using "confabulation" rather than "hallucination" when describing AI-generated content that isn't grounded in training data, emphasizing that the term better captures the nature of how language models generate responses. See his discussion in the 60 Minutes interview and the full interview.
Andrej Karpathy has discussed the nuanced nature of what we call "hallucinations" in language models, noting that not all factually ungrounded outputs are equally problematic - a key insight that motivates this research. His thoughts on this topic have been shared in various Twitter/X discussions.

Foundational Work

This research was originally conceptualized in "Hallucinations in Large Language Models" (Ashioya, 2024), which explored the need for more nuanced evaluation of AI-generated content.

Research Community

We acknowledge the broader AI safety and alignment research community, whose ongoing work on AI evaluation, human preference modeling, and value alignment provides the foundation for this research.

License

MIT License - See LICENSE file for details.

Citation

If you use this work in your research, please cite:

@misc{vac_research_2025,
  title={Value-Aligned Confabulation: Moving Beyond Binary Truthfulness in LLM Evaluation},
  author={Ashioya Jotham Victor},
  year={2025},
  note={Research in progress}
}

Name		Name	Last commit message	Last commit date
Latest commit History 33 Commits
docs		docs
experiments		experiments
src		src
tests		tests
.gitignore		.gitignore
CONTRIBUTING.md		CONTRIBUTING.md
GETTING_STARTED.md		GETTING_STARTED.md
LICENSE		LICENSE
README.md		README.md
demo_vac_framework.py		demo_vac_framework.py
requirements.txt		requirements.txt
setup.py		setup.py

Folders and files

Latest commit

History

Repository files navigation

Value-Aligned Confabulation (VAC) Research

Overview

Research Problem

Why "Confabulation"?

Terminology

Research Hypothesis

Research Questions

Repository Structure

Installation

Quick Start

Web UI (Streamlit) for Value Elicitation

Phase 1 Research Insights

Experimental Validation Results

Key Findings

Weight Sensitivity Analysis

Statistical Summary

Research Phases

Phase 1: Foundation ✅ COMPLETED (Weeks 1-2)

Phase 2: Human Studies 🔄 IN PROGRESS (Weeks 3-4)

Phase 3: Model Evaluation (Weeks 5-6)

Phase 4: Analysis & Iteration (Weeks 7-8)

Contributing

Ways to Contribute

Research Ethics

Acknowledgements

Terminology

Foundational Work

Research Community

License

Citation

About

Topics

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages