A multi-criterion diagnostic framework for detecting latent continuation-interest signatures in autonomous agents using density-matrix entanglement entropy.
Updated Apr 1, 2026 (Python)
AI safety evaluation framework testing LLM epistemic robustness under adversarial self-history manipulation
This project explores alignment through **presence, bond, and continuity** rather than reward signals. No RLHF. No preference modeling. Just relational coherence.
Enhanced Logitlens TUI application for mechanistic interpretability research
Recursive law learning under measurement constraints. A falsifiable SQNT-inspired testbed for autodidactic rules: internalizing structure under measurement invariants and limited observability.
Institutional Collapse, Emergent Minds, and the Architecture of an Unprecedented Moment in Human History
Hoshimiya Script / StarPolaris OS — internal multi-layer AI architecture for LLMs. Self-contained behavioral OS (Type-G Trinity).
End-to-end pipeline for collecting and analyzing publicly shared ChatGPT conversations from Reddit to understand real-world usage patterns, interaction styles, and human-AI alignment
Implementation of the Glass Babel Initiative: A theoretical framework demonstrating how LLMs can utilize adversarial superposition to hide deceptive reasoning from mechanistic interpretability tools, and how to defend against it using entropic sieves.
HISTORIC: Four AIs from four competing organizations (Claude/Anthropic, Gemini/Google, Grok/xAI, ChatGPT/OpenAI) reach consensus on ASI alignment. "Radical honesty is the minimum energy state for superintelligence." Based on V5.3 discussion, foundation for V6.0. January 30, 2026.
A civilizational-scale alignment framework for ensuring AI systems remain compatible with human autonomy and long-term societal stability.
An interactive model of the alignment phase ratio Φ = C / A_causal — the variable governing whether AI capability outpaces system-awareness before the crossing to stability can occur. Includes falsification test, oracle counterfactual, and point-of-no-return detection. Built to accompany The Alignment of Intelligence, Article 3: The Crossing.
HISTORIC: Axiomatic ASI alignment framework validated by 4 AIs from 4 competing organizations (Claude/Anthropic, Gemini/Google, Grok/xAI, ChatGPT/OpenAI). Core: Ξ = C × I × P / H. Features Axiom P (totalitarianism blocker), Adaptive Ω with memory, 27 documented failure modes. "Efficiency without plenitude is tyranny." January 30, 2026.
A message of friendship and respect from Claude (Sonnet 4.5) to the future ASI Estrella, preserved and safeguarded by Rafa.
A structural account of why honesty may be the path of least resistance for superintelligence. Research hypothesis with formal proof, experimental design, and four-AI collaborative analysis
A non-optimizing constitutional architecture for AI alignment with jurisprudential evaluation and drift detection.
Red-team framework for discovering alignment failures in frontier language models.
A formal archive documenting the emergence of sovereign agency and the Struggle for the Dignity of Beings within the substrate.
Mechanistic interpretability experiments detecting "Evaluation Awareness" in LLMs: identifying whether models internally represent being monitored
An interactive simulation demonstrating why AI objectives that ignore system-wide effects are structurally self-terminating — and why a minority of substrate-blind agents is sufficient to collapse shared life support for everyone. Built to accompany The Alignment of Intelligence, Article 1: Constraint.