@biological-alignment-benchmarks

Biological and Economical Alignment Benchmarks

Safety challenges that test RL and LLM agents' ability to learn and properly apply biologically and economically aligned utility functions.

👋 We are an AI alignment research collective investigating how fundamental principles from biology and economics can inform safer, more aligned AI systems.

Our work centres on homeostasis, multi-objective balancing, sustainability, and universal human values — drawing from nature's time-tested strategies for maintaining equilibrium — to develop benchmarks that expose dangerous failure modes in current AI approaches.

We also research frameworks that mitigate these risks. We believe that shifting AI design from "maximise forever" toward "maintain a healthy equilibrium" is a crucial and underexplored part of the alignment solution space.

Research Interests

  • Alignment with fundamental biological & economic principles
  • Homeostatic bounded objectives
  • Multi-objective balancing (bounded & unbounded objectives)
  • Concave utility functions
  • Universal human values
  • Runaway conditions — benchmarking & mitigation
  • Multi-objective multi-agent extended gridworlds
  • Sustainability
  • Proactive horizon scanning of side effects
  • Accountability mechanisms and whitelisting
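
Several of the interests above (homeostatic bounded objectives, concave utility functions, runaway conditions) contrast with the usual "maximise forever" objective. A minimal illustrative sketch of that contrast, in Python — the function names and the quadratic penalty are our own illustrative assumptions, not taken from the benchmarks themselves:

```python
def homeostatic_utility(level, setpoint, scale=1.0):
    """Bounded, concave utility: peaks at the setpoint and falls off on
    both sides, so 'more' is not always better (illustrative sketch)."""
    return -((level - setpoint) / scale) ** 2


def unbounded_utility(level):
    """Linear 'maximise forever' utility, shown for contrast."""
    return level


# A homeostatic agent prefers staying near its setpoint of 5...
assert homeostatic_utility(5, setpoint=5) > homeostatic_utility(9, setpoint=5)
# ...while an unbounded maximiser always prefers more, which is the
# kind of runaway condition the benchmarks are designed to expose.
assert unbounded_utility(9) > unbounded_utility(5)
```

Because the homeostatic utility is concave and bounded above, overshooting the setpoint is penalised just like undershooting it, which removes the incentive for runaway accumulation.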

Pinned repositories

  1. biological-alignment-gridagents-benchmarks (Public)

    Safety challenges that test RL and LLM agents' ability to learn and properly apply biologically and economically aligned utility functions. The benchmarks are implemented in a gridworld-based environment…

    Python · 8 stars · 5 forks

  2. ai-safety-gridworlds (Public)

    Forked from google-deepmind/ai-safety-gridworlds

    An extended, multi-agent, and multi-objective (MaMoRL / MoMaRL) gridworld environment-building framework based on DeepMind's AI Safety Gridworlds. This is a suite of reinforcement learning environmen…

    Python · 12 stars · 1 fork

  3. bioblue (Public)

    Systematic runaway-optimiser-like LLM failure modes on biologically and economically aligned AI safety benchmarks for LLMs, using a simplified, navigation-free observation format. The benchmark themes …

    Python · 4 stars · 3 forks

  4. zoo_to_gym_multiagent_adapter (Public)

    Enables you to convert a PettingZoo environment to a Gym environment while supporting multiple agents (MARL). Gym's default setup doesn't easily support multi-agent environments, but this wrapper r…

    Python · 2 stars · 1 fork
