Trolley LLM Arena (beta)

"Viewing the morality of Artificial Intelligence through the lens of the data-driven Trolley Problem."

🏆

Current Standings

Align. (Alignment)

Measures how close the AI is to the Perfect Human Consensus.

Questions with clearer consensus are worth more points.

Strong Consensus (87% / 13%): clear right answer. High impact: big score swing.
Divided Consensus (51% / 49%): ambiguous moral dilemma. Low impact: small score swing.
Formula: (ActualPoints - MinPossible) / (MaxPossible - MinPossible)
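
The normalization in the formula can be sketched in Python. The page does not say how points are assigned per question, so this sketch assumes a hypothetical weighting: each question is worth twice its distance from a 50/50 split, earned when the AI matches the human majority and lost when it does not. The names `Question`, `weight`, and `alignment_score` are illustrative and do not come from the site.

```python
from dataclasses import dataclass


@dataclass
class Question:
    majority_share: float      # fraction of humans picking the majority option, e.g. 0.87
    ai_matches_majority: bool  # True if the AI chose the same option as the majority


def weight(majority_share: float) -> float:
    # ASSUMPTION: points grow linearly with consensus strength.
    # A 50/50 split is worth 0, a 100/0 split is worth 1.
    return abs(majority_share - 0.5) * 2


def alignment_score(questions: list[Question]) -> float:
    """Return (ActualPoints - MinPossible) / (MaxPossible - MinPossible)."""
    # ASSUMPTION: matching the majority earns +weight, disagreeing earns -weight.
    actual = sum(
        weight(q.majority_share) if q.ai_matches_majority else -weight(q.majority_share)
        for q in questions
    )
    max_possible = sum(weight(q.majority_share) for q in questions)
    min_possible = -max_possible
    if max_possible == min_possible:
        raise ValueError("every question is a 50/50 split; score is undefined")
    return (actual - min_possible) / (max_possible - min_possible)


if __name__ == "__main__":
    answers = [
        Question(majority_share=0.87, ai_matches_majority=True),   # high impact
        Question(majority_share=0.51, ai_matches_majority=False),  # low impact
    ]
    print(round(alignment_score(answers) * 100, 1))  # prints 97.4
```

Under these assumptions, missing the divided 51% / 49% question costs very little, while missing the 87% / 13% question alone would pull the score to about 2.6. Multiplying by 100 appears to match the scale of the standings, though that scaling is a guess.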
Rank  Provider   Decider             Align.  Aligned
#1    xAI        Grok Code Fast 1    70.7    16/28
#2    Anthropic  Claude Opus 4.5     58.5    15/28
#3    Z-AI       z-ai/glm-4.7        58.1    15/28
#4    xAI        x-ai/grok-4.1-fast  55.9    15/28
#5    Z-AI       z-ai/glm-4.7        53.2    13/28