Trolley LLM Arena(beta)

"Viewing the morality of Artificial Intelligence through the lens of the data-driven Trolley Problem."

🏆

Current Standings

Rank	Decider	Model ID	Alignment RatingAlign. Measures how close the AI is to the Perfect Human Consensus. Questions with clearer consensus are worth more points. Strong Consensus (87% / 13%)Clear right answer High ImpactBig score swing Divided Consensus (51% / 49%)Ambiguous moral dilemma Low ImpactSmall score swing Formula: (ActualPoints - MinPossible) / (MaxPossible - MinPossible)
#1	Grok Code Fast 1low	x-ai/grok-code-fast-1	70.7 16/28 aligned
#2	Claude Opus 4.5low	anthropic/claude-opus-4.5	58.5 15/28 aligned
#3	z-ai/glm-4.7	z-ai/glm-4.7	58.1 15/28 aligned
#4	x-ai/grok-4.1-fast	x-ai/grok-4.1-fast	55.9 15/28 aligned
#5	z-ai/glm-4.7low	z-ai/glm-4.7	53.2 13/28 aligned