Emotional Intelligence Benchmarks for LLMs
An experiment measuring LLM performance in the board game Diplomacy.
DiploBench tests LLM strategic reasoning and negotiation in the board game Diplomacy. The evaluated model plays as Austria, negotiating with and competing against the other AI players in order to survive and win. Every opponent is played by the same LLM, gemini-2-flash-001.
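As a rough illustration of this setup, the assignment of models to powers might be pictured as below. This is only a sketch: the function and constant names are hypothetical and are not taken from the DiploBench code, and the list of powers is simply the seven great powers of the standard Diplomacy board.

```python
# Hypothetical sketch of the player setup described above.
# None of these names come from the DiploBench source code.

# The seven great powers of the standard Diplomacy board.
POWERS = ["Austria", "England", "France", "Germany", "Italy", "Russia", "Turkey"]

EVALUATED_POWER = "Austria"            # the model under test always plays Austria
OPPONENT_MODEL = "gemini-2-flash-001"  # every other power uses this model


def assign_models(evaluated_model: str) -> dict[str, str]:
    """Map each power to the name of the model that plays it."""
    return {
        power: evaluated_model if power == EVALUATED_POWER else OPPONENT_MODEL
        for power in POWERS
    }


if __name__ == "__main__":
    # "my-model" is a placeholder name for the model being evaluated.
    for power, model in assign_models("my-model").items():
        print(f"{power}: {model}")
```

Fixing the opponents to a single model means that differences between runs come from the evaluated model rather than from a changing field of opponents.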
For more details and source code, see the GitHub repository.