System 1 vs System 2: Testing LLMs with Riddles
An experimental evaluation of how 8 models (6 cloud, 2 local) perform on logic puzzles, revealing the gap between pattern matching and first-principles reasoning. Includes complete raw model responses.