I Tested 9 AI Models in a Real Scenario (Surprising Results)

Ep01 · 27:46 · Published Apr 26, 2026 · 57 views

The Test

I gave 9 AI models the same coding challenge: write code to generate an animation. Same prompt, same constraints, same evaluation criteria. No cherry-picking, no re-prompts — just raw output.
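For readers who want to try a similar comparison, here is a minimal sketch of a single-shot harness. It is not the setup used in the video: the prompt text, the `run_once` helper, and the `fake_model` stand-in are all hypothetical, and a real run would replace `fake_model` with each provider's own client.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical prompt; the exact prompt used in the video is not given here.
PROMPT = "Write code to generate an animation."


@dataclass
class Result:
    model: str
    output: str


def run_once(models: Dict[str, Callable[[str], str]], prompt: str) -> List[Result]:
    """Send the identical prompt to every model exactly once (no re-prompts)."""
    return [Result(model=name, output=call(prompt)) for name, call in models.items()]


def fake_model(name: str) -> Callable[[str], str]:
    """Stand-in for a real API client (assumption: swap in a provider SDK call)."""
    def call(prompt: str) -> str:
        return f"[{name}] response to: {prompt}"
    return call


# Placeholder model names, not the models from the table below.
models = {name: fake_model(name) for name in ["model-a", "model-b"]}
results = run_once(models, PROMPT)

for r in results:
    print(r.model, "->", r.output)
```

Keeping the prompt in one constant and calling each model exactly once is what makes the outputs comparable: nothing is tuned per model and nothing is retried.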

Models Tested

#	Model	Provider
1	GPT-5.5	OpenAI
2	Claude Haiku	Anthropic
3	Claude Sonnet	Anthropic
4	Claude Opus	Anthropic
5	Gemini 3.1 Pro	Google
6	Qwen 3.6 27B	Alibaba
7	Kimi K2.5	Moonshot
8	GPT-5.4 / Mini	OpenAI

Full Benchmark Results

Ranked by quality — scroll down for detailed outputs from every model:

What I Learned

Different models have wildly different strengths — the "best" model depends on the task
More parameters don't always mean better output
Real-world testing reveals things that benchmark scores hide

Watch on YouTube

Full video with panda commentary: youtube.com/watch?v=CeMXQGfNuXo