Blog
Writing about AI model evaluation, trust, and the tools to measure both.
AI Polls Aren’t Real Polls. That’s Exactly Why They Matter.
Nate Silver is right: AI polls are fake polls. But stopping there misses something important. Reframed correctly, AI simulations become powerful diagnostic tools for bias detection and model comparison.
Colin Smillie · 4 min read
Why AI Models Disagree (And Why It Matters)
When you ask GPT-4 and Claude the same question, they often give different answers. Understanding why this happens is the first step toward building AI systems you can trust.
Colin Smillie · 6 min read
How to Evaluate AI Models for Enterprise Use
Generic benchmarks tell you how a model performs on average. Enterprise deployment requires knowing how it performs on your specific problems.
Colin Smillie · 8 min read
The Case for Structured AI Benchmarking
Ad hoc testing feels productive but produces unreliable conclusions. Structured benchmarking with defined question types gives you data you can act on.
Colin Smillie · 7 min read