Blog
Writing about AI model evaluation, trust, and the tools to measure both.
2026-04-13
AI Polls Aren’t Real Polls. That’s Exactly Why They Matter.
Nate Silver is right: AI polls are fake polls. But stopping there misses something important. Reframed correctly, AI simulations become powerful diagnostic tools for bias detection and model comparison.
Colin Smillie, 4 min read

2026-03-27
Why AI Models Disagree (And Why It Matters)
When you ask GPT-4 and Claude the same question, they often give different answers. Understanding why this happens is the first step toward building AI systems you can trust.
Colin Smillie, 6 min read

2026-03-27
How to Evaluate AI Models for Enterprise Use
Generic benchmarks tell you how a model performs on average. Enterprise deployment requires knowing how it performs on your specific problems.
Colin Smillie, 8 min read

2026-03-27
The Case for Structured AI Benchmarking
Ad hoc testing feels productive but produces unreliable conclusions. Structured benchmarking with defined question types gives you data you can act on.
Colin Smillie, 7 min read