Over the past eighteen months, we've shipped fourteen AI systems into production. Some of them save teams hours a week. Some of them are used once a month. A few were quietly turned off after six weeks, and those taught us the most.
Lesson one: if you can't measure the lift, you can't ship it. Every engagement now starts by locking in a single business metric and a target number. No metric, no project.
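To make "a single business metric and a target number" concrete, here is a minimal sketch of a lift check. The function names and the choice of relative lift are our assumptions for illustration, not a fixed method. It also assumes a metric where higher is better; for a metric where lower is better (hours spent, say), measure the reduction instead.

```python
def relative_lift(baseline: float, observed: float) -> float:
    """Relative change of the observed metric versus the pre-launch baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero to compute relative lift")
    return (observed - baseline) / baseline


def meets_target(baseline: float, observed: float, target_lift: float) -> bool:
    """True when the measured lift reaches the target agreed at kickoff."""
    return relative_lift(baseline, observed) >= target_lift


# Illustrative numbers only: baseline of 40 units, 50 observed, 20% target.
if __name__ == "__main__":
    print(relative_lift(40.0, 50.0))
    print(meets_target(40.0, 50.0, target_lift=0.20))
```

Agreeing on the baseline before launch matters more than the formula: without it, there is nothing to compare the observed number against.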
Lesson two: the hard part isn't the model. It's the plumbing around the model: data pipelines, eval harnesses, human review queues, and fallback behavior when the model is confidently wrong.
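As a rough sketch of what fallback plumbing can look like, the snippet below routes each model output to one of three destinations. Every name here (`ModelOutput`, `route`, the `0.8` threshold, the route labels) is hypothetical. The point it illustrates is that a confidence score alone cannot catch a confidently wrong answer, so an independent validation check runs first.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ModelOutput:
    answer: str
    confidence: float  # assumed to be a score between 0.0 and 1.0


def route(
    output: ModelOutput,
    validate: Callable[[str], bool],
    min_confidence: float = 0.8,  # illustrative threshold, tune per workflow
) -> str:
    """Decide what happens to a single model output."""
    # An independent check (schema, business rule, lookup) runs first,
    # because high confidence does not imply a correct answer.
    if not validate(output.answer):
        return "fallback"
    # A valid but uncertain answer goes to a person instead of shipping.
    if output.confidence < min_confidence:
        return "human_review"
    return "accept"


if __name__ == "__main__":
    # Toy validator: the answer must be all digits.
    print(route(ModelOutput("forty-two", 0.99), str.isdigit))
```

In the usage line, the answer fails validation despite a 0.99 confidence score, so it takes the fallback path rather than being accepted.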
Lesson three: start small on purpose. A narrow, unglamorous workflow that saves five hours a week beats a flashy demo that drifts into irrelevance after launch.
Lesson four: tell the truth about cost. Tokens aren't free, latency compounds, and the cheapest model that passes evals usually beats the newest state-of-the-art one.
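One way to apply "the cheapest model that passes evals" is to treat the eval pass rate and latency as gates, then sort what remains by cost. The sketch below assumes you already have those three measurements per candidate. The field names, thresholds, and the `pick_model` helper are ours for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    name: str                    # hypothetical model label
    eval_pass_rate: float        # fraction of eval cases passed, 0.0 to 1.0
    cost_per_1k_requests: float  # estimated spend in any consistent currency
    p95_latency_ms: float        # measured 95th percentile latency


def pick_model(
    candidates: Sequence[Candidate],
    min_pass_rate: float,
    max_p95_latency_ms: float,
) -> Optional[Candidate]:
    """Return the cheapest candidate that clears both gates, or None."""
    eligible = [
        c
        for c in candidates
        if c.eval_pass_rate >= min_pass_rate
        and c.p95_latency_ms <= max_p95_latency_ms
    ]
    # Quality and latency are pass/fail; cost is the only tiebreaker.
    return min(eligible, key=lambda c: c.cost_per_1k_requests, default=None)
```

Because quality is a gate rather than a score to maximize, a newer model that passes evals by a wider margin gets no extra credit for it here. If nothing clears the gates, the function returns `None`, which is a signal to revisit the evals or the thresholds rather than to ship anyway.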