Leadership Fundamentals, Part 3 of 3 (Intermediate)

Measuring outcomes, not activity

The most common mistake in measuring AI adoption is counting the wrong things.

Prompts sent. Tokens used. Hours saved (self-reported). Percentage of the team that has tried a new tool. These are activity metrics. They measure whether people are using AI. They say nothing about whether the use is producing value.

A team that sends ten thousand prompts and ships unreviewed output has not adopted AI well. A team that sends one hundred prompts with rigorous review and better deliverables has.

What to stop measuring

Prompts and tokens. These measure volume of interaction. They do not measure quality of output, depth of review, or outcomes delivered.

Self-reported time savings. When someone says they saved two hours using AI, they are estimating the counterfactual. These estimates are systematically optimistic and rarely account for the time spent on corrections, rework, and verification that was not counted as "AI time."

Tool adoption rate. Whether 60% or 90% of the team has used the tool this month says nothing about how the tool is being used, whether the output is reviewed, or whether the work is better.

Enthusiasm signals. A team that is excited about AI tools is not necessarily using them well. Enthusiasm without structure produces inconsistent results. It is useful social data. It is not a performance measure.

What to measure instead

Correction rate on reviewed output. Of the AI-assisted deliverables that went through internal review before being sent, what percentage required significant changes? This measures whether the review process is functioning. A correction rate of zero is a warning sign, not a success — it means either the output is perfect every time (unlikely) or no one is reviewing deeply enough to find what needs changing.
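
As a rough illustration of the arithmetic, the sketch below computes a correction rate from a list of reviewed deliverables and treats a rate of zero as a prompt to check review depth. The `ReviewedDeliverable` record, its field names, and the sample entries are assumptions made for the example, not a prescribed tracking format.

```python
from dataclasses import dataclass


@dataclass
class ReviewedDeliverable:
    # Hypothetical record: one AI-assisted deliverable that went through internal review.
    name: str
    significant_changes: bool  # True if review required significant changes


def correction_rate(items):
    """Share of reviewed deliverables that required significant changes (0.0 to 1.0)."""
    if not items:
        return None  # nothing was reviewed, so there is no rate to report
    return sum(1 for d in items if d.significant_changes) / len(items)


def interpret(rate):
    """Read a zero rate as a question about review depth, not as a success."""
    if rate is None:
        return "no reviewed deliverables this period"
    if rate == 0:
        return "zero corrections: check whether review is deep enough"
    return f"{rate:.0%} of reviewed deliverables needed significant changes"


# Illustrative entries only, not real data.
reviewed = [
    ReviewedDeliverable("client report", True),
    ReviewedDeliverable("status summary", False),
    ReviewedDeliverable("proposal draft", False),
    ReviewedDeliverable("market brief", True),
]

print(interpret(correction_rate(reviewed)))
```

What counts as a "significant" change is a judgment the team has to define for itself; the sketch only assumes that the reviewer records a yes or no.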

Error rate at delivery. Of the deliverables that reached the client or end user, what percentage had errors that required correction after the fact? This is the outcome measure. Reducing this rate is the actual goal.
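
One possible way to track this is a simple ratio compared across periods. The function name, the quarterly grouping, and the counts below are illustrative assumptions; the point is only that the number to watch is errors found after delivery divided by items delivered.

```python
def delivery_error_rate(delivered, corrected_after_delivery):
    """Fraction of delivered items that needed correction after reaching the client."""
    if delivered == 0:
        return None  # nothing delivered, so no rate to report
    if corrected_after_delivery > delivered:
        raise ValueError("corrected_after_delivery cannot exceed delivered")
    return corrected_after_delivery / delivered


# Illustrative counts only, not real data.
last_quarter = delivery_error_rate(delivered=40, corrected_after_delivery=6)
this_quarter = delivery_error_rate(delivered=50, corrected_after_delivery=4)

# The goal is a falling rate, regardless of how many prompts were sent.
improved = this_quarter < last_quarter
print(f"last: {last_quarter:.1%}, this: {this_quarter:.1%}, improved: {improved}")
```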

Review cycle time. How long does it take from AI-generated draft to signed-off deliverable? A team that produces output faster but reviews it more slowly has not gained time; it has shifted the bottleneck. If review is taking longer than production, that is a signal about the review process.
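
A minimal sketch of that comparison, assuming the team records three timestamps per deliverable (work started, draft ready, signed off). The log structure and the dates are hypothetical; medians are used here so that one unusually slow item does not dominate the picture.

```python
from datetime import datetime
from statistics import median


def hours(start, end):
    """Elapsed time between two timestamps, in hours."""
    return (end - start).total_seconds() / 3600


# Hypothetical log entries: (work started, draft ready, signed off).
log = [
    (datetime(2025, 3, 3, 9), datetime(2025, 3, 3, 10), datetime(2025, 3, 3, 15)),
    (datetime(2025, 3, 4, 9), datetime(2025, 3, 4, 11), datetime(2025, 3, 4, 14)),
    (datetime(2025, 3, 5, 9), datetime(2025, 3, 5, 10), datetime(2025, 3, 6, 10)),
]

# Production time: start of work to draft. Review cycle time: draft to sign-off.
production = median(hours(started, draft) for started, draft, _ in log)
review = median(hours(draft, signed) for _, draft, signed in log)

# If review takes longer than production, the bottleneck has moved to review.
review_is_bottleneck = review > production
print(f"median production: {production}h, median review: {review}h")
```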

Rollback and correction rate on automated workflows. If your team has deployed any automated AI workflows (reports that generate on schedule, summaries that run automatically, processes that trigger without manual direction), track how often their outputs require manual correction and how often a workflow has to be paused or rolled back. Industry data from large-scale deployments shows that the majority of AI agent deployments encounter production failures requiring rollback. Teams with defined review and recovery processes fail significantly less often than those without.
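
A sketch of how such tracking might look, assuming each automated run is logged with a workflow name and one of three outcomes. The workflow names, the outcome labels, and the log itself are made up for the example.

```python
from collections import Counter

# Hypothetical run log: (workflow name, outcome).
# Outcome is "ok", "corrected" (output fixed by hand) or "paused" (workflow stopped).
runs = [
    ("weekly_report", "ok"),
    ("weekly_report", "ok"),
    ("weekly_report", "corrected"),
    ("weekly_report", "ok"),
    ("daily_summary", "ok"),
    ("daily_summary", "paused"),
]


def intervention_rates(runs):
    """Per workflow, the share of runs that needed a human to step in."""
    totals = Counter(name for name, _ in runs)
    interventions = Counter(name for name, outcome in runs if outcome != "ok")
    return {name: interventions[name] / totals[name] for name in totals}


rates = intervention_rates(runs)
for name, rate in rates.items():
    print(f"{name}: {rate:.0%} of runs needed manual intervention")
```

Keeping corrections and pauses as separate outcome labels makes it possible to split the rate later if the team wants to see them individually.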

Building a review practice

The review discipline needs a regular rhythm. Not an annual audit — a recurring conversation that is part of how the team works.

A fifteen-minute slot in the team retrospective. Not a separate meeting, not a formal process. Three questions as part of the existing rhythm:

What AI-assisted work did we produce this period? Not a count — a description of the categories.

What did internal review catch? What had to be changed, and at what point was it caught — before or after it reached someone outside the team?

Is the review process functioning? If nothing was caught, is that because the output was excellent, or because review was not deep enough?

This conversation, held regularly, builds the team's collective understanding of where AI is reliable in their context and where it requires closer attention. It also surfaces accountability gaps before they become client problems.

The governance signal

There is one question that tells you whether the accountability structure is actually in place or only notional:

If someone outside your team asked what AI has access to and what it can do in your team's workflow, could any member of your team answer precisely?

If the answer is no — if your team's AI usage is individual and uncoordinated, if no one knows what tools have access to what data, if there is no defined scope for what AI is and is not used for — that is the governance gap. Not every team needs a formal policy. Every team needs someone who can answer that question.
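
One lightweight way to make that question answerable is a shared inventory of tools, what each can access, what it is used for, and who can speak for it. The sketch below is only an illustration: the field names, the tool names, and the convention that `None` means "nobody has recorded this" are all assumptions.

```python
# Hypothetical inventory of AI tools in a team's workflow.
# None means the answer has not been recorded, which is the gap to look for.
inventory = [
    {
        "tool": "drafting assistant",
        "data_access": ["project briefs", "published reports"],
        "used_for": "first drafts of client reports",
        "owner": "team lead",
    },
    {
        "tool": "meeting summariser",
        "data_access": None,
        "used_for": None,
        "owner": None,
    },
]

REQUIRED_FIELDS = ("data_access", "used_for", "owner")


def governance_gaps(inventory):
    """List each tool together with the questions nobody has answered for it."""
    gaps = []
    for entry in inventory:
        missing = [field for field in REQUIRED_FIELDS if entry.get(field) is None]
        if missing:
            gaps.append((entry["tool"], missing))
    return gaps


for tool, missing in governance_gaps(inventory):
    print(f"{tool}: no recorded answer for {', '.join(missing)}")
```

An empty result does not prove the governance is sound; it only shows that someone has written the answers down.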


This is the final guide in the Leadership Fundamentals series.

See also: "Accountability levels in AI-assisted work" and "Why your organisation needs to learn to ramble"

Before you move on
I know which activity metrics to stop reporting and why
I can name at least three outcome signals that are worth tracking for my team
I understand what the correction rate tells me about review quality
I have a format for a team retrospective that includes AI output quality