Measuring Team Performance

What measures of performance do you use? Which metrics do you need to track? How do you present performance data to the broader organization? How do your performance measurements inform your project execution? How does AI adoption affect your approach to performance?

Focus on Outcomes #

There is little disagreement that AI's impact on engineering productivity makes measuring team performance even harder. A room full of technical executives (1)From domains ranging from cloud infrastructure to fintech, investment funds, health, consumer experiences, and gaming. expressed a preference for outcomes-based metrics (like the DORA (2)DORA consists of: 1) Deployment Frequency: how often code is deployed to production; 2) Lead Time for Changes: how long it takes a code change to reach production; 3) Change Failure Rate: how often a deployment causes a failure in production; 4) Time to Restore Service: how quickly service can be restored after an outage or failure. framework) over traditional productivity measures like velocity or story points completed, not least because pre-AI developer productivity metrics can now be easily gamed or locally optimized. The trade-offs (and assumptions made) when measuring team performance revolve around questions like:

- How important are organizational culture and cross-training for team sustainability, especially amid changing team compositions? Can they be measured?
- Are there too few or too many metrics addressing individual and team productivity?
- Can selecting the right tool and metrics make the difference in team performance?

Above all, AI-enabled software development, and the measurement of it, is demanding more effort from technical leadership on several fronts:

- Establishing pre-AI baselines for team performance is now a major difficulty.
- The risk of local optimization is higher: improving one team's numbers while creating bottlenecks for others.
- Error traceability has become harder with AI code generation, which produces shared responsibility without direct ownership (one possible mitigation is sketched below).
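Of these, traceability is the most mechanically addressable. A minimal sketch, assuming a hypothetical team convention (not something raised in the discussion) where commits touched by an AI assistant carry an `AI-Assisted:` trailer in the commit message; the trailer name and the reporting script are illustrative only:

```python
import re
import subprocess
from collections import Counter

# Hypothetical convention: commits aided by an AI assistant carry a
# trailer line such as "AI-Assisted: copilot" in the commit message.
TRAILER = re.compile(r"^AI-Assisted:", re.MULTILINE | re.IGNORECASE)

def ai_assist_report(repo_path: str = ".") -> Counter:
    """Count commits per author that declare AI assistance via the trailer."""
    # %x1f separates fields within a commit; %x1e separates commits.
    log = subprocess.run(
        ["git", "-C", repo_path, "log", "--pretty=format:%an%x1f%B%x1e"],
        capture_output=True, text=True, check=True,
    ).stdout
    counts: Counter = Counter()
    for record in filter(None, log.split("\x1e")):
        author, _, body = record.partition("\x1f")
        if TRAILER.search(body):
            counts[author.strip()] += 1
    return counts

if __name__ == "__main__":
    for author, n in ai_assist_report().most_common():
        print(f"{author}: {n} AI-assisted commit(s)")
```

The point of the convention is that every AI-generated change still has a human owner on record, which is exactly the "direct ownership" the paragraph above says gets lost.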

Metrics Get Gamed #

Organizations should not embrace narrow metrics like velocity or code-generation speed: narrow metrics can be gamed and abused. In fact, not a single participant in the discussion mentioned lines of code (LOC) or “git diffs” as performance metrics of interest. Focusing instead on global flow metrics ties measurement to meaningful business outcomes. 33% of participants vocally supported DORA metrics, which tend to open conversations about collective improvement rather than punishment for under-performance. Teams should be empowered to define their own metrics, cross-team comparisons should be avoided, and individual performance reviews can actively undermine team metrics.

For example, Google’s product graveyard (3)KilledByGoogle.com lists a total of 298 products discontinued by Google. shows what happens when specific teams chase promotion incentives: all metrics eventually degrade and regress toward a baseline. Measuring outcomes over outputs therefore becomes paramount, and implementing DORA at the service level, rather than the team level, can prevent this local optimization and abuse (a sketch follows below). Amid changing team compositions, another 33% of technical leaders consider cross-training, succession planning, and T-shaped skill development mandatory; this prevents individual contributors from becoming irreplaceable single points of failure.
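To make the service-level idea concrete, here is a minimal sketch of the four DORA metrics keyed by service rather than by team. The `Deployment` record shape and its field names are assumptions for illustration; in practice the data would come from a deploy pipeline and an incident tracker:

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from statistics import median

@dataclass
class Deployment:
    service: str                    # grouping key: the service, not the team
    committed_at: datetime          # when the change was first committed
    deployed_at: datetime           # when the change reached production
    caused_failure: bool            # did this deploy trigger a production incident?
    restored_at: datetime | None = None  # when service recovered, if it failed

def dora_by_service(deploys: list[Deployment], window_days: int = 30) -> dict:
    """Aggregate the four DORA metrics per service over a reporting window."""
    by_service: dict[str, list[Deployment]] = defaultdict(list)
    for d in deploys:
        by_service[d.service].append(d)

    report = {}
    for service, ds in by_service.items():
        failures = [d for d in ds if d.caused_failure]
        restore_times = [d.restored_at - d.deployed_at
                         for d in failures if d.restored_at is not None]
        report[service] = {
            "deploys_per_day": len(ds) / window_days,
            "median_lead_time_h": median(
                (d.deployed_at - d.committed_at).total_seconds() / 3600 for d in ds
            ),
            "change_failure_rate": len(failures) / len(ds),
            "median_restore_h": (
                median(t.total_seconds() / 3600 for t in restore_times)
                if restore_times else None
            ),
        }
    return report
```

The design choice that matters here is the grouping key: aggregating by service means a team can only improve its numbers by improving end-to-end delivery for that service, not by pushing work (and failures) onto a neighboring team.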

Implementing Software Engineering Intelligence (SEI) (4)E.g., tools like LinearB and Jellyfish, or frameworks like SPACE, SRE, SRF. measurement tools can support this kind of outcome-level visibility across teams.

What About Culture? #

In fact, ~25% of the participating technical leaders agreed that changes in team behavior must precede any tool or metric adoption. Fundamentally, engineering teams view AI differently than business teams do, so the two should not share the same metrics. Business teams can measure team health and culture through turnover rates and monitoring surveys (5)Amazon has instituted a daily pulse check for its employees. Online platforms like Blind share anonymous feedback from employees. (the arithmetic behind these measures is sketched below). According to ~20% of technical leaders, culture should not be explicitly defined or measured: it emerges naturally from good management with clear incentives, and it manifests in exhibited behaviors rather than stated values. In fact, 15% claim that defining values post-hiring is counterproductive. Another 20% of leaders question the value of measuring the speed of functionality delivered, with consensus that outcomes matter more than productivity metrics: predictable, incremental value delivery is fundamentally more important than raw speed of contributions.
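For the business-side measures mentioned above, the arithmetic is simple enough to sketch. The formulas below are common HR conventions (annualized turnover as separations over average headcount; a rolling average to smooth pulse-survey noise), not something the discussion prescribed, and the numbers are illustrative only:

```python
from statistics import mean

def annualized_turnover(separations: int, avg_headcount: float, months: int = 12) -> float:
    """Turnover rate = separations / average headcount, annualized, as a percentage."""
    return separations / avg_headcount * (12 / months) * 100

def pulse_trend(weekly_scores: list[float], window: int = 4) -> list[float]:
    """Rolling average of pulse-survey scores to smooth week-to-week noise."""
    return [
        mean(weekly_scores[max(0, i - window + 1): i + 1])
        for i in range(len(weekly_scores))
    ]

# Illustrative numbers: 3 departures from an average headcount of 40
# over six months is a 15% annualized turnover rate.
print(annualized_turnover(separations=3, avg_headcount=40, months=6))  # 15.0
print(pulse_trend([4.2, 4.0, 3.7, 3.9, 3.5]))
```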


Kliment Minchev is an engineer and technology founder.

