How to measure and improve developer productivity | Nicole Forsgren (Microsoft Research, GitHub, Google)

Jul 30, 2023
Overview

Nicole Forsgren, a developer productivity expert and partner at Microsoft Research, discusses measuring and improving engineering team productivity and experience. She details the DORA and SPACE frameworks, provides elite performance benchmarks, and explains how faster delivery enhances quality and stability.

At a Glance
13 Insights
1h 16m Duration
19 Topics
6 Concepts

Deep Dive Analysis

Nicole Forsgren's Diverse Career Background

Distinguishing Developer Productivity, Experience, and DevOps

The DORA Framework for Software Delivery Performance

Benchmarks for Elite Software Delivery Performance

Company Size Does Not Impact DORA Metrics

Improving DevOps Capabilities Through Backward Planning

The SPACE Framework for Measuring Creative Work

Integrating SPACE and DORA Frameworks

Measuring Developer Satisfaction and Well-being

Resources and Tools for Optimizing Metrics

Nicole's Upcoming Book on Developer Experience Measurement

Common Pitfalls in Developer Productivity Initiatives

Evolution and Progress in the DevOps Space

Impact of AI on Developer Experience and Productivity

First Steps to Improve Developer Experience

Google as a Model for DevOps Implementation

Importance of Clear Communication in All Work

Nicole’s Four-Box Framework for Data and Relationships

Effective Decision-Making Strategies

Developer Productivity

This concept refers to how much work can be accomplished over time, emphasizing a holistic measure that includes community effects and well-being. It aims for sustainability and burnout reduction, rather than just brute-force output.

Developer Experience (DevEx)

DevEx describes what it's like to write software, focusing on creating a friction-free, predictable, and certain process for developers. The goal is to reduce uncertainty and increase predictability, directly contributing to overall productivity.

DevOps

DevOps encompasses a set of technical, architectural, and cultural capabilities, along with lean management practices, designed to improve end-to-end software development and delivery. Its purpose is to make the process faster and more reliable, representing a holistic approach beyond just a toolchain.

DORA Framework (Four Keys)

DORA is a research program that identifies four key metrics for software delivery performance: lead time for changes and deployment frequency (speed metrics), plus mean time to restore (MTTR) and change fail rate (stability metrics). The framework's central finding is that speed and stability move in tandem: faster delivery tends to produce more stable systems.
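As a rough illustration (not an official DORA tool), the four keys can be computed from simple deployment and incident records. The record schema and field names here are assumptions made for the sketch:

```python
from datetime import timedelta
from statistics import median

def four_keys(deploys, incidents, window_days=30):
    """Compute the DORA four keys from simple event records.

    deploys: list of dicts with 'committed_at', 'deployed_at' (datetimes)
             and 'caused_incident' (bool).
    incidents: list of dicts with 'started_at', 'restored_at' (datetimes).
    The field names are illustrative, not a standard schema.
    """
    # Speed metrics: lead time (commit -> production) and deployment frequency.
    lead_time = median(d["deployed_at"] - d["committed_at"] for d in deploys)
    deploys_per_day = len(deploys) / window_days

    # Stability metrics: time to restore and change fail rate.
    restore_times = [i["restored_at"] - i["started_at"] for i in incidents]
    mttr = median(restore_times) if restore_times else timedelta(0)
    change_fail_rate = sum(d["caused_incident"] for d in deploys) / len(deploys)

    return {
        "lead_time": lead_time,
        "deploys_per_day": deploys_per_day,
        "time_to_restore": mttr,
        "change_fail_rate": change_fail_rate,
    }
```

The first two values capture speed, the last two stability, which makes the tandem relationship easy to track on one dashboard.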

SPACE Framework

SPACE is a framework for measuring complex creative work, such as developer productivity, across five dimensions: Satisfaction and well-being, Performance, Activity, Communication and collaboration, and Efficiency and flow. It recommends selecting at least three dimensions to ensure a balanced and comprehensive measurement approach.
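As a toy sketch (not from the episode), the "select at least three dimensions" rule can be expressed as a simple coverage check; the metric names and dimension tags below are illustrative:

```python
# The five SPACE dimensions.
SPACE = {"satisfaction", "performance", "activity", "communication", "efficiency"}

def is_balanced(metric_plan, minimum=3):
    """metric_plan maps metric name -> the SPACE dimension it measures.
    The framework recommends spanning at least `minimum` distinct dimensions."""
    dimensions = {d for d in metric_plan.values() if d in SPACE}
    return len(dimensions) >= minimum

plan = {
    "quarterly developer survey": "satisfaction",
    "deployment frequency": "performance",
    "uninterrupted focus time": "efficiency",
}
print(is_balanced(plan))  # True: three distinct dimensions covered
```

A plan built only from activity counts (commits, PRs merged) would fail this check, which is exactly the imbalance the framework warns against.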

Four-Box Framework

This framework is a method for clarifying hypotheses and measurement strategies by visually mapping 'words' (hypothesized relationships) to 'data' (how those relationships will be measured). It helps identify whether issues lie in the initial hypothesis or in the data collection and quality.

How can engineering teams move faster and improve quality simultaneously?

Teams can achieve both speed and stability by implementing good technical practices like automated testing, continuous integration/continuous deployment (CI/CD), and trunk-based development, along with sound architectural practices such as loosely coupled systems.

Does company size impact developer productivity benchmarks?

No. Statistical analysis shows no significant difference in software delivery performance between small and large companies; the retail industry is a rare outlier, likely due to intense market pressures.

How do the DORA and SPACE frameworks work together?

DORA provides specific, measurable metrics for software delivery performance, which can be seen as an implementation of certain SPACE dimensions (Performance and Efficiency). SPACE offers a broader framework to help teams select balanced metrics for any complex creative work, guiding how to measure improvements identified by DORA.

Why is measuring developer satisfaction and well-being important for productivity?

Developer satisfaction and well-being are crucial because they are highly correlated with other dimensions of productivity and overall performance. When these aspects decline, it often signals that other parts of the system or team health are starting to break down.

What are common pitfalls when implementing developer productivity initiatives?

Common pitfalls include a lack of clarity on the problem or goal being addressed, and failing to secure both top-down leadership buy-in and bottom-up team engagement, which can lead to misaligned efforts and limited success.

What is the impact of AI on developer productivity and experience?

AI tools fundamentally change how developers work, shifting their focus from primarily writing code to spending more time reviewing AI-generated code. This alters mental models, friction points, and cognitive load, introducing new considerations for reliance and learning.

1. Define Problem Clearly

Before tackling any initiative, clearly define your problem or goal to ensure everyone is on the same page and working towards the same objective, preventing misaligned efforts.

2. Ship Smaller Changes Frequently

To improve both speed and stability, ship smaller changes more often; this reduces the blast radius of errors and makes debugging significantly easier and faster.

3. Track DORA Four Metrics

Measure your team’s software delivery performance using the DORA four key metrics: lead time for changes, deployment frequency, mean time to restore (MTTR), and change fail rate, as these predict both speed and stability.

4. Apply SPACE for Productivity Measurement

When measuring complex creative work like developer productivity, use the SPACE framework (Satisfaction & Well-being, Performance, Activity, Communication & Collaboration, Efficiency & Flow) by selecting at least three dimensions to ensure a balanced and holistic view.

5. Combine People & System Data

Complement system-generated data (quantitative) with insights from people (qualitative, e.g., surveys, interviews) to gain a comprehensive understanding of productivity, as each provides unique perspectives that the other cannot.

6. Aim for Elite DORA Benchmarks

Strive for the elite performance benchmarks: deploy on demand, achieve a lead time for changes under one day, restore service in less than an hour, and maintain a change fail rate between 0% and 15%. These thresholds characterize top-tier efficiency and reliability.
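The thresholds above are easy to encode as a sanity check. A minimal sketch; the function shape and argument names are mine, not DORA's:

```python
from datetime import timedelta

def is_elite(lead_time, time_to_restore, change_fail_rate, deploys_on_demand):
    """Check a team's metrics against the elite benchmarks listed above.

    lead_time / time_to_restore are timedeltas; change_fail_rate is a
    fraction (0.15 == 15%); deploys_on_demand is a bool.
    """
    return (
        deploys_on_demand
        and lead_time < timedelta(days=1)
        and time_to_restore < timedelta(hours=1)
        and 0.0 <= change_fail_rate <= 0.15
    )

print(is_elite(timedelta(hours=4), timedelta(minutes=30), 0.08, True))  # True
```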

7. Adopt DevOps Capabilities

To achieve fast and stable feature delivery, implement key DevOps capabilities such as automated testing, continuous integration/continuous deployment (CI/CD), trunk-based development, and loosely coupled architectural systems.

8. Use Dora.dev Quick Check

Visit the quick check at dora.dev to benchmark your team’s performance and identify the constraints specific to your industry and performance profile, guiding your improvement efforts.

9. Query Developers on Barriers

Directly ask developers about their feelings regarding work tools and processes, and identify their biggest barriers to productivity, as their insights are crucial for targeted improvements.

10. Utilize Four Box Framework

When forming a hypothesis or planning to measure something, use the Four Box Framework: define your hypothesis in “words” (two boxes linked by an arrow), then define how you’ll measure each part with “data” (two corresponding boxes below), ensuring clarity and testability.

11. Communicate Work Accessibly

Make your work incredibly accessible by understanding your audience’s role and vocabulary, then translate your work into concise summaries (a few sentences or less) to ensure clear communication and resonance.

12. Use Decision-Making Spreadsheet

To make informed decisions, create a spreadsheet outlining your options, defining the criteria that matter, assigning relative weights (summing to 100%), and scoring each option against each criterion; then multiply each score by its weight and sum to get a weighted total per option.

13. Align Top-Down & Bottom-Up

Drive success for initiatives by pursuing them with both top-down leadership support and bottom-up buy-in from individual contributors, ensuring good communication throughout the organization.

When you move faster, you are also more stable.

Nicole Forsgren

The key to having a good strategy is knowing what not to do, and the key to executing a good strategy is actually not doing it.

Nicole Forsgren

It's not just about what it is that you build. It's about creating absolutely novel, incredibly new experiences and doing them at a speed that no one has seen before.

Nicole Forsgren

If there is ever a disagreement between the surveys and the instrumentation, which is incredibly advanced, almost every time, every time that I've ever heard of, the surveys are correct and not the instrumentation.

Nicole Forsgren

80% of the folks that I work with, this is their biggest problem, even at like executive levels... not being clear or not understanding what it is that they're looking for.

Nicole Forsgren

Four-Box Framework for Hypothesis Testing and Measurement

Nicole Forsgren
  1. Draw four boxes: two on top, two on bottom, aligned.
  2. Label the left side 'words' (for the top two boxes) and 'data' (for the bottom two boxes).
  3. Draw an arrow between the two top boxes and between the two bottom boxes.
  4. In the first top box, write your initial concept or independent variable (e.g., 'customer satisfaction').
  5. In the second top box, write the expected outcome or dependent variable (e.g., 'return customers').
  6. Validate the 'words' with stakeholders: Ask if they agree with the hypothesized relationship.
  7. In the first bottom box, define how you will measure the first concept with available data (e.g., 'CSAT score,' 'NPS score').
  8. In the second bottom box, define how you will measure the second concept with available data (e.g., 'return customers through website,' 'referral link usage').
  9. Run data analysis (e.g., correlations) to test the relationship between the data points.
  10. If results are unexpected, evaluate if the data quality is poor, if the proxies were bad, or if the initial 'words' (hypothesis) were incorrect.

Decision-Making Process

Nicole Forsgren
  1. Clearly define your objectives and definitions for the decision.
  2. Outline all available options.
  3. Identify the criteria that are important for making the decision (e.g., total compensation, work-life balance, proximity to airport).
  4. Assign a relative weight or importance (e.g., summing to 100%) to each criterion.
  5. Score each option against each criterion.
  6. Multiply the score by the weight for each criterion and sum to get a total score for each option.
  7. Review the calculated scores; this often clarifies the decision, allowing you to be data-informed rather than strictly data-driven.
  8. If making strategic business decisions, use this process to identify what not to do and commit to not funding those options.
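
Steps 4 through 6 above amount to a weighted scoring matrix. A minimal sketch in Python, with criteria, options, and scores invented for illustration:

```python
def rank_options(weights, scores):
    """Rank options by weighted score, highest first.

    weights: {criterion: weight}, summing to 1.0 (i.e. 100%).
    scores:  {option: {criterion: score}}.
    """
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 100%"
    totals = {
        option: sum(weights[c] * score for c, score in per_criterion.items())
        for option, per_criterion in scores.items()
    }
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

weights = {"total compensation": 0.5, "work-life balance": 0.3, "proximity to airport": 0.2}
scores = {
    "Offer A": {"total compensation": 8, "work-life balance": 6, "proximity to airport": 9},
    "Offer B": {"total compensation": 7, "work-life balance": 9, "proximity to airport": 4},
}
print(rank_options(weights, scores))  # winning option first
```

The point of step 7 is that the output clarifies rather than dictates: if the top-scoring option feels wrong, that reaction itself is information about your weights.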
80%
Percentage of people struggling with unclear goals for initiatives. Nicole Forsgren observes that 80% of individuals she works with, including executives, struggle with clearly defining their problem or goal.
On demand
Elite deployment frequency. Elite performers can deploy code as frequently as needed, on demand.
Less than a day
Elite lead time for changes (code committed to production). Elite performers achieve a lead time of less than one day from code commit to production.
Less than an hour
Elite mean time to restore service (MTTR). Elite performers can restore service in under an hour after an incident occurs.
0% to 15%
Elite change fail rate. Elite performers experience a change fail rate (percentage of changes causing incidents) between 0% and 15%.
Between a day and a week
High lead time for changes (code committed to production). High performers have a lead time for changes between one day and one week.
50%
Time saved on certain tasks using AI tools. Research indicates that specific tasks, such as building an HTTP server, can be completed 50% faster with the aid of AI tools.
About 50%
Time spent reviewing code when using AI tools. When developers use AI-enabled tools like GitHub Copilot, approximately 50% of their time is spent reviewing generated code rather than writing it.