Use Data to Drive Decisions

At every stage of a project, we should measure how well our service is working for our users. This includes measuring how well the system performs and how people interact with it in real time. Our teams and agency leadership should watch these metrics closely to find issues and decide which bug fixes and improvements to prioritize. Alongside monitoring tools, a feedback mechanism should be in place so people can report issues directly.

Key Questions

What are the key metrics for the service?

How have these metrics performed over the life of the service?

Which system monitoring tools are in place?

What is the target average response time for your service? What percentage of requests take more than 1 second, 2 seconds, 4 seconds, and 8 seconds?

What is the average response time and percentile breakdown (percent of requests taking more than 1s, 2s, 4s, and 8s) for the top 10 transactions?

What is the volume of each of your service’s top 10 transactions? What is the percentage of transactions started vs. completed?

What is your service’s monthly uptime target?

What is your service’s monthly uptime percentage, including scheduled maintenance? Excluding scheduled maintenance?

How does your team receive automated alerts when incidents occur?

How does your team respond to incidents? What is your post-mortem process?

Which tools are in place to measure user behavior?

What tools or technologies are used for A/B testing?

How do you measure customer satisfaction?
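
Several of the questions above (the response-time breakdown, started vs. completed transactions, and monthly uptime) reduce to simple arithmetic over request and outage records. The following is a minimal Python sketch, not a prescribed implementation. The function names and input shapes are hypothetical. It also assumes one reading of the uptime question: "including scheduled maintenance" counts maintenance windows as downtime, while "excluding" removes them from the measured period.

```python
from typing import Dict, Sequence

# Thresholds (in seconds) named in the key questions above.
THRESHOLDS_SECONDS = (1, 2, 4, 8)


def slow_request_breakdown(
    response_times: Sequence[float],
    thresholds: Sequence[float] = THRESHOLDS_SECONDS,
) -> Dict[float, float]:
    """Return the percent of requests slower than each threshold."""
    if not response_times:
        raise ValueError("need at least one response time")
    total = len(response_times)
    return {
        t: 100.0 * sum(1 for rt in response_times if rt > t) / total
        for t in thresholds
    }


def completion_rate(started: int, completed: int) -> float:
    """Return the percent of started transactions that were completed."""
    if started <= 0:
        raise ValueError("started must be positive")
    return 100.0 * completed / started


def uptime_percent(
    period_minutes: float,
    unplanned_down_minutes: float,
    scheduled_down_minutes: float = 0.0,
    include_scheduled: bool = True,
) -> float:
    """Return uptime as a percent of the measured period.

    Assumption: when include_scheduled is True, maintenance windows count
    as downtime. When False, they are removed from the measured period.
    """
    if include_scheduled:
        measured = period_minutes
        down = unplanned_down_minutes + scheduled_down_minutes
    else:
        measured = period_minutes - scheduled_down_minutes
        down = unplanned_down_minutes
    if measured <= 0:
        raise ValueError("measured period must be positive")
    return 100.0 * (measured - down) / measured
```

For example, `slow_request_breakdown([0.5, 1.5, 3.0, 5.0, 9.0])` reports that 80 percent of requests took more than 1 second and 20 percent took more than 8 seconds. Reporting both uptime figures side by side shows how much of the downtime was planned.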

Checklist

Monitor system-level resource utilization in real time

Monitor system performance in real time (e.g., response time, latency, throughput, and error rates)

Ensure monitoring can measure median, 95th percentile, and 98th percentile performance

Create automated alerts based on this monitoring

Track concurrent users in real time, and monitor user behaviors in aggregate to determine how well the service meets user needs

Publish metrics internally

Publish metrics externally

Use an experimentation tool that supports multivariate testing in production
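
The percentile and alerting items in the checklist above can be illustrated with a short Python sketch. It assumes the nearest-rank percentile definition (monitoring tools may interpolate differently) and hypothetical alert limits. The names `percentile`, `summarize`, and `check_alerts` are illustrative and do not come from any particular monitoring product.

```python
import math
from typing import Dict, List, Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Return the p-th percentile using the nearest-rank method."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    # Nearest rank: the smallest value with at least p percent of samples
    # at or below it.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples: Sequence[float]) -> Dict[str, float]:
    """Return the median, 95th, and 98th percentile of the samples."""
    return {
        "median": percentile(samples, 50),
        "p95": percentile(samples, 95),
        "p98": percentile(samples, 98),
    }


def check_alerts(
    summary: Dict[str, float], limits: Dict[str, float]
) -> List[str]:
    """Return one alert message per metric that exceeds its limit.

    The limits are hypothetical; each team sets its own targets.
    """
    return [
        f"{name} = {summary[name]} exceeds limit {limit}"
        for name, limit in limits.items()
        if name in summary and summary[name] > limit
    ]
```

In practice the list returned by `check_alerts` would feed a paging or notification system. Alerting on the 95th and 98th percentiles rather than the average helps surface slow experiences that an average would hide.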