People & Process

Software quality metrics every development team should track

In today's world, software quality isn't just a checkbox—it can be the difference between winning and losing customers. But what does “software quality” really mean, and how can you measure it?

In this article, we’ll define software quality (including how it’s different from code quality), discuss the key characteristics of software quality, and share essential software quality metrics to help your team deliver exceptional products that meet both technical and business goals.

1. What is software quality?

Software quality is fundamentally determined by how the software impacts our users. To that end, we love this quote from Armand (Val) Feigenbaum – called the “father of total quality control”:

"Quality is a customer determination, not an engineer's determination, not a marketing determination, nor a general management determination. It is based on the customer's actual experience with the product or service, measured against his or her requirements -- stated or unstated, conscious or merely sensed, technically operational or entirely subjective -- and always representing a moving target in a competitive market."
- Armand (Val) V. Feigenbaum

So how do we define it? The American Society for Quality (ASQ) defines software quality as “a field of study and practice that describes the desirable attributes of software products. There are two main approaches to software quality: defect management and quality attributes”.

The defect management approach counts and manages defects, while the software quality attributes approach focuses on specific quality characteristics. Similarly, the International Organization for Standardization (ISO) publishes a quality model that, like ASQ’s attributes approach, focuses on specific characteristics of software quality. There’s general agreement across these models about which aspects of software quality matter, so we’ll focus here on the software quality standards in ISO/IEC 25010.

ISO/IEC 25010 defines eight main software quality characteristics:

  1. Functional suitability – do the functions of the software satisfy stated or implied needs of users? This includes functional completeness, correctness, and appropriateness.
  2. Performance efficiency – how well does the software perform relative to the amount of resources used, under specific conditions? This includes time behavior, resource use, and capacity.
  3. Compatibility – how well does the software exchange information with other products/systems/components? How well does it perform while sharing the same hardware or software environment? This includes co-existence and interoperability.
  4. Usability – how much effort does it take to be able to use the software? This includes appropriateness recognizability, learnability, operability, user error protection, user interface aesthetics, and accessibility.
  5. Reliability – does the software maintain its level of performance under stated conditions for a stated period of time? This includes maturity, availability, fault tolerance, and recoverability.
  6. Security – how well does the software protect information and data so that people and other systems have the right level of access? This includes confidentiality, integrity, non-repudiation, accountability, and authenticity.
  7. Maintainability – how hard is it to modify the software? This includes modularity, reusability, analyzability, modifiability, and testability.
  8. Portability – how hard is it to move the software from one hardware, software, or other environment to another? This includes adaptability, installability, and replaceability.

Compared to the older ISO/IEC 9126, this newer model is more comprehensive, now including security and compatibility.

ASQ – What is Software Quality?


2. What are software quality metrics?

Software quality metrics give a view of overall software health by considering the quality of the products, processes, and projects that a development team works with. They indicate how well your software is performing, whether it meets user expectations, and whether it meets existing quality standards.

As you can see from the previous section, there are a lot of characteristics to consider when looking at software quality. If your organization wants good metrics but doesn’t have time to go in depth on each of the eight characteristics above, where should you focus?

We recommend choosing a handful to get you started – and we’ve suggested a few below to get you thinking. The important thing is to choose some key metrics that work for you, make sure your whole organization is aligned around why you’re using these metrics, and then set goals and track progress over time.

You don’t have to measure everything from the start – choosing a few metrics and focusing on improvement against those is more important than tracking a comprehensive list of software quality metrics but then not having time to improve against them.

Why do software quality metrics matter?

The quality of the software being built matters just as much as the quantity. If there are lots of new features, but they're buggy or break often, this can create a negative user experience and erode customer trust over time.

The implications of poor software quality can extend beyond disappointed users too. High defect rates and reliability issues can cause increased operational costs from constant fixes for user issues and slower feature development in low-quality areas of the codebase.

On the other hand, when done right, high-quality software can reduce customer churn and positively impact everything from brand perception to business value and developer experience.

With all this in mind, it's crucial to consider software quality metrics.

Note that there are other things besides software quality that also matter for a development team, like developer experience, user engagement with the product, or how quickly the work gets done. While these are software metrics, they're not software quality metrics so we’re not focusing on them here – but they’re also worthwhile metrics and you might want to include some of them in your top set of metrics to track.

Ultimately, software quality metrics are your guide for building better products both today and into the future.

Software quality is not code quality, but they are related

It’s easy to assume that code quality and software quality are the same, but they refer to distinct aspects of development. Understanding this difference is crucial because while they are related, one does not automatically guarantee the other.

Software quality – core focus: how well a product serves its intended purpose. It’s judged on externally observable attributes like reliability, usability, performance, and aesthetics. High-quality software rarely crashes, is easy to navigate, and performs efficiently under different conditions. You can build software that excels in these areas using various coding styles and languages, because the end product is what matters.

Code quality – core focus: how well the code itself is structured, readable, and maintainable. It may not be visible to users, but it significantly impacts long-term development efficiency. Poorly written code can lead to technical debt, making future modifications difficult and slowing down progress. High-quality code is easy to understand, modify, and extend. It enables developers to sustain the business by ensuring the software remains adaptable and scalable over time.

It’s a false dichotomy to argue that focusing on one comes at the cost of the other. Quality software and quality code can—and should—coexist.

The best approach is to build software that provides immediate value while writing code that ensures long-term sustainability.

The goal isn’t to over-engineer or future-proof everything, but to create sustainable code—code that is flexible enough for future changes and, if necessary, easy to replace. In short, software quality ensures the product delivers value today, while code quality ensures it continues delivering value in the future.

Code quality isn't software quality by Mark Seemann

3. Top 11 software quality metrics to measure

In the vast sea of software metrics, it can be overwhelming to find the ones that truly impact your success.

To cut through the complexity, we've identified 11 essential software quality metrics to get you started. Below, we focus on 4 of the key characteristics we shared above: usability, reliability, security, and maintainability. That’s because there are clear tools to help with the others:

  • For functional suitability, there are a number of functional testing software tools that can help, depending on the language and systems you’re using.
  • For performance efficiency, monitoring & observability tooling can help.
  • For measuring compatibility, check out tools for cross-browser and cross-platform testing.
  • For portability, many CI/CD tools can help.

Remember: The important thing is to align with your organization around a few key metrics (you don’t have to track everything!) and then focus on improving over time. It’s better to track and improve on a few metrics rather than trying to track everything and overwhelming your organization.

Usability metrics

1. Customer-Reported Defects

Definition: This is the number of defects reported by customers after the product has been released. Within this, you can break it down by the number of new defects reported per month relative to the number fixed.

Calculation:

Customer-Reported Defects = Number of Defects Reported by Customers
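
To illustrate the monthly breakdown mentioned above, here’s a minimal Python sketch that counts new defects reported versus defects fixed per month. The defect records and field names are hypothetical; in practice you’d export them from your issue tracker.

from collections import Counter
from datetime import date

# Hypothetical customer-reported defect records exported from an issue tracker.
defects = [
    {"reported": date(2024, 1, 12), "fixed": date(2024, 2, 3)},
    {"reported": date(2024, 1, 20), "fixed": None},  # still open
    {"reported": date(2024, 2, 5), "fixed": date(2024, 2, 18)},
]

# Count how many defects were reported and how many were fixed in each month.
reported_per_month = Counter(d["reported"].strftime("%Y-%m") for d in defects)
fixed_per_month = Counter(
    d["fixed"].strftime("%Y-%m") for d in defects if d["fixed"] is not None
)

for month in sorted(set(reported_per_month) | set(fixed_per_month)):
    print(month, "reported:", reported_per_month[month], "fixed:", fixed_per_month[month])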

Why it matters: Customer-reported defects indicate the number of times customers weren’t able to do what they expected to do with the product – which makes this a great real-world indicator of the product’s usability. The ideal is to catch all the important bugs internally, before they reach customers – so you want this number to be as low as possible. Check out this guide for more on how to track and manage customer-reported defects.

2. Customer Satisfaction Score (CSAT)

Definition: Customer Satisfaction Score (CSAT) measures how well a company’s product meets customers’ expectations. It’s based on directly asking customers how satisfied they are with your product, with scores usually ranging from 1 to 5, where 5 is “Very Satisfied” and 1 is “Very Dissatisfied”.

Calculation:

Customer Satisfaction Score (CSAT) = (Number of Satisfied Respondents* ÷ Number of Total Survey Respondents) × 100

*Those who rated you a 4 or 5. Read more on measuring CSAT, including what a great CSAT looks like according to industry benchmarks.
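
As a quick illustration, here’s how that calculation looks in code, assuming you have the raw 1–5 survey responses in a list (the responses below are made up):

def csat(ratings: list[int]) -> float:
    """Percentage of respondents who rated the product 4 or 5."""
    if not ratings:
        return 0.0
    satisfied = sum(1 for r in ratings if r >= 4)
    return satisfied / len(ratings) * 100

# Example: 12 survey responses on a 1-5 scale.
print(round(csat([5, 4, 4, 3, 5, 2, 4, 5, 1, 4, 3, 5]), 1))  # 66.7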

Why it matters: Your Customer Satisfaction Score (CSAT) reflects the usability of your product and customers’ overall experience with it – both of which suffer if software quality is low. If bugs or performance issues keep popping up, customers’ trust in the product is likely to erode over time.

Reliability metrics

3. Change Failure Rate

Definition: The percentage of changes that result in degraded service or require remediation. This metric is one of the 4 Key Metrics published by Google's DORA team. At Multitudes, we use 2 key assumptions that make this easier to calculate: (1) if there was a failure, there was a code change that fixed it and (2) this was in a 1:1 ratio (one code change per failure). Of course, this isn’t always true but these assumptions make this metric much easier to calculate and will get you 80% of the way there.

Calculation:

Change Failure Rate = (Merged PRs Containing [rollback], [hotfix], or [revert] in the PR Title* ÷ All Merged PRs) × 100

*Note: This is a minimum set of keywords, but you might want to add others depending on how your organization works.
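
Putting the formula and the note together, here’s a rough sketch of the keyword-matching approach. The PR titles are made up, and in practice you’d pull the real list of merged PR titles from your version control provider’s API:

FAILURE_KEYWORDS = ("rollback", "hotfix", "revert")  # extend to match how your org labels fixes

def change_failure_rate(merged_pr_titles: list[str]) -> float:
    """Share of merged PRs whose title suggests they remediated a failed change."""
    if not merged_pr_titles:
        return 0.0
    failures = sum(
        1 for title in merged_pr_titles
        if any(keyword in title.lower() for keyword in FAILURE_KEYWORDS)
    )
    return failures / len(merged_pr_titles) * 100

titles = ["Add billing page", "Hotfix: payment retries", "Refactor auth", "Revert #123"]
print(change_failure_rate(titles))  # 50.0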

Why it matters: Change Failure Rate is an indicator of whether we’re catching issues before they get deployed to customers. Because Change Failure Rate is one of the 4 key DORA metrics, we know that success on this (and the other 3 metrics) is correlated with both a financially successful company and psychological safety on your teams.

4. Error Rate

Definition: Error rate measures how often errors (operations, transactions, or functions that fail or are incorrect) occur in a system over a specific period of time. To measure this, you’ll first need to define which errors you care about. From a software quality perspective, we suggest including system-level errors (overall system failures and crashes), since these are severe enough to interrupt user access, and UI-level errors, since these also impact the user experience. The key is to capture the errors that users would have seen and that would have degraded the quality of their experience.

Calculation:

Error Rate = (Number of Failed Operations ÷ Total Number of Operations) × 100

Why it matters: A high error rate means there are more issues that could negatively impact the user experience, and it can indicate that a system is unreliable. What “good” looks like depends on your sector – financial systems and healthcare need to work to a higher bar than what you might tolerate for a food-delivery app.

5. Mean Time to Recovery (MTTR)

Definition: This is a measure of how long it takes an organization to recover from an incident or failure in production.

Calculation:

MTTR = (∑ Time to Resolve Each Incident) ÷ Total Number of Incidents
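
A minimal sketch of that calculation, assuming you have the detection and resolution timestamps for each incident (the timestamps below are made up):

from datetime import datetime, timedelta

# Hypothetical incident log: (detected, resolved) timestamps.
incidents = [
    (datetime(2024, 3, 1, 9, 0), datetime(2024, 3, 1, 10, 30)),
    (datetime(2024, 3, 8, 14, 0), datetime(2024, 3, 8, 14, 45)),
    (datetime(2024, 3, 20, 2, 0), datetime(2024, 3, 20, 5, 0)),
]

def mttr(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Mean time from detection to resolution across all incidents."""
    total = sum((resolved - detected for detected, resolved in incidents), timedelta())
    return total / len(incidents)

print(mttr(incidents))  # 1:45:00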

Why it matters: This metric indicates the stability of your teams’ software. A higher Mean Time to Recovery means you’re more likely to have more app downtime. And when our team has to spend more time fixing outages, it cuts into the time we could be using for feature work – ultimately making our product feel less innovative to customers.

In this study by Nicole Forsgren (author of DORA and SPACE), high-performing teams had the lowest times for Mean Time to Recovery, along with better Change Lead Time (a speed metric, one of the other DORA metrics) and higher Deployment Frequency (another DORA metric). For more about MTTR, check out this guide to understanding MTTR metrics.

6. System Availability

Definition: The percentage of time that a system is operational and available for use.

Calculation:

System Availability = (Total Uptime ÷ (Total Uptime + Total Downtime)) × 100
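
For example, if your monitoring tool reports total downtime over a period, the calculation is straightforward (the figures below are illustrative):

from datetime import timedelta

def availability(period: timedelta, downtime: timedelta) -> float:
    """Percentage of the period during which the system was up."""
    uptime = period - downtime
    return uptime / period * 100

# Example: 38 minutes of downtime across a 30-day month ≈ 99.91% availability.
print(round(availability(timedelta(days=30), timedelta(minutes=38)), 3))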

Why it matters: High availability is critical for ensuring a seamless user experience, particularly for SaaS products and mission-critical applications. A system with low availability can lose user trust, reduce revenue, and violate service-level agreements (SLAs).

Security metrics

7. Scan Coverage

Definition: This measures what percentage of our total assets and applications are being scanned for vulnerabilities.

Calculation:

Scan Coverage = (Number of Assets Scanned ÷ Total Number of Assets) × 100

Why it matters: Scan Coverage shows how comprehensive our vulnerability scanning is – what portion of our environment have we checked for security issues? Higher is better, though you don’t need to hit 100%, because not all of your assets will be of high importance.

You might also want to consider the depth of the scan, the types of scanning (e.g., agent-based, network-based, API, app, cloud-based, credentialed or not), and the type of authentication offered (which can impact the scan depth and accuracy).

8. Mean Vulnerabilities per Asset

Definition: This is an average of the number of critical-risk vulnerabilities in each of your assets or applications, over a given time period. You’ll want to choose your time interval (monthly, quarterly, yearly) and then use the formula below. You can then plot the results over months/quarters/years to see how it’s trending.

Calculation:

Mean Vulnerabilities per Asset = Total Vulnerabilities Across All Assets ÷ Total Number of Assets
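
A small sketch of the calculation, assuming your scanner can export a critical-vulnerability count per asset for each period (the asset names and counts below are invented):

# Hypothetical critical-vulnerability counts per asset, by quarter.
scans = {
    "2024-Q1": {"api-gateway": 3, "billing-service": 1, "web-frontend": 5},
    "2024-Q2": {"api-gateway": 2, "billing-service": 0, "web-frontend": 4},
}

# Mean vulnerabilities per asset for each quarter, so you can watch the trend.
for quarter, vulns_by_asset in scans.items():
    mean_vulns = sum(vulns_by_asset.values()) / len(vulns_by_asset)
    print(quarter, round(mean_vulns, 2))
# Prints: 2024-Q1 3.0, then 2024-Q2 2.0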

Why it matters: Mean Vulnerabilities per Asset makes sure we’re considering vulnerabilities across all of our assets. Plotting it over time also shows how we’re trending; the goal is to keep it flat or trending down, not up. We might also want to segment this analysis by how critical the asset/app is for our business, or by the severity of the vulnerabilities identified.

If you want to go deeper into security metrics, we recommend this article: Vulnerability and exposure management metrics.

Maintainability metrics

9. Deployment Frequency

Definition: Deployment Frequency shows how often an organization deploys code to production or releases it to end users. It’s another one of the 4 Key DORA metrics. We recommend normalizing for the number of people on a team, because otherwise larger teams will have an advantage since they have more people writing code to deploy.

Calculation:

Deployment Frequency = Number of Successful Attempts to Deploy to Production ÷ Number of People on the Team
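
The calculation itself is simple – the only wrinkle is choosing the period and normalizing by team size. A sketch, with illustrative numbers:

def deployment_frequency(successful_deploys: int, team_size: int) -> float:
    """Successful production deployments per person over the chosen period."""
    return successful_deploys / team_size

# Example: 42 successful deploys in a month by a team of 6 = 7 deploys per person per month.
print(deployment_frequency(42, 6))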

Why it matters: Deployment Frequency is an indicator of the value we're providing to customers, because it shows how often we’re shipping new features or product improvements to them. For more about this and the other DORA metrics, check out our primer on DORA and SPACE metrics.

10. Test Coverage

Definition: The percentage of code that is executed during automated testing.

Calculation:

Test Coverage = (Number of Executed Lines of Code ÷ Total Lines of Code) × 100

Why it matters: Higher Test Coverage increases confidence in the software's reliability, as more of the codebase has been tested for potential errors. However, test coverage alone does not guarantee quality; tests must be meaningful and cover edge cases effectively. We recommend pairing this quantity metric with spot-checks for quality – pull up some of the tests and check whether they’re meaningful, not just relying on mocks or testing whether a variable is defined.

The biggest thing to watch out for is whether test coverage drops when you make changes to the codebase – as the codebase changes, you want Test Coverage to hold steady or go up.

11. Rate of Maintenance Work

Definition: The percentage of work (measured based on issues or commits) that goes into maintenance and support rather than new features. In Multitudes, we have a Feature vs Maintenance work analysis that shows this; you can look at it based either on ticket tracking data or on main-branch commits (if you use conventional commits).

Calculation:

If you want to calculate this from issue data, it would be:

Rate of Maintenance Work = (Number of Maintenance Issues ÷ Total Number of Issues) × 100

Or if you want to calculate this based on conventional commits you would use:

Rate of Maintenance Work = (Number of Maintenance Main-Branch Commits ÷ Total Number of Main-Branch Commits) × 100

Note: If you use conventional commits, we suggest that you include these prefixes as part of the maintenance group: build, ci, chore, docs, refactor, style, test.

We like to have a separate bucket for bug or support work, based on commits with the prefix fix. And then feat and perf are typically feature work. You can see more about how we calculate this metric in our Feature vs Maintenance Work docs.
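
To make the conventional-commits approach concrete, here’s a rough sketch of how the classification could work. The prefix groupings mirror the buckets described above, but the parsing is simplified and the commit messages are made up:

MAINTENANCE_PREFIXES = {"build", "ci", "chore", "docs", "refactor", "style", "test"}
BUG_PREFIXES = {"fix"}
FEATURE_PREFIXES = {"feat", "perf"}

def classify(message: str) -> str:
    """Bucket a conventional-commit message by its type prefix."""
    prefix = message.split(":", 1)[0].split("(", 1)[0].strip().rstrip("!")
    if prefix in MAINTENANCE_PREFIXES:
        return "maintenance"
    if prefix in BUG_PREFIXES:
        return "bug/support"
    if prefix in FEATURE_PREFIXES:
        return "feature"
    return "other"

commits = ["feat: add SSO login", "chore(deps): bump lodash", "fix: handle null user", "docs: update README"]
maintenance = sum(1 for c in commits if classify(c) == "maintenance")
print(maintenance / len(commits) * 100)  # 50.0 (half of these commits are maintenance work)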

Why it matters: The Rate of Maintenance Work shows how much your developers are getting bogged down in maintenance and support work instead of spending time on new features. If other stakeholders are concerned about the pace of feature work, this analysis can help get to the root cause of delays. Many organizations aim for 70% feature work and 30% maintenance/support/other work.

4. Improve your software quality with Multitudes

Engineering teams use Multitudes to measure and improve their software quality. Multitudes is an engineering insights platform built for sustainable delivery – it integrates seamlessly with your existing tools like GitHub and Jira to provide a comprehensive view of your team's technical performance, operational health, and collaboration patterns.

With Multitudes, you can:

  • Track all key software quality metrics such as Change Failure Rate, Deployment Frequency, MTTR, Feature vs. Maintenance — all in one place
  • Identify patterns in your metrics that impact delivery speed and quality
  • Get recommendations to improve team performance – we surface insights on productivity and collaboration and then give you nudges in Slack to take action.

By leveraging Multitudes, teams can focus less on metrics collection and more on using these insights to drive engineering excellence.

Want to find out more? Check out how Multitudes works, or get a demo today.
