In today's world, software quality isn't just a checkbox – it can be the difference between winning and losing customers. But what does “software quality” really mean, and how can you measure it?
In this article, we’ll define software quality (including how it differs from code quality), discuss the key characteristics of software quality, and share essential software quality metrics to help your team deliver exceptional products that meet both technical and business goals.
Software quality is fundamentally determined by how the software impacts our users. To that end, we love this quote from Armand (Val) Feigenbaum – called the “father of total quality control”:
"Quality is a customer determination, not an engineer's determination, not a marketing determination, nor a general management determination. It is based on the customer's actual experience with the product or service, measured against his or her requirements -- stated or unstated, conscious or merely sensed, technically operational or entirely subjective -- and always representing a moving target in a competitive market."
- Armand (Val) V. Feigenbaum
So how do we define it? The American Society for Quality (ASQ) defines software quality as “a field of study and practice that describes the desirable attributes of software products. There are two main approaches to software quality: defect management and quality attributes”.
The defect management approach counts and manages defects, while the quality attributes approach focuses on specific quality characteristics. Similarly, the International Organization for Standardization (ISO) publishes a quality model that, like the ASQ’s attributes approach, focuses on specific characteristics of software quality. There’s general agreement across these models about which aspects of software quality matter, so we’ll focus here on the software quality standard ISO/IEC 25010.
ISO/IEC 25010 defines eight main software quality characteristics: functional suitability, performance efficiency, compatibility, usability, reliability, security, maintainability, and portability.
Compared to the older ISO/IEC 9126, this newer model is more comprehensive, now including security and compatibility.
Software quality metrics give a view of overall software health by considering the quality of the products a team builds, along with the processes and projects its developers use to build them. They indicate how well your software is performing, whether it meets user expectations, and whether it meets existing quality standards.
As you can see from the previous section, there are a lot of characteristics to consider when looking at software quality. For an organization that wants good metrics but also doesn’t have time to go in depth on each of the eight aspects above, where should you focus?
We recommend choosing a handful to get you started – and we’ve suggested a few below to get you thinking. The important thing is to choose some key metrics that work for you, make sure your whole organization is aligned around why you’re using these metrics, and then set goals and track progress over time.
You don’t have to measure everything from the start – choosing a few metrics and focusing on improving against them matters more than tracking a comprehensive list of software quality metrics and then not having time to act on any of them.
The quality of the software being built matters just as much as the quantity. If there are lots of new features, but they're buggy or break often, this can create a negative user experience and erode customer trust over time.
The implications of poor software quality extend beyond disappointed users, too. High defect rates and reliability issues can drive up operational costs, as teams spend time constantly fixing user issues, and can slow feature development in low-quality areas of the codebase.
On the other hand, when done right, high-quality software can reduce customer churn and positively impact everything from brand perception to business value and developer experience.
With all this in mind, it's crucial to consider software quality metrics.
Note that there are other things besides software quality that also matter for a development team, like developer experience, user engagement with the product, or how quickly work gets done. While these are software metrics, they're not software quality metrics, so we’re not focusing on them here – but they’re also worthwhile, and you might want to include some of them in your top set of metrics to track.
Ultimately, software quality metrics are your guide for building better products both today and into the future.
It’s easy to assume that code quality and software quality are the same, but they refer to distinct aspects of development. Understanding this difference is crucial because while they are related, one does not automatically guarantee the other.
It’s a false dichotomy to argue that focusing on one comes at the cost of the other. Quality software and quality code can—and should—coexist.
The best approach is to build software that provides immediate value while writing code that ensures long-term sustainability.
The goal isn’t to over-engineer or future-proof everything, but to create sustainable code—code that is flexible enough for future changes and, if necessary, easy to replace. In short, software quality ensures the product delivers value today, while code quality ensures it continues delivering value in the future.
In the vast sea of software metrics, it may be overwhelming to find the ones that truly impact your success.
To cut through the complexity, we've identified 11 essential software quality metrics to get you started. Below, we focus on 4 of the key characteristics we shared above – usability, reliability, security, and maintainability – since there are clear tools to help with the others.
Remember: The important thing is to align with your organization around a few key metrics (you don’t have to track everything!) and then focus on improving over time. It’s better to track and improve on a few metrics rather than trying to track everything and overwhelming your organization.
Definition: This is the number of defects reported by customers after the product has been released. Within this, you can break it down by the number of new defects reported per month relative to the number fixed.
Calculation:
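The exact formula will depend on how you track defects, but as a rough sketch – assuming you can export each customer-reported defect with the date it was reported and (if resolved) the date it was fixed – you could compute the monthly counts like this:

```python
from datetime import date

# Hypothetical defect records: (date_reported, date_fixed or None if still open)
defects = [
    (date(2024, 5, 3), date(2024, 5, 10)),
    (date(2024, 5, 17), None),
    (date(2024, 5, 21), date(2024, 6, 2)),
]

month = (2024, 5)

# Defects reported in the month vs. defects fixed in the same month
reported = sum(1 for r, f in defects if (r.year, r.month) == month)
fixed = sum(1 for r, f in defects if f and (f.year, f.month) == month)

print(f"Reported: {reported}, fixed: {fixed}")  # Reported: 3, fixed: 1
```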
Why it matters: <code-text>Customer-reported defects<code-text> indicates the number of times that customers weren’t able to do what they expected to do with the product – so it’s a great real-world indicator of the usability of the product. The ideal is that you catch all the important bugs internally, before they get released to customers – so you want this number to be as low as possible. Check out this guide for more on how to track and manage customer-reported defects.
Definition: <code-text>Customer Satisfaction Score<code-text> measures how well a company’s product meets customers’ expectations. The Customer Satisfaction Score (CSAT) is based on directly asking customers how satisfied they are with your product, with the scores usually ranging from 1 to 5, where 5 is “Very Satisfied" and 1 is "Very Dissatisfied”.
Calculation:
*Satisfied customers are those who rated you a 4 or 5. Read more on measuring CSAT, including what a great CSAT looks like according to industry benchmarks.
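As a minimal sketch of the standard CSAT arithmetic (the survey tool and field names are up to you):

```python
def csat(scores: list[int]) -> float:
    """Percentage of respondents who are satisfied (rated 4 or 5 on a 1-5 scale)."""
    satisfied = sum(1 for s in scores if s >= 4)
    return satisfied / len(scores) * 100

# Example: 7 of 10 respondents rated the product a 4 or 5 -> CSAT of 70%
print(csat([5, 4, 3, 5, 2, 4, 5, 1, 4, 5]))
```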
Why it matters: Your <code-text>Customer Satisfaction Score (CSAT)<code-text> reflects the usability of your product and the overall experience customers have with it – both of which suffer if software quality is low. If bugs or performance issues keep popping up, customers’ trust in the product is likely to erode over time.
Definition: The percentage of changes that result in degraded service or require remediation. This metric is one of the 4 Key Metrics published by Google's DORA team. At Multitudes, we use 2 key assumptions that make this easier to calculate: (1) if there was a failure, there was a code change that fixed it, and (2) the ratio was 1:1 (one code change per failure). Of course, this isn’t always true, but these assumptions make the metric much easier to calculate and will get you 80% of the way there.
Calculation:
*Note: This is a minimum set of keywords, but you might want to add others depending on how your organization works.
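Here’s a minimal sketch of this keyword-based approach. The keyword list below is illustrative rather than the canonical set, and it assumes you can pull the titles of your merged changes or deployments:

```python
# Illustrative keywords - substitute the ones your organization actually uses
FAILURE_KEYWORDS = ("hotfix", "revert", "rollback", "incident")

def change_failure_rate(change_titles: list[str]) -> float:
    """Share of changes that were fixes for failures, assuming a 1:1 fix-to-failure ratio."""
    failures = sum(
        1 for title in change_titles
        if any(kw in title.lower() for kw in FAILURE_KEYWORDS)
    )
    return failures / len(change_titles) * 100

titles = [
    "feat: add billing page",
    "hotfix: restore login",
    "chore: bump deps",
    "revert: rollback schema change",
]
print(f"{change_failure_rate(titles):.0f}%")  # 50%
```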
Why it matters: <code-text>Change Failure Rate<code-text> is an indicator of whether we’re catching issues before they get deployed to customers. Because Change Failure Rate is one of the 4 key DORA metrics, we know that success on it (and the other 3 metrics) is correlated with both a financially successful company and psychological safety on your teams.
Definition: <code-text>Error rate<code-text> measures how often errors (operations, transactions, or functions that fail or behave incorrectly) occur in a system over a specific period of time. To measure this, you’ll need to first define which errors you care about. When looking at this metric from a software quality perspective, we suggest including system-level errors (overall system failures and crashes), since these are severe enough to interrupt user access, and UI-level errors, since these also impact the user experience. The key is to capture the errors that users would actually have seen, since those are the ones that degrade the quality of their experience.
Calculation:
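A minimal sketch of the arithmetic, assuming you can count failed and total operations (for example, requests) over your chosen window:

```python
def error_rate(error_count: int, total_operations: int) -> float:
    """Percentage of operations that failed over a given period."""
    if total_operations == 0:
        return 0.0
    return error_count / total_operations * 100

# Example: 42 failed requests out of 10,000 over a week -> 0.42%
print(f"{error_rate(42, 10_000):.2f}%")
```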
Why it matters: A high <code-text>error rate<code-text> means there are more issues that could negatively impact the user experience, and it can indicate that a system is unreliable. What “good” looks like depends on your sector – financial systems and healthcare need to meet a higher bar than what you might tolerate for a food-delivery app.
Definition: This is a measure of how long it takes an organization to recover from an incident or failure in production.
Calculation:
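A minimal sketch, assuming you have the start and recovery timestamps for each incident in the period:

```python
from datetime import datetime

# Hypothetical incidents: (started_at, recovered_at)
incidents = [
    (datetime(2024, 6, 1, 9, 0), datetime(2024, 6, 1, 9, 45)),    # 45 minutes
    (datetime(2024, 6, 8, 14, 0), datetime(2024, 6, 8, 16, 30)),  # 150 minutes
]

total_recovery_seconds = sum((end - start).total_seconds() for start, end in incidents)
mttr_minutes = total_recovery_seconds / len(incidents) / 60
print(f"MTTR: {mttr_minutes:.1f} minutes")  # MTTR: 97.5 minutes
```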
Why it matters: This metric indicates the stability of your teams’ software. A higher <code-text>Mean Time to Recovery<code-text> means you’re likely to have more app downtime. And when our team has to spend more time fixing outages, it cuts into the time we could be using for feature work – ultimately making our product feel less innovative to customers.
In this study by Nicole Forsgren (author of DORA and SPACE), high-performing teams had the lowest times for Mean Time to Recovery, along with better <code-text>Change Lead Time<code-text> (a speed metric, one of the other DORA metrics) and higher <code-text>Deployment Frequency<code-text> (another DORA metric). For more about MTTR, check out this guide to understanding MTTR metrics.
Definition: The percentage of time that a system is operational and available for use.
Calculation:
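A minimal sketch of the arithmetic, assuming you can measure total downtime over the period:

```python
def availability(total_minutes: float, downtime_minutes: float) -> float:
    """Percentage of the period during which the system was operational."""
    return (total_minutes - downtime_minutes) / total_minutes * 100

# Example: 90 minutes of downtime in a 30-day month -> ~99.792% availability
minutes_in_month = 30 * 24 * 60
print(f"{availability(minutes_in_month, 90):.3f}%")
```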
Why it matters: High availability is critical for ensuring a seamless user experience, particularly for SaaS products and mission-critical applications. A system with low availability can lose user trust, reduce revenue, and violate service-level agreements (SLAs).
Definition: This measures what percentage of our total assets and applications are being scanned for vulnerabilities.
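There’s no single canonical formula here, but a minimal sketch of the arithmetic – assuming you know how many assets you have and how many are being scanned – looks like this:

```python
def scan_coverage(scanned_assets: int, total_assets: int) -> float:
    """Percentage of assets and applications covered by vulnerability scanning."""
    return scanned_assets / total_assets * 100

# Example: 180 of 220 known assets are being scanned -> ~81.8%
print(f"{scan_coverage(180, 220):.1f}%")
```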
Why it matters: <code-text>Scan Coverage<code-text> shows how much of our environment we’ve checked for security issues. Higher is better, though you don’t need to get to 100% because not all of your assets will be equally important.
You might also want to consider the depth of the scan, the types of scanning (e.g., agent-based, network-based, API, app, cloud-based, credentialed or not), and the type of authentication offered (which can impact the scan depth and accuracy).
Definition: This is the average number of critical-risk vulnerabilities in each of your assets or applications over a given time period. You’ll want to choose your time interval (monthly, quarterly, yearly) and then use the formula below. You can then plot the results over months/quarters/years to see how it’s trending.
Calculation:
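A minimal sketch, assuming you can export the count of critical vulnerabilities found per asset over the chosen interval:

```python
# Hypothetical counts of critical vulnerabilities per asset for one quarter
critical_vulns_by_asset = {"api": 3, "web-app": 1, "billing-service": 0, "mobile-app": 2}

mean_vulns = sum(critical_vulns_by_asset.values()) / len(critical_vulns_by_asset)
print(f"Mean critical vulnerabilities per asset: {mean_vulns:.2f}")  # 1.50
```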
Why it matters: <code-text>Mean Vulnerabilities per Asset<code-text> makes sure that we’re considering vulnerabilities across all of our assets. Plotting this over time can also show how we’re trending; the goal is to make sure this isn’t trending up, and to keep it trending flat or down. We might also want to segment this analysis based on how critical the asset/app is for our business or the severity of the vulnerabilities identified.
If you want to go deeper into security metrics, we recommend this article: Vulnerability and exposure management metrics.
Definition: <code-text>Deployment Frequency<code-text> shows how often an organization deploys code to production or releases it to end users. It’s another one of the 4 Key DORA metrics. We recommend normalizing for the number of people on a team, because otherwise larger teams will have an advantage since they have more people writing code to deploy.
Calculation:
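A minimal sketch that follows the per-person normalization described above (the weekly window is just an example):

```python
def deployment_frequency(deployments: int, weeks: int, team_size: int) -> float:
    """Deployments per person per week, normalized so team size doesn't skew comparisons."""
    return deployments / weeks / team_size

# Example: 48 deployments over 4 weeks by a team of 6 -> 2 deployments per person per week
print(deployment_frequency(48, 4, 6))
```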
Why it matters: Deployment Frequency is an indicator of the value we're providing to customers, because it shows how often we’re shipping new features or product improvements to them. For more about this and the other DORA metrics, check out our primer on DORA and SPACE metrics.
Definition: The percentage of code that is executed during automated testing.
Calculation:
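In practice a coverage tool reports this for you; the sketch below just shows the underlying arithmetic, assuming line-based coverage:

```python
def test_coverage(lines_executed: int, total_lines: int) -> float:
    """Percentage of code lines executed by the automated test suite."""
    return lines_executed / total_lines * 100

# Example: 8,200 of 10,000 lines executed during the test run -> 82%
print(f"{test_coverage(8_200, 10_000):.0f}%")
```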
Why it matters: Higher <code-text>Test Coverage<code-text> increases confidence in the software's reliability, as more of the codebase has been tested for potential errors. However, test coverage alone does not guarantee quality; tests must be meaningful and cover edge cases effectively. We recommend pairing this quantity metric with spot-checks for quality – pull up some of the tests and check whether they’re meaningful, rather than just relying on mocks or testing whether a variable is defined.
The biggest thing to watch out for is whether test coverage drops when you make changes to the codebase – as the codebase changes, you want <code-text>Test Coverage<code-text> to hold steady or go up.
Definition: The percentage of work (measured from issues or commits) that goes into maintenance and support rather than new features. In Multitudes, we have a Feature vs Maintenance work analysis that shows this; you can look at it based either on ticket-tracking data or on main-branch commits (if you use conventional commits).
Calculation:
If you want to calculate this from issue data, it would be:
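A minimal sketch, assuming you can count how many of the issues completed in the period were labelled as maintenance or support:

```python
def maintenance_rate_from_issues(maintenance_issues: int, total_issues: int) -> float:
    """Percentage of completed issues that were maintenance or support work."""
    return maintenance_issues / total_issues * 100

# Example: 18 of 60 issues closed this month were maintenance/support -> 30%
print(f"{maintenance_rate_from_issues(18, 60):.0f}%")
```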
Or, if you want to calculate this based on conventional commits, you would use:
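A minimal sketch using the prefix grouping described in the note below (whether you fold the <code-text>fix<code-text> bucket into the rate is up to your own definition):

```python
# Prefix groups following the conventional-commit grouping described in the note below
MAINTENANCE_PREFIXES = ("build", "ci", "chore", "docs", "refactor", "style", "test")
BUG_SUPPORT_PREFIXES = ("fix",)
FEATURE_PREFIXES = ("feat", "perf")

def maintenance_rate_from_commits(commit_messages: list[str]) -> float:
    """Percentage of main-branch commits whose conventional-commit prefix marks maintenance work."""
    def prefix_of(message: str) -> str:
        return message.split(":")[0].split("(")[0].strip().lower()

    maintenance = sum(1 for m in commit_messages if prefix_of(m) in MAINTENANCE_PREFIXES)
    return maintenance / len(commit_messages) * 100

commits = [
    "feat: add SSO",
    "chore: update dependencies",
    "fix: handle null user",
    "refactor: split billing module",
]
print(f"{maintenance_rate_from_commits(commits):.0f}%")  # 50%
```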
Note: If you use conventional commits, we suggest that you include these prefixes as part of the maintenance group: <code-text>build<code-text>, <code-text>ci<code-text>, <code-text>chore<code-text>, <code-text>docs<code-text>, <code-text>refactor<code-text>, <code-text>style<code-text>, <code-text>test<code-text>.
We like to have a separate bucket for bug or support work, based on commits with the prefix <code-text>fix<code-text>. And then <code-text>feat<code-text> and <code-text>perf<code-text> are typically feature work. You can see more about how we calculate this metric in our Feature vs Maintenance Work docs.
Why it matters: <code-text>The Rate of Maintenance Work<code-text> shows how much your developers are getting bogged down in maintenance and support work instead of being able to spend time on new features. If other stakeholders are concerned about the pace of feature work getting done, this analysis can be helpful for getting to the root cause of delays. Many organizations aim for 70% feature work, 30% maintenance/support/other work.
Engineering teams use Multitudes to measure and improve their software quality. Multitudes is an engineering insights platform built for sustainable delivery – it seamlessly integrates with your existing tools like GitHub and Jira to provide a comprehensive view of your team's technical performance, operational health, and collaboration patterns.
With Multitudes, you can:
By leveraging Multitudes, teams can focus less on metrics collection and more on using these insights to drive engineering excellence.
Want to find out more? Check out how Multitudes works, or get a demo today.