The CVSS Trap: Why Vulnerability Severity Scores Break Down In Your Environment

At some point, every security team has had this conversation: the scanner reports a CVSS 9.8 critical vulnerability in a dependency, the engineering team asks how urgent it is, and nobody has a good answer. Half the team wants to drop everything and patch immediately. The other half suspects the finding is lower priority in their specific context but cannot articulate why. Both are right, in different ways — and the confusion stems from a fundamental misunderstanding of what CVSS actually measures.

CVSS — the Common Vulnerability Scoring System — is not a risk score. It was never designed to be. It is an exploitability and impact characterisation of a vulnerability in a generic, attacker-agnostic context. The distinction matters enormously when you are deciding which of 300 scanner findings to remediate this sprint.

What CVSS actually measures — and what it does not

A CVSS v3.1 Base Score is computed from eight metrics: Attack Vector, Attack Complexity, Privileges Required, User Interaction, Scope, Confidentiality Impact, Integrity Impact, and Availability Impact. Every one of these metrics describes the vulnerability in isolation, evaluated against a hypothetical default deployment.

"Attack Vector: Network" means the vulnerability can be exploited over a network — it says nothing about whether your network exposes this component to untrusted callers. "Privileges Required: None" means no authentication is needed to trigger the exploit — it does not account for whether you have a WAF, an API gateway with rate limiting, or a VPN in front of the affected service.

The CVSS specification says so explicitly: "The Base metric group represents the intrinsic characteristics of a vulnerability that are constant over time and across user environments." The word "environments" here is key: Base Scores deliberately ignore your environment. They are designed to be universal. That universality is their purpose and their limitation.
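To make this concrete, here is a minimal sketch of the v3.1 Base Score arithmetic in Python. The weights and the round-up rule are transcribed from the published v3.1 specification and should be checked against it before being relied on; the function and variable names are our own. What matters for this argument is the function signature: it has no parameter for network exposure, compensating controls, or asset value.

```python
import math

# Metric weights as published in the CVSS v3.1 specification.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
# Privileges Required weights depend on Scope (U = Unchanged, C = Changed).
PR = {
    "U": {"N": 0.85, "L": 0.62, "H": 0.27},
    "C": {"N": 0.85, "L": 0.68, "H": 0.5},
}


def roundup(value: float) -> float:
    """Round up to one decimal place using the integer-based v3.1 rule."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(av: str, ac: str, pr: str, ui: str,
               scope: str, c: str, i: str, a: str) -> float:
    """Compute a v3.1 Base Score from the eight base metrics only."""
    iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])
    if scope == "U":
        impact = 6.42 * iss
    else:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    exploitability = 8.22 * AV[av] * AC[ac] * PR[scope][pr] * UI[ui]
    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if scope == "C":
        total *= 1.08
    return roundup(min(total, 10))


# AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H
print(base_score("N", "L", "N", "N", "U", "H", "H", "H"))  # 9.8
```

The same vector yields 9.8 whether the component sits on the public internet or on an isolated internal segment, because nothing about the deployment enters the calculation.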

CVSS does offer Temporal and Environmental metric groups, which adjust the Base Score for exploit maturity and for your own deployment, but in practice almost no organisation applies them at scale. The result is that most vulnerability management programmes treat Base Scores as absolute risk rankings, which they are not.

Four ways CVSS misleads patch prioritisation

It ignores your network topology

A CVSS 9.8 Remote Code Execution in a library used by your internal analytics pipeline — accessible only from an authenticated internal VPN segment — carries fundamentally different risk than the identical CVE in a public-facing API endpoint. The Base Score is identical in both cases. Your actual exposure is not.

In a system with a well-modelled architecture, these two scenarios would look completely different. The data flow diagram (DFD) for your internal analytics pipeline would show no external entity with a direct data flow to the affected component. The attack tree would require the attacker to first compromise a VPN credential, itself a non-trivial precondition that significantly reduces the likelihood of exploitation. The public API would have no such upstream gate.
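The exposure question can be sketched as a reachability check over a DFD. The example below is illustrative only: the element names (`internet`, `public_api`, `vpn_user`, `analytics_pipeline`, and so on) and both diagrams are hypothetical, and the check covers direct data flows only, not what happens once a VPN credential has been stolen.

```python
from collections import deque


def reachable_from(flows: dict[str, list[str]], source: str) -> set[str]:
    """Return every DFD element reachable from `source` by following data flows."""
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for target in flows.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical DFDs: keys are elements, values are their outgoing data flows.
public_api_dfd = {
    "internet": ["public_api"],
    "public_api": ["orders_db"],
}
analytics_dfd = {
    # The untrusted external entity has no outgoing flow into this system.
    "internet": [],
    "vpn_user": ["analytics_pipeline"],
    "analytics_pipeline": ["warehouse"],
}

print("public_api" in reachable_from(public_api_dfd, "internet"))         # True
print("analytics_pipeline" in reachable_from(analytics_dfd, "internet"))  # False
```

The CVE and its Base Score are the same in both diagrams; only the second result tells you which deployment is exposed to untrusted callers.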

It ignores compensating controls

Compensating controls do not appear anywhere in the CVSS Base Score computation. A CVSS 8.1 SQL injection vulnerability in a service behind a Web Application Firewall with SQL-injection rule sets enabled, with a read-only database user configured, and with parameterised queries already in use everywhere except one edge-case endpoint is not an 8.1 risk in practice. The likelihood of successful exploitation in that context is materially lower than the score implies.

This is not an argument for ignoring vulnerabilities behind compensating controls. It is an argument for making the compensating controls explicit in your risk model, so you can prioritise accurately and document why certain findings are scheduled for later rather than immediate remediation.
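One way to make compensating controls explicit is to record them alongside the finding, together with whether each has actually been verified. The sketch below is an assumption about what such a record might look like, not a standard schema; the class names, fields, and the `EXAMPLE-SQLI` identifier are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CompensatingControl:
    name: str
    verified: bool  # has someone confirmed the control is in place and effective?


@dataclass
class Finding:
    identifier: str
    base_score: float
    controls: list[CompensatingControl] = field(default_factory=list)

    def verified_controls(self) -> list[str]:
        """Only verified controls count towards the scheduling rationale."""
        return [c.name for c in self.controls if c.verified]

    def rationale(self) -> str:
        """Produce a one-line, auditable justification for the schedule."""
        names = self.verified_controls()
        if not names:
            return (f"{self.identifier}: base {self.base_score}, "
                    "no verified compensating controls")
        return (f"{self.identifier}: base {self.base_score}, "
                "mitigated by " + ", ".join(names))


finding = Finding(
    identifier="EXAMPLE-SQLI",  # placeholder, not a real CVE identifier
    base_score=8.1,
    controls=[
        CompensatingControl("WAF SQL-injection rule set", verified=True),
        CompensatingControl("read-only database user", verified=True),
        CompensatingControl("parameterised queries on all endpoints", verified=False),
    ],
)
print(finding.rationale())
```

The unverified control is deliberately left out of the rationale: a control that is assumed rather than confirmed should not lower a finding's priority.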

It ignores attacker motivation and targeting

CVSS assumes an attacker who is competent, motivated, and specifically targeting your affected component. For a high-profile open-source library with active exploit toolkits in the wild, that assumption is reasonable. For an obscure internal dependency with no known public exploit and no reason for a targeted attacker to know it exists in your stack, the same assumption dramatically overstates the practical threat level.

This is not security-through-obscurity thinking. It is a recognition that attacker effort is finite and prioritised by expected return. Threat modelling forces you to think about who is likely to attack your system and what they are after, which is exactly the question CVSS does not ask.

It conflates technical severity with business impact

"Confidentiality Impact: High" in CVSS terms means the vulnerability can result in total loss of confidentiality for the affected component. It says nothing about whether the data in that component is publicly available already, covered by regulatory requirements, or core to your business's competitive advantage. A leaked debug log of server uptime statistics and a leaked database of customer payment records can both score identically on CVSS Confidentiality Impact.

Contextual risk scoring with attack trees

The alternative to CVSS-as-risk-score is not to abandon systematic scoring; it is a richer model that accounts for your specific architecture. Attack trees, as we explored in our introduction to threat modelling, represent the attack paths an adversary must traverse to reach a harmful outcome. Each path is a sequence of preconditions. Each precondition has a likelihood that depends on your specific defences.

When a CVE is disclosed, the question to ask is not "what is the CVSS score?" but "does this vulnerability appear in my attack tree, and if so, at what position in the path?" A vulnerability that appears as a leaf with no preconditions ahead of it, directly exploitable for high-impact outcomes with no upstream gates, is genuinely critical in your context. The same vulnerability appearing five levels deep in a tree, reachable only after an attacker has already compromised a privileged account, is lower priority regardless of its Base Score.
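The effect of upstream gates can be sketched with a small attack tree evaluator. This is a simplified model under stated assumptions: preconditions are treated as independent, AND nodes multiply their children's likelihoods, OR nodes combine them as "at least one succeeds", and every likelihood figure below is an illustrative placeholder rather than measured data.

```python
from dataclasses import dataclass, field
from math import prod


@dataclass
class Node:
    name: str
    kind: str = "leaf"        # "leaf", "and", or "or"
    likelihood: float = 0.0   # used for leaves only
    children: list["Node"] = field(default_factory=list)


def likelihood(node: Node) -> float:
    """Estimate the likelihood of reaching `node`, assuming independent steps."""
    if node.kind == "leaf":
        return node.likelihood
    values = [likelihood(child) for child in node.children]
    if node.kind == "and":
        return prod(values)                      # every precondition must hold
    if node.kind == "or":
        return 1 - prod(1 - p for p in values)   # any one path suffices
    raise ValueError(f"unknown node kind: {node.kind}")


# Same vulnerability, two positions in the tree (placeholder likelihoods).
exposed = Node("exploit RCE on public API", likelihood=0.6)
gated = Node("exploit RCE on internal service", kind="and", children=[
    Node("steal VPN credential", likelihood=0.05),
    Node("bypass MFA", likelihood=0.1),
    Node("exploit RCE", likelihood=0.6),
])

print(round(likelihood(exposed), 4))  # 0.6
print(round(likelihood(gated), 4))    # 0.003
```

The absolute numbers matter less than the ratio: each upstream gate multiplies the likelihood down, which is the contextual information the Base Score cannot carry.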

Below is a concrete comparison of how contextual scoring changes prioritisation relative to raw CVSS:

Vulnerability	CVSS Base	Context A: Internet-facing, no upstream auth	Context B: Internal-only, behind MFA + VPN
Unauthenticated RCE in web framework	9.8 Critical	Critical — patch this sprint	Medium — patch next cycle
SQL injection in ORM edge case	8.8 High	High — active exploit kit available	Low — WAF + read-only user in place
SSRF in image processing service	7.5 High	Critical — can exfiltrate cloud credentials	Low — no cloud metadata endpoint reachable
Privilege escalation in local process	8.4 High	Medium — requires prior foothold	Medium — requires prior foothold
Cleartext credential logging in debug mode	6.5 Medium	Critical — debug mode enabled in prod	Low — debug mode disabled

Notice that contextual scoring sometimes elevates a finding's priority above its CVSS score (the SSRF case with cloud credentials reachable) and sometimes reduces it. Neither outcome is about minimising security effort — both are about directing that effort accurately.

A worked example: the same CVE, two very different risk scores

Consider a hypothetical CVSS 9.1 Server-Side Request Forgery in a popular HTTP client library. The vendor advisory says the vulnerability allows an unauthenticated attacker to make arbitrary outbound HTTP requests from the server.

Team A uses the library in their customer-facing document conversion service. The service accepts uploaded files from the public internet, processes them using the vulnerable library, and runs on an AWS EC2 instance with access to the instance metadata service (169.254.169.254). An attacker can exploit the SSRF to request the metadata endpoint, retrieve a short-lived AWS IAM credential, and use it to access S3 buckets containing customer data. The attack path is two steps: upload a crafted document, retrieve the metadata credential. No prior access required. This is a genuine 9.1 in context.

Team B uses the same library in an internal batch job that fetches product data from their own internal catalogue API. The job runs on a worker node behind a VPN, the process runs with a non-privileged service account, and the internal network has no route to any metadata endpoint. The attack requires an adversary who has already compromised the internal network — at which point the SSRF is the least of the problems. In context, this is a low-priority finding despite the identical Base Score.

Both teams should patch eventually — but Team A should be in an emergency change window, while Team B can schedule it in the next regular maintenance cycle. CVSS alone cannot tell them that. Their Data Flow Diagrams can.

Making CVSS useful again: input, not output

None of this means CVSS is worthless. It is an excellent first-pass filter. A Base Score of 4.0 is extremely unlikely to be critical in any context. A Base Score of 9.8 is a strong signal that a vulnerability deserves careful contextual evaluation before it is deprioritised. The mistake is treating the score as the final word rather than the starting point.

The right workflow is: use CVSS to identify which findings warrant deeper analysis, then use your threat model to perform that analysis. Ask where the vulnerable component sits in your DFD, what trust boundaries separate it from untrusted callers, what upstream gates exist in the attack tree, and what data or capabilities an attacker would gain from successful exploitation. That analysis produces a contextual risk score that is defensible to both your security team and your compliance auditors.

Practical heuristic: Treat any CVSS ≥ 9.0 as "evaluate immediately." Treat CVSS 7.0–8.9 as "evaluate within one sprint." For CVSS below 7.0, use your threat model to decide whether the component appears on any critical attack path — if not, schedule for next release cycle. This is not risk acceptance; it is risk-informed scheduling.
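The heuristic translates almost directly into a triage function. In the sketch below, the function name and the `on_critical_path` flag are our own, and the flag is assumed to come from your threat model. The heuristic does not say what to do with a sub-7.0 finding that is on a critical path, so the escalation branch is an assumption.

```python
def triage(base_score: float, on_critical_path: bool) -> str:
    """Map a Base Score and a threat-model lookup to an evaluation deadline."""
    if not 0.0 <= base_score <= 10.0:
        raise ValueError("CVSS Base Scores range from 0.0 to 10.0")
    if base_score >= 9.0:
        return "evaluate immediately"
    if base_score >= 7.0:
        return "evaluate within one sprint"
    if on_critical_path:
        # Assumption: the heuristic leaves this case open, so we escalate it.
        return "escalate for contextual evaluation"
    return "schedule for next release cycle"


print(triage(9.8, on_critical_path=False))  # evaluate immediately
print(triage(6.5, on_critical_path=True))   # escalate for contextual evaluation
print(triage(6.5, on_critical_path=False))  # schedule for next release cycle
```

Note that the output is a deadline for evaluation, not for patching: the patch schedule comes from the contextual analysis that follows.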

In the next post, we look at how threat models and the risk registers they produce satisfy the continuous evidence requirements of SOC 2 — a framework that is often reduced to a one-time documentation exercise, with predictably poor results.