Bad scores cost real money

Production defects start in requirements — and in a vehicle, they end in recalls. An automotive manufacturer ran 400,000 Jira tickets through Vindex: the lowest-scoring requirements traced to $1.2 billion in software-related recall costs, the highest-scoring to zero. The risk was visible, and stoppable, the day each requirement was written.

Vindex Knowledge Center · 7 min read · May 2026

Case study
Automotive
Software recalls
Requirements quality
Quality gate

When a manufacturer connected story health scores to recall outcomes, the lowest-scoring requirements carried the entire software-related recall bill.

When a requirement looks ready but isn't, someone pays for it later

In a software-defined vehicle, a vague requirement does not stay vague. It becomes an ambiguous control function, a missed edge case, or an untested fault path. Months later it can surface as a field failure, a warranty claim, or a software-related recall. By then the cost is no longer a sprint delay. It is measured in vehicles, regulators, and reputation.

A global automotive manufacturer wanted to know whether that link was real and measurable: do weak requirements actually predict expensive failures, or do strong teams simply absorb the noise? They ran a pilot with Vindex to find out.

400K

Jira tickets scored for story health in the pilot.

$1.2B

Software-related recall cost traced to the lowest-scoring 40,000.

$0

Software-related recall cost traced to the highest-scoring 40,000.

The pilot: 400,000 tickets, scored before judgment

The manufacturer fed 400,000 historical Jira tickets through Vindex. Each work item was scored for story health: whether the scope was clear, the acceptance criteria were present, the work was sized to estimate, and the outcome was testable. No outcome data was used to influence the score. Vindex read the requirements the way it would on the day they were written.

The team then isolated the two ends of the distribution: the 40,000 highest-scoring tickets and the 40,000 lowest-scoring tickets. With the cohorts fixed, they connected each ticket to its downstream record and totaled the direct software-related recall costs attributable to each group.
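The cohort step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the manufacturer's or Vindex's actual pipeline: the `ScoredTicket` type, its field names, and the `split_cohorts` and `total_cost` helpers are hypothetical, and the toy data at the end is invented only to exercise the code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredTicket:
    """Hypothetical record: one ticket with its score and linked recall cost."""
    key: str            # ticket identifier (assumed field name)
    score: float        # story health score; higher means healthier
    recall_cost: float  # direct recall cost linked to this ticket, in dollars


def split_cohorts(tickets, parts=10):
    """Return (lowest, highest) cohorts, each 1/parts of the ranked tickets.

    The cohorts are fixed from the score alone; recall cost is never
    consulted when ranking, mirroring the order of steps described above.
    """
    ranked = sorted(tickets, key=lambda t: t.score)
    size = len(ranked) // parts
    if size == 0:
        return [], []
    return ranked[:size], ranked[-size:]


def total_cost(cohort):
    """Sum the direct recall cost attributed to every ticket in a cohort."""
    return sum(t.recall_cost for t in cohort)


# Toy data, invented for illustration: 20 tickets with scores 0..19,
# where only the two weakest tickets carry any cost.
toy_costs = {0: 100.0, 1: 50.0}
toy_tickets = [
    ScoredTicket(key=f"T-{i}", score=float(i), recall_cost=toy_costs.get(i, 0.0))
    for i in range(20)
]

lowest, highest = split_cohorts(toy_tickets)
lowest_total = total_cost(lowest)
highest_total = total_cost(highest)
```

With `parts=10`, a backlog of 400,000 tickets yields two cohorts of 40,000 each, which matches the 10% cohorts in the pilot. As plain arithmetic, $1.2 billion spread over 40,000 tickets is an average of $30,000 per ticket; it is only an average, and the pilot does not say how the cost was distributed within the cohort.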

The lowest-scoring 10% of requirements carried the entire software-related recall bill. The highest-scoring 10% carried none of it.

Automotive manufacturer pilot, 400,000 scored tickets

The result: the score predicted the cost

The two cohorts were the same size and came from the same backlog, the same teams, and the same tools. The only difference Vindex saw was the quality of the requirement itself. The cost difference between them was not subtle.

At risk · Bottom 40,000 tickets

$1.2B

direct software-related recall cost

The lowest-scoring requirements: vague scope, missing acceptance criteria, and outcomes no one could test. This cohort traced to $1.2 billion in direct software-related recall costs.

Healthy · Top 40,000 tickets

$0

direct software-related recall cost

The highest-scoring requirements: clear scope, explicit acceptance criteria, and testable outcomes. This cohort traced to zero dollars in direct software-related recall costs.

Zero against $1.2 billion is not a rounding difference. It is the difference between requirements that were ready and requirements that only looked ready — work that carried hidden risk straight into the vehicle.

Why the low scores were so expensive

The lowest-scoring cohort was not a list of obvious mistakes. These were requirements that passed review and entered sprints like any other. What they shared was the absence of the signals that make a requirement safe to build: a scope a reader could bound, criteria a tester could check, and an outcome someone could verify before it shipped.

In a consumer app, that ambiguity costs a rework cycle. In a braking module, a battery management system, or an over-the-air update, the same ambiguity becomes a fault that has to be found in the field — and fixed across an entire fleet.

What this proves

Story health is often treated as a tidiness metric — nice to have, easy to skip under deadline. This pilot reframes it. The score is a leading indicator of cost. The requirements Vindex flagged as at risk were the same requirements that, years later, drove the recall bill.

The takeaway is not that one cohort was unlucky. It is that the risk was visible at the moment each requirement was written, long before it reached a sprint, a vehicle, or a regulator. Vindex surfaces that signal early enough to act on it.
