The Incident That Closed as Resolved
The outage, the resignation, and the overloaded engineer that everyone is treating as three problems.
The auth incident channel goes quiet at 11:47 PM. Lucas posts “resolved” – a token refresh race condition, patched in two hours, ticket closed. By Thursday morning, Karim’s exit notice lands in a different Slack channel, routed to a different audience, owned by a different conversation. You reviewed Lucas’s patch. You know it’s the third auth fix this quarter. You know Karim was one of two engineers who actually understood why the system behaved the way it did under load.
Nobody else is putting these next to each other.
The incident is in PagerDuty, filed and resolved. Karim’s departure is in an HR thread, a headcount conversation, and a backfill requisition. Lucas’s workload is a staffing problem, maybe a team health check, eventually a manager 1:1. The only place all three exist as a single fact is inside your head. And you already know – from the RFC that circulated without landing, from the architecture review where the decision had been made before you sat down – that the gap between what you can see and what the organization can act on is not a gap in data. It’s a gap in category.
Leadership isn’t neglecting the auth system out of indifference. They’re not looking at a system at all. They’re looking at three separate things.
Foundational systems are invisible until they break, and the work that keeps them from breaking is invisible the whole time. An incident is discrete: it has a start time, a resolution, a retro, and a ticket that closes. A departure is HR: a backfill req, a transition plan, and a knowledge transfer that everyone agrees should happen and that almost never fully does. An overloaded engineer is staffing: a prioritization conversation, a sprint adjustment, maybe a new hire. Each problem is legible in the category the organization already has. The system producing all three has no category.
You are the only person positioned to name it as a system. The staff engineer’s distinctive advantage – pattern recognition across teams, timescales, and incident types – is exactly what makes this visible to you and not to anyone else in the room. That advantage is also the trap. You have to translate what you see into terms a decision can be made in, and the first translation most engineers reach for is the one the organization has trained itself to deprioritize.
You file it as tech debt.
This is the truest word and the least fundable word, said in the same breath. “Tech debt” is precise to you – a real obligation, deferred, accumulating interest in the form of incidents, cognitive load, and departing engineers. To the organization running a prioritization ledger, it describes a maintenance request with no visible upside competing against a roadmap full of features with visible upside. The framing isn’t wrong. It’s just structurally a loss in the only competition that determines what gets funded.
You make it about a person.
“We’re losing Karim, and Lucas is drowning.” Both true. Both important. Both convert a systemic argument into a retention problem – which earns a sympathetic nod, a backfill req, and a new engineer who will be in the same position in eight months because the system that burned Karim out is still the system. The human frame is the most natural one because humans are visible, and the system is not. But the moment you put Karim at the center of the argument, the conversation that follows is about Karim, not the thing that broke him.
You wait for the next incident to make the case for you.
The logic is defensible: let the evidence become undeniable, then no one can argue with it. What actually happens is that the next failure arrives as a customer escalation – a payment timeout during peak traffic, an auth loop that locked out three enterprise accounts for forty minutes. And at that point, the conversation has already shifted. The question is no longer whether we should invest in this system – it’s why this wasn’t flagged. Who owns this? What’s the RCA? The moment of accountability is not the moment of investment, and trying to make them the same is a framing error that makes you look like someone covering for themselves after the fact.
You go louder in the same register.
More dashboards. Higher severity language. A longer document with more incident data, more latency charts, more annotations. Volume on a pre-rejected frame doesn’t change the frame. It makes you the person who “keeps raising tech debt” – which is not a characterization that opens budget conversations. It’s one that ends them.
These aren’t failures of argument. They’re failures of category.
Engineers work where the foundation is self-evidently necessary, so “tech debt” reads as plain description. The org runs a different ledger. Every spend competes against visible upside: features, revenue, and roadmap items with owners, timelines, and stakeholder names attached. A maintenance argument is a request to spend against no visible upside, which is structurally a loss in that competition before anyone reads the brief.
The mistake isn’t caring about the system. It’s assuming your value system is the shared one.
Translation isn’t spin. It’s stating the same fact in the terms the decision actually gets made in. Not “auth is under-maintained” – but “login failure rates are up 40% this quarter, and one of the two engineers who understand the system just resigned, which means the next incident gets handled by someone without context.” That sentence exists in the same category as product risk. It has a metric, a resource event, and a consequence. A VP can carry that to a planning conversation. She cannot carry “we have tech debt in auth” anywhere useful, because it arrives in a category her organization has already learned to defer.
The first sentence describes a system state you understand. The second sentence names a risk the organization can price.
There is a specific reason this argument has to be written.
Speaking lets you gesture. “We should really invest in auth – it’s getting scary.” The gesture communicates anxiety; it doesn’t make an argument. Writing forces the translation to be complete: the incident trend, the customer impact, the explicit trade-off (what gets cut to fund this), and the ownership model on the other side. You cannot hand-wave a sentence you have to type. The act of composition finds the gaps in the case, because the gaps are the sentences you don’t yet know how to write.
The artifact also travels without you. Your manager cannot carry a hallway worry to her VP. She can carry your document – in your framing, with your incident data, in the terms you chose – to a decision-maker you will never be in the room with. The quality of the translation becomes load-bearing in a way it never is in a conversation that ends when you leave. If the case is vague, it arrives vague. If it says “product risk” and means it, it competes as product risk.
Writing also forces the question of whose voice the case is in. Making the argument in Lucas’s language – not just about him but as someone who understands the operational reality he’s living – only happens when you compose it on purpose. That’s not a soft detail. That’s the difference between an argument that sounds like an engineer asking for resources and an argument that sounds like a system asking to be seen.
The auth near-miss. The resignation. The engineer covering two people’s context load on one person’s capacity. You can see what they are together. The next question – the one that’s harder than pattern recognition – is whether you can write the case that turns “tech debt” into product risk.
That means partnering with the PM whose incident data you need without overriding her constraints. It means making the argument in Lucas’s voice as much as your own. It means naming the trade-offs clearly enough that Jennifer can carry them upward to a room you’re not in. It means choosing terms that are technically honest and organizationally fundable at the same time, which is not the same sentence twice with different words – it’s a translation that has to be earned, line by line.
There’s no template for it. There’s just a compose window, the incident data, and the work of finding the sentence that makes the invisible thing visible to the people who can do something about it.
This is the scenario.
