Debugging When You Have No Idea What's Going On

Dev Leader Weekly 145

Jun 22, 2026

TL;DR:

Not knowing everything is completely normal
Prove or disprove every hypothesis, then write it down
Disproving is just as valuable as proving
Join me for the live stream (or watch the recording) on Monday, June 22 at 7:00 PM Pacific!


Something is on fire. A live service is doing something it absolutely should not be doing, the metrics look wrong, and you’re sitting there staring at it thinking: I genuinely have no idea what’s going on. If you’ve ever felt that creeping sense of shame in that moment -- like a “real” engineer would already know the answer -- I want to talk to you. Because that feeling is incredibly common, and it’s almost entirely self-inflicted.

You can check out my full thoughts on this in the video below:

The Shame of Not Knowing Is Almost Entirely Self-Inflicted

This whole topic came out of a post on the ExperiencedDevs subreddit. Someone framed it really honestly: it feels shameful as a software engineer when something is happening that you don’t understand, and you genuinely don’t know what’s going on or how to make progress on it. And I get it. I really do. But I want to say this as clearly as I can: that is completely okay, and it’s completely normal.

No one in the world expects you to know everything. They shouldn’t expect it, and honestly, they can’t expect it -- it’s not possible. Part of being an engineer isn’t having every answer in your back pocket. It’s being curious and working toward a better understanding when you don’t have the answer yet. A lot of this pressure is self-inflicted, and a lot of it is just software engineering culture. We tend to be a group of strong problem-solvers, so hitting a wall where we just don’t know is wildly uncomfortable. That discomfort is normal. It does not make you a bad engineer.

This is also one of the big reasons we build software in teams. If everyone on your team thought exactly the way you do, then the moment you got stuck, the entire team would be stuck right alongside you. Diversity of skill, experience, and perspective is a feature, not a nicety. A lot of the things I wish I had understood earlier in my career come right back to exactly this.

Even Multiple Levels Up, We Don’t Always Have the Answer

I was recently talking with one of my engineers who was deep in a live site investigation. For context, the service area my team is responsible for in Microsoft 365 is... a bit outrageous. The sheer amount of traffic flowing through our systems makes it genuinely impossible for any single person to know every detail about everything. Should people on the team get better and better at investigating and understanding it over time? Absolutely -- that’s the whole game. But no single person knows everything all the time. The surface area is just too big.

So there we were, talking it through. A very sharp engineer, and me, several levels up -- and between the two of us, there was no obvious answer. In that moment, as a manager, part of me felt that same pull: man, I wish I just had the answer in my head. But my boss doesn’t expect me to know everything. My skip level doesn’t expect me to know everything. We have to make space for that. If your goal is to never be in a situation where you don’t know the next step, you’re going to fail at that goal. You will end up there. And that’s fine, because what actually matters is that you try to make progress.

If you’ve ever been on the receiving end of someone expecting you to instantly understand everything, this rant about “junior developers who just don’t get it” digs into that expectations mismatch between experience levels.

My Framework: If You Can Prove It or Disprove It, Write It Down

Okay, so with all of that out of the way -- how do I actually approach a problem when I have no idea what’s going on?

Here’s the core of it: most people treat problem-solving as proving a hypothesis right. I treat it as building a matrix of what’s known fact versus what’s still assumption.

When most people debug, they form a hypothesis, go chase it, and if it turns out to be right, great -- they keep progressing. But if it turns out to be wrong, it gets quietly dismissed as wasted effort. “We thought it might be this, we looked into it, it wasn’t, so we got nowhere.” I want to push back on that hard. Disproving something is just as valuable as proving it.

So whenever I can prove a statement true or false -- by reproducing something, by reading through the code, by finding a log that contradicts the theory -- I write it down. On a whiteboard, in a shared doc, wherever the team can see it. Back when we worked more in person, I used to fill entire whiteboards with assumptions, each one marked as proven or disproven. The magic of this is that none of the investigation is wasted. Every hypothesis you resolve, in either direction, shrinks the unknown. While something stays an assumption, it’s a live question hanging over the whole investigation. The goal is to keep converting assumptions into facts.

For the C# side of actually getting useful signal out of failures, I’ve got a guide on handling exceptions in a way that streamlines debugging. For everything else: build out your matrix of assumptions, then validate them.
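To make that matrix concrete, here is a minimal Python sketch of a shared hypothesis log. The `HypothesisLog` class, its three statuses, and the example statements are hypothetical illustrations of the idea, not a tool from this post; a whiteboard or shared doc does the same job. What it shows is that a disproven entry is kept and counted as a resolved fact, exactly like a proven one, while anything still marked as an assumption stays on the list of open questions.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    ASSUMPTION = "assumption"  # not yet proven either way
    PROVEN = "proven"          # confirmed true by evidence
    DISPROVEN = "disproven"    # confirmed false by evidence


@dataclass
class Hypothesis:
    statement: str
    status: Status = Status.ASSUMPTION
    evidence: str = ""  # how it was resolved: a repro, a code read, a log line


@dataclass
class HypothesisLog:
    # Hypothetical structure for illustration; keyed by statement to avoid duplicates.
    entries: dict[str, Hypothesis] = field(default_factory=dict)

    def add(self, statement: str) -> None:
        # Every idea starts as an assumption, no matter how "crazy" it sounds.
        self.entries.setdefault(statement, Hypothesis(statement))

    def resolve(self, statement: str, holds: bool, evidence: str) -> None:
        # Proving and disproving are both recorded; neither result is discarded.
        h = self.entries[statement]
        h.status = Status.PROVEN if holds else Status.DISPROVEN
        h.evidence = evidence

    def open_questions(self) -> list[str]:
        # Whatever is still an assumption is where the investigation goes next.
        return [s for s, h in self.entries.items() if h.status is Status.ASSUMPTION]

    def facts(self) -> list[str]:
        # Resolved entries in either direction: the unknown space that has shrunk.
        return [f"{s}: {h.status.value} ({h.evidence})"
                for s, h in self.entries.items() if h.status is not Status.ASSUMPTION]


board = HypothesisLog()
board.add("A specific OS version causes it")
board.add("Hardware configuration A reproduces it")
board.add("Hardware configuration C reproduces it")
board.resolve("A specific OS version causes it", holds=False,
              evidence="seen on versions X and Y")
board.resolve("Hardware configuration A reproduces it", holds=True,
              evidence="reproduced on A")
print(board.facts())
print(board.open_questions())
```

Keying entries by statement means two people raising the same idea don’t create two rows; on a real team, the same structure usually lives in a shared doc rather than in code.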

There Are No Stupid Hypotheses

One of my favorite side effects of this approach: there are no dumb ideas. When nobody knows what’s going on, someone will eventually say, “This is going to sound kind of crazy, but... could it be something with the operating system?” And the room groans -- “come on, the OS is the last thing that’s going to have a problem.”

But... maybe it is? If you can’t state as a matter of fact that it isn’t, then it’s an assumption, and it’s completely valid to go prove or disprove. Across thousands of machines, could there be something off with a specific OS version in a specific scenario? One hundred percent. Is it common? Nope. Is it possible? Absolutely. So write it down and go check. Often the “crazy” hypothesis is exactly the one worth checking, because it’s the last thing anyone would have looked at -- and you already admitted you don’t know what’s going on.

Watch the Correlations Emerge

Here’s where it gets genuinely satisfying. Once you start turning assumptions into proven facts, patterns start to surface that you couldn’t see before.

You check the OS angle: “Nope, we see it across versions X and Y, so it’s not the operating system.” Fine, that’s a fact now. What about hardware? “We see it on configurations A and B, but not C... so C is still open, but A and B definitely reproduce it.” Now you’re not holding a pile of vague guesses in your head -- you’re drawing a map. And when you lay those proven facts side by side, you might suddenly notice: wait, we only ever see this on these specific combinations. Only this software version. Only this region. Only this traffic pattern.

That’s the moment the investigation stops being vibes and starts being data-driven. You go from “I don’t know, so let’s just guess” into actual experimentation, and then into higher-level correlations between signals you’ve proven are real. For live systems, this is also where good observability and telemetry earn their keep: the more real signal you’ve instrumented, the faster those correlations show up.
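Here is a rough Python sketch of that correlation step, assuming each observation is a hypothetical dict of attributes (OS version, hardware configuration, region) plus a `repro` flag saying whether the issue showed up. For each attribute, it separates values seen only alongside the failure from values seen in both failing and healthy cases. A value that only ever appears with the failure is a candidate worth going to prove; an attribute whose failing values also show up in healthy cases points away from that attribute on its own. Treat the output as a heuristic for what to check next, not as proof of a cause.

```python
from collections import defaultdict


def correlate(observations: list[dict]) -> dict[str, dict[str, list]]:
    """For each attribute, split its values into 'only_failing' and 'both'."""
    failing: dict[str, set] = defaultdict(set)
    passing: dict[str, set] = defaultdict(set)
    for obs in observations:
        bucket = failing if obs["repro"] else passing
        for key, value in obs.items():
            if key != "repro":
                bucket[key].add(value)

    report = {}
    for key in sorted(failing.keys() | passing.keys()):
        report[key] = {
            # Seen with the failure and never without it: a lead to go prove.
            "only_failing": sorted(failing[key] - passing[key]),
            # Seen in failing and healthy cases alike: weak evidence on its own.
            "both": sorted(failing[key] & passing[key]),
        }
    return report


# Hypothetical observations, labeled by whether the issue reproduced.
observations = [
    {"os": "X", "hardware": "A", "region": "east", "repro": True},
    {"os": "Y", "hardware": "B", "region": "east", "repro": True},
    {"os": "X", "hardware": "C", "region": "west", "repro": False},
    {"os": "Y", "hardware": "C", "region": "east", "repro": False},
]

for attribute, split in correlate(observations).items():
    print(attribute, split)
```

Single attributes keep the sketch short. To look for specific combinations (this software version in this region, say), one option is to add a combined attribute whose value is a tuple of the fields you care about.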

And a bonus point here: get comfortable with AI and MCP (Model Context Protocol) servers. Being able to rely on an LLM to pull data and help you correlate it is SIGNIFICANTLY more effective than getting stuck in the weeds pulling that data yourself. Let it do the boring work so you can save your brain for the parts that are actually valuable.

Where This Came From: Debugging With Almost No Information

This framework actually goes back to my digital forensics days. The software we built essentially scanned devices -- mostly hard drives at the time -- and turned the data into a structured report so an investigator looking for digital evidence had something readable to work through. Think of it as a giant search engine plus an interactive dashboard.

Here’s the brutal part for debugging: because this was forensics, the data on those devices was highly sensitive material tied to active criminal cases, and a lot of it legally could not be copied or reproduced. So the usual “just send me a repro” was frequently off the table. On top of that, many of these workstations were air-gapped by design. When something broke, you might get a terse “it doesn’t work” and maybe a log file, and you’d better hope there was something obvious in the stack trace, because a lot of the time there wasn’t.

When you’re debugging with limited information and no ability to reproduce the problem, you cannot afford to waste effort chasing hypotheses you’ve already implicitly ruled out. That constraint is exactly what forged this prove-or-disprove habit. It’s the same energy as untangling a gnarly dependency injection debugging nightmare -- you methodically eliminate possibilities until the real cause has nowhere left to hide.

And yes -- afterward, always reflect: if this happened again, could we add better logging, telemetry, or monitoring to catch it faster next time? That follow-up is worth doing every single time. It just usually doesn’t help you in the heat of the moment, so treat it as a separate step.
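As one possible shape for that follow-up, here is a minimal sketch using Python’s standard `logging` and `json` modules (an assumption about your stack; the same idea applies in any language). It emits structured logs that carry the context you ended up needing during the investigation, such as OS version, hardware, and region, so the next investigation can filter and correlate on those fields directly instead of reconstructing them. The `JsonFormatter` class, the logger name, and the `context` field convention are hypothetical choices for this sketch.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object so its fields can be queried later."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {"level": record.levelname, "message": record.getMessage()}
        # Hypothetical convention: callers pass extra={"context": {...}}.
        payload.update(getattr(record, "context", {}))
        return json.dumps(payload, sort_keys=True)


logger = logging.getLogger("investigation_sketch")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Attach the attributes that mattered last time so the next correlation is a query.
logger.error(
    "report generation failed",
    extra={"context": {"os_version": "X", "hardware": "A", "region": "east"}},
)
```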

The Keys Are in Your Hand

One last thing. Sometimes the reason you’re stuck is an assumption you never bothered to prove because it seemed too obvious to check. “Why would I look at that? It’s obviously not the problem.” But did you actually prove it?

It’s like tearing the house apart looking for your keys while they’re in your hand the whole time. I did this constantly as a kid building LEGO, convinced they had forgotten to include the one piece I needed, when it was the next piece in the instructions, sitting right in my hand. Investigations have the exact same trap. The thing you “know” is fine is sometimes the very thing that’s broken.

You’re Not a Bad Engineer for Not Knowing

So if you take one thing from this issue, let it be this: you are not a bad engineer because you don’t know the next step. Don’t let the shame paralyze you. Accept that not knowing everything is completely normal -- it has to be, because no one can possibly know everything. The point isn’t to have the answer instantly. The point is to get curious, start writing down hypotheses, and start proving or disproving them one at a time. If you stay curious and keep exploring the problem space, you’ll keep making progress. And that’s all any of us can really ask for.

If you’ve got questions about software engineering or career development, drop them in the comments or head over to CodeCommute.com and write in anonymously. I’m always happy to respond and share my perspective if it helps.
