roger_grimes
Columnist

To solve the unsolvable problem

Analysis
Mar 14, 2008
4 mins
Data and Information Security, Security

At least once a week someone comes to me with an unexplainable, random problem that they begin to think might be malware-related. Some of the scenarios are almost laughable. Here’s one I heard this week: “We upgraded the file servers for a particular application last week, and now we are having random printing problems. Do you think it might be a computer virus?”

They seem surprised when I tell them I don’t know of a malware program that causes random printing problems on upgraded server applications. What are they thinking? I guess security people are pretty good troubleshooters to ask during an “unexplainable, random problem” scenario. Security people usually have a strong understanding of host and network mechanics and years of experience. And unexplainable, random problems are some of the hardest in the computer world to troubleshoot.

So when I have a client or friend faced with a “random, unexplainable problem,” here’s what I tell them:

First, there is nothing random in the computer world. Ask any crypto programmer. They spend their lives trying to create realistic randomness but know it doesn’t truly exist in the computer world. They can get to very good approximations of randomness, but true randomness does not exist. Computers can’t do random. They are full of ones and zeros, positive and negative charges, and logic gates. They only do what they are told. It is always cause and effect. If it appears random, then you need to find out what specific set of conditions has to be true for the problem to manifest.
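
As a small illustration of the cause-and-effect point, here is a minimal Python sketch. It assumes nothing beyond the standard library's pseudorandom generator; the seed value 2008 is arbitrary and chosen only for this example. Two generators started from the same conditions produce the same "random" dice rolls.

```python
import random

# Two generators seeded with the same (arbitrary) value.
first = random.Random(2008)
second = random.Random(2008)

rolls_a = [first.randint(1, 6) for _ in range(10)]
rolls_b = [second.randint(1, 6) for _ in range(10)]

# Same starting conditions, same "random" results.
print(rolls_a == rolls_b)  # True
```

The same reasoning applies to a "random" fault: reproduce the starting conditions and the fault should reproduce too.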

Next, when faced with the “unexplainable problem,” the best thing you can do is to figure out what the problem isn’t. You can do this by testing various scenarios that will either rule in or out a particular cause or symptom. You want to try things that separate problems into one type of problem versus another.

The idea is that you want to test scenarios that make big distinctions. It’s like asking someone to guess a number between 1 and 100 using the fewest guesses possible. The first guess should be something like, “Is it above 50?” The idea is to rule out or in the biggest set of possibilities first. If the number holder says yes to that first question, the second question would be, “Is it above 75?” and so on. Do the same thing with your unexplainable problem.
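
The guessing game can be sketched in a few lines of Python. This is only an illustration of the halving idea; the function name `guess_number` and its parameters are made up for this example.

```python
def guess_number(secret, low=1, high=100):
    """Find secret in [low, high] by asking only 'Is it above X?' questions."""
    questions = 0
    while low < high:
        mid = (low + high) // 2
        questions += 1
        if secret > mid:   # the holder answers yes to "Is it above mid?"
            low = mid + 1
        else:              # the holder answers no
            high = mid
    return low, questions

# The questions asked here are: above 50? 75? 88? 82? 79? 77? 76?
print(guess_number(76))  # (76, 7)
```

Because each question halves the remaining range, no number between 1 and 100 takes more than seven questions. A troubleshooting question that splits the possible causes roughly in half pays off the same way.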

For example, with a printing problem, here are some possible questions: Does it only happen to certain people or computers? Does it only happen to one application on the computer or all applications? Does it only happen to particular printers? Does it only happen during particular times of the day? Does it happen if the person prints locally or only over the network? Does it happen if you switch out printer models?
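
One way to work through questions like these is to log each print attempt and see which factor lines up with the failures. The sketch below is a hypothetical example: the field names (`user`, `app`, `printer`, `failed`) and every observation in `jobs` are invented for illustration, not taken from a real case.

```python
from collections import defaultdict

def failure_rates(jobs, factor):
    """Return {value: fraction of jobs that failed} for one factor."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for job in jobs:
        value = job[factor]
        totals[value] += 1
        if job["failed"]:
            failures[value] += 1
    return {value: failures[value] / totals[value] for value in totals}

# Hypothetical observations, invented for illustration only.
jobs = [
    {"user": "alice", "app": "editor", "printer": "P1", "failed": False},
    {"user": "alice", "app": "reports", "printer": "P2", "failed": True},
    {"user": "bob", "app": "editor", "printer": "P2", "failed": True},
    {"user": "bob", "app": "reports", "printer": "P1", "failed": False},
    {"user": "carol", "app": "editor", "printer": "P1", "failed": False},
    {"user": "carol", "app": "reports", "printer": "P2", "failed": True},
]

for factor in ("user", "app", "printer"):
    print(factor, failure_rates(jobs, factor))
```

In this made-up data every user fails half the time, so the user is probably not the cause, while printer P2 fails every time and P1 never does. That is the kind of big distinction worth chasing first.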

Once you’ve narrowed down the larger problem, start to test smaller and smaller operational scenarios. When faced with the unexplainable problem, you want to continue to rule in or out particular symptoms until you narrow down the exact problem. Once you have identified the exact problem, the solution is usually only minutes away.

Of course, anyone with any computer troubleshooting experience will tell you to test what changed last (if that is possible). And if the complaining end-user says nothing changed recently, it’s good to be skeptical. It’s amazing how many end-users claiming “I didn’t change anything” turn out to have changed something major once their memory is refreshed a little.

I have some other hints: Troubleshoot along the OSI model. Don’t forget to check physical connections. You’d be surprised how many unexplainable problems turn out to be cables that went bad at the same time a system was upgraded, or a little crimp in a network cable that causes sporadic problems or timeouts only under heavier traffic loads.

I’m also a big fan of network sniffing. Download Wireshark, capture a traffic session from a workstation that is working correctly and another from the problematic workstation, and then troubleshoot the differences. Look for handshakes, retransmits, and timeouts.
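
Once you have notes from a good session and a bad one, the comparison itself is simple counting. The sketch below assumes you have already reduced each capture, by hand or otherwise, to a list of event labels. The labels and both sample lists are hypothetical and are not real Wireshark output.

```python
from collections import Counter

def event_differences(good_events, bad_events):
    """Return {event: bad_count - good_count} for events whose counts differ."""
    good = Counter(good_events)
    bad = Counter(bad_events)
    return {
        event: bad[event] - good[event]
        for event in sorted(set(good) | set(bad))
        if bad[event] != good[event]
    }

# Hypothetical hand-labelled events, invented for illustration only.
good_session = ["handshake", "data", "data", "data", "close"]
bad_session = ["handshake", "data", "retransmit", "retransmit", "timeout"]

print(event_differences(good_session, bad_session))
# {'close': -1, 'data': -2, 'retransmit': 2, 'timeout': 1}
```

Positive numbers are events the problematic session has more of. Here the handshake looks the same in both sessions, and the extra retransmits and the timeout are where to dig.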

In the end the random, unexplainable problem is normally just a simple setting or misconfiguration mistake. And of course, it can’t hurt to do a malware scan if just a particular workstation is involved. I don’t normally suspect malware right away in most normal troubleshooting scenarios, but you never know …

Roger A. Grimes is a contributing editor. Roger holds more than 40 computer certifications and has authored ten books on computer security. He has been fighting malware and malicious hackers since 1987, beginning with disassembling early DOS viruses. He specializes in protecting host computers from hackers and malware, and consults to companies from the Fortune 100 to small businesses. A frequent industry speaker and educator, Roger currently works for KnowBe4 as the Data-Driven Defense Evangelist and is the author of Cryptography Apocalypse.