And why theoretical security is theoretically impossible, and how to tell good programs from bad ones

Trojan horses. Keyboard loggers. Viruses. Bad insiders. Bad outsiders. Evil-doers. Perforated firewalls. Corrupt backups. Spam. A few years ago, many security professionals I knew looked forward to the day when the majority of the world’s computer security problems were worked out. Back then, we thought that improving security was just a question of deploying technology, providing training and getting people to follow the appropriate procedure. But a look at computer science theory proves otherwise.

A fundamental goal of computer security has been to put some sort of restrictions on program execution. For example, worms and viruses wouldn’t be a problem if they didn’t damage our files, reformat our hard drives and e-mail themselves to everybody in our address books. If there were just some way we could stop the bad programs from doing what we don’t want, without affecting the execution of the good programs, our problem would be solved.

But to stop the bad programs, we’re going to need some way of distinguishing the bad from the good. That is, we’re going to need a pro-gram-analyzing program that can look at any given suspect program and determine if it has any hostile code. If it doesn’t, then the suspect application is safe to run. Security is simple.

But, as it turns out, writing such a program-analyzing program is theoretically impossible.

The impossibility stems not from legal issues, such as determining whether a programmer had “criminal intent”, but from a technical conundrum. The mathematics of computing make it impossible to write software that can figure out what other programs can do, prior to execution.

To be sure, it is possible to write programs that can examine simple programs and determine what they do. But there’s the problem: In principle, there is no way to examine a program’s instructions and figure out precisely what it does. That’s because it is too complicated. In principle, running the software is the only way to discover what it does. And once a hostile program is run, it’s too late: The damage has been done.

That’s why today’s antivirus systems don’t try to predict what an infected program will do. Instead, these systems have a list of “signatures,” or sequences of bytes, that have been observed in existing viruses. If the copy of Microsoft Word that’s about to be run contains the same copy of bytes that were discovered years ago in the Michelangelo virus, for example, then that copy of Word is deemed infected and is not permitted to run.

But let’s say that your copy of Word is infected by a virus that’s never been seen before; then you’re out of luck. You can analyze the copy of Word to see if it has instructions for formatting the computer’s hard drive, for example. But if you can’t find them, that still doesn’t prove that the program won’t format your hard drive; computer programs can modify themselves when they run and add these commands to themselves. It’s this ability of computer programs to modify themselves that makes their behavior mathematically impossible to predict.

British mathematician Alan Turing proved the impossibility of predicting program actions back in 1936. Turing looked at the simplest kind of hostile program and proved that it is impossible to tell in advance if a program will go into an infinite loop, or stop. Today we call this the Halting Problem. It is unsolvable.

The more you think about the Halting Problem, the stranger it seems. That’s because every program will eventually halt or run forever. And since computers are deterministic machines that do exactly as they are programmed, it seems that it should be possible to analyze a program and determine in advance what that program will do when it is run. But Turing showed that you can’t

It’s trivial to extend Turing’s work to any other kind of computer security problem involving hostile code: You can’t examine a copy of Word and determine if a back door has been placed in the program to surreptitiously e-mail copies of your confidential documents to Argentina. And you can’t prove that your copy of Windows isn’t silently recording every password that you type for playback at a later time.

One favored approach lately to “solving” the computer security problem of desktop computers is to gimmick their operating systems so that they will run only the programs that have been digitally signed by publishers such as Microsoft and Adobe. But Turing’s work tells us that these digitally signed programs may nevertheless have exploitable security bugs: It’s theoretically impossible to prove that they don’t.

Just about the only way to take back computer security from the morass that Turing created is to restrict what computer programs can do

But it takes surprisingly little power and flexibility to tip the scales and make a program’s behavior unpredictable. JavaScript’s ability to manipulate the contents of a webpage after they’re loaded

Turing’s theorem isn’t the only quandary that theoretical computer scientists have thrown at security professionals. Another one is the fundamental question of whether computers can solve problems that are truly complex (that is, complex in a theoretical sense) by short-circuiting the underlying mathematics that make these problems hard. Mathematicians have worked on this problem for more than 30 years, but so far nobody knows the answer.

These hard problems are all in a mathematical family called NP, and they share a common peculiar property: The problems are hard to solve, but once you find an answer it’s trivial to prove that it is the correct one. Code-breaking is one of those problems. If the only way you have to crack an encrypted message is by trying every key, it might take a hundred billion years for you to try them all and find the right one. But once you have the right key, it takes less than a fraction of a second to prove that the key is the correct key: All you do is decrypt the message.

There are many of these NP problems: scheduling speakers and rooms at a conference, for example, or seating women at a dinner party so that nobody is sitting next to an ex-boyfriend. For every case, the only way we know to solve these problems is through brute-force search

The strange thing about these NP problems is that mathematicians have never formally proven that brute-force search is the only attack that works: They just haven’t found any other way to solve these problems. But in 1973 two mathematicians, Stephen Cook and Leonid Levin, made a simultaneous discovery: If a fast way is ever found to solve a particular kind of NP problem, a problem that’s so-called NP complete, then that result could be used to solve every NP problem.

Some mathematicians claim that such a result might be the end of cryptography. In fact, it might not. The difficulty of factoring large numbers is the basis of the RSA encryption algorithm. For years, factoring was thought to be an NP problem, but two years ago some clever mathematicians found a way to short-circuit the math. But as it turns out, the short-circuit technique isn’t very fast, and the security of RSA is safe

If mathematicians ever find a way to solve an NP-complete problem, and it’s possible that somebody might, then this knowledge could be used to reverse-engineer practically every encryption scheme that’s ever been devised. Whether those attacks would be practical, no one knows. But never ignore the possibility that all of the science of cryptography could suddenly come tumbling down. Is such a result likely? Of course not. But it is possible.

So the next time you feel overwhelmed by the responsibilities that come with being a security officer or consultant, take solace in the fact that the problems you are trying to solve are fundamentally unsolvable. Given those odds, any progress you can make is probably worthwhile.