Algorithms and Structural Complexity Theory continued

Preview of Things to Come

We will (try to) see the following:

Some problems in NP are harder than others because we can use the hard problems to solve the easier problems.
It turns out some problems are as hard as all of the other problems in NP, i.e., a solution to one of those problems leads to a solution to any other problem in NP. These are the NP-complete problems.
We'll see some examples of NP-complete problems, like Travelling Salesman.
We'll see implications of the existence of NP-complete problems. For example, if there is a polynomial time solution to the Travelling Salesman problem (or any other NP-complete problem), then P=NP. On the other hand, if we can prove there is no such solution, then P!=NP.
If P!=NP, then there are some "NP-incomplete" problems that are neither NP-complete nor polynomial time solvable.
P and NP are part of a larger class called PSPACE, the problems that can be solved using polynomial space (but as much time as you want).
Another class, also in PSPACE, is co-NP. These are the complements of all the problems (sets) in NP. For example, since COMPOSITE-NUMBER is in NP, PRIMALITY (deciding primality) is in co-NP.
No one has proven whether NP=co-NP.
It turns out that co-P, the complements of problems solvable in polynomial time, is equal to P, i.e., P is closed under complementation.
That means that, if we can show NP != co-NP, then we know P != NP because otherwise NP would be closed under complementation just like P.
An alternate definition of NP, using something called a nondeterministic machine, is something you might see in another class or in a book. The definition given in the last lecture and the nondeterministic definition are equivalent.

Polynomial Reducibility

We say a decision problem D' is polynomial time reducible to a decision problem D if there exists a polynomial time computable function f(x) from instances of D' to instances of D such that x is in D' if and only if f(x) is in D. We write "D' <_p D" to mean D' is polynomial time reducible to D.

So, if D' <_p D, and we can solve D in polynomial time (with some algorithm), then we can solve D' in polynomial time too: compute f(x), then run the algorithm for D on the result. Let's look at an example:

Problem: HAMILTONIAN-CYCLE
Instance: A graph G = (V, E)
Question: Does G contain a Hamiltonian cycle? That is, is there a path (cycle) going from one vertex of G, through all the other vertices of G exactly once, ending up at the same vertex?

First of all, is this problem in NP? Yes; a certificate for it would be the list of vertices, in order, that make up the cycle. We can check this easily in time polynomial in the size of the graph.
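As a rough illustration of that check, here is a minimal Python sketch of a certificate verifier. The function name and the representation of the graph (a list of vertices plus a list of undirected edges as pairs) are assumptions made for this example, not something fixed by the lecture.

```python
def verify_hamiltonian_cycle(vertices, edges, cycle):
    """Check a claimed Hamiltonian cycle in an undirected graph.

    `cycle` is the certificate: the vertices in the order they are visited.
    Runs in time polynomial in the size of the graph.
    """
    vertex_set = set(vertices)
    edge_set = {frozenset(e) for e in edges}

    # The certificate must list every vertex exactly once.
    if len(cycle) != len(vertex_set) or set(cycle) != vertex_set:
        return False
    # A simple graph has no cycle on fewer than three vertices.
    n = len(cycle)
    if n < 3:
        return False
    # Consecutive vertices, wrapping around to the start, must be adjacent.
    return all(
        frozenset((cycle[i], cycle[(i + 1) % n])) in edge_set
        for i in range(n)
    )
```

For the square graph with vertices 0, 1, 2, 3 and edges (0,1), (1,2), (2,3), (3,0), the certificate [0, 1, 2, 3] is accepted, while [0, 2, 1, 3] is rejected because 0 and 2 are not adjacent.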

Now recall the problem TSP, as presented in the last lecture. We can use TSP to solve HAMILTONIAN-CYCLE; all we do is let k (the length of the tour we are looking for) be arbitrarily large, weight the edges of G with 1, and see if there is a TSP tour (of any length) through the graph. If there isn't (i.e., there was just no way to get from one vertex to another without going through a third twice, or the graph was not connected), then there is no Hamiltonian cycle. If there is a TSP tour, then there is a Hamiltonian cycle. All of this conversion of the HAMILTONIAN-CYCLE instance into the TSP instance can be done in linear, thus polynomial, time. So we say HAMILTONIAN-CYCLE <_p TSP. TSP is "harder" than HAMILTONIAN-CYCLE. Note that a certificate for the TSP instance can also be converted into a certificate for the HAMILTONIAN-CYCLE instance in polynomial time. Note also that, if TSP answers "no," there is no certificate (thus no proof) that the instance isn't in HAMILTONIAN-CYCLE.
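The reduction can be sketched in Python as follows. This is only a sketch under stated assumptions: we assume a TSP instance is a weighted graph (not necessarily complete) plus a bound k, and that a tour must visit every vertex exactly once and return to its start using only existing edges. The function names are made up for this example, and tsp_decide is a brute-force, exponential time stand-in for whatever TSP solver we might have.

```python
from itertools import permutations


def reduce_hc_to_tsp(vertices, edges):
    """The function f: turn a HAMILTONIAN-CYCLE instance into a TSP instance.

    Every edge gets weight 1. Any tour then has length exactly |V|, so the
    bound k = |V| is "large enough" never to rule a tour out.
    """
    weights = {frozenset(e): 1 for e in edges}
    k = len(vertices)
    return list(vertices), weights, k


def tsp_decide(vertices, weights, k):
    """Stand-in TSP decider (assumed format): is there a tour of length <= k?

    Tries every ordering of the vertices, so it is for illustration only.
    """
    if len(vertices) < 3:
        return False
    first, rest = vertices[0], vertices[1:]
    for perm in permutations(rest):
        tour = [first, *perm]
        total = 0
        for i, u in enumerate(tour):
            v = tour[(i + 1) % len(tour)]
            w = weights.get(frozenset((u, v)))
            if w is None:  # no such edge, so this ordering is not a tour
                break
            total += w
        else:
            if total <= k:
                return True
    return False


def has_hamiltonian_cycle(vertices, edges):
    """Decide HAMILTONIAN-CYCLE by asking the TSP decider about f(x)."""
    return tsp_decide(*reduce_hc_to_tsp(vertices, edges))
```

Only reduce_hc_to_tsp needs to run in polynomial time for the reduction to count; the cost of has_hamiltonian_cycle is then dominated by whatever TSP solver is plugged in.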

It should be noted that two problems can be the same hardness, i.e., D <_p D' and D' <_p D can both be true at the same time. Also, two problems may not be related at all, so that neither D <_p D' nor D' <_p D is true (so <_p orders problems only partially: it is a preorder, not a total order). For convenience and added confusion, I write "harder" when I should write "at least as hard as."

Some problems are harder than all of the problems in NP. One example is the problem HALTING we saw last lecture; it asks whether a C program will ever reach exit(). Given any problem D in NP, all we have to do is code up a C program that decides whether our instance is in D, calling exit() if yes, going into an infinite loop if no. This program can be mechanically constructed in polynomial time, then all we have to do is solve HALTING (good luck with that part). So D <_p HALTING for all problems D in NP.
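To make the "mechanically constructed" part concrete, here is a small sketch that builds such a program as text. It uses Python rather than C purely for brevity, and it assumes the decider is handed to us as source code defining a function called decide; both the function name reduce_to_halting and that convention are inventions for this example.

```python
def reduce_to_halting(decider_source, instance):
    """Return the source of a program that halts iff decide(instance) is True.

    Building the text is just string pasting, so it takes time polynomial in
    the size of the decider and the instance. Answering the HALTING question
    about the resulting program is the part nobody can do in general.
    """
    return (
        f"{decider_source}\n"
        f"if decide({instance!r}):\n"
        f"    raise SystemExit  # 'yes' instance: halt\n"
        f"while True:           # 'no' instance: loop forever\n"
        f"    pass\n"
    )
```

For example, with decider_source set to a definition of decide(n) that returns n % 2 == 0, the program built for instance 4 halts at once, while the one built for instance 3 never halts.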

Definition: A problem D is called NP-hard if, for all problems D' in NP, D' <_p D.

So an NP-hard problem is something harder than anything in NP.

Definition: A problem D is called NP-complete if

It is NP-hard.
It is in NP.

So an NP-complete problem would in some sense be the hardest problem in NP. Our definition of <_p allows problems to be equally hard, so we could have many equally hard NP-complete problems that are all harder than all of the other (allegedly easier) problems in NP.

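It follows that, if D' <_p D and D' is an NP-complete problem, and D is in NP, then D must also be NP-complete: every problem in NP reduces to D', and following that reduction with the one from D' to D gives a polynomial time reduction to D. So, if we can show just one problem NP-complete, we have a tool to find more NP-complete problems.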

Can we show just one NP-complete problem? Yes. Consider the following decision problem:

Problem: CIRCUIT-SATISFIABILITY
Instance: An acyclic (i.e., no cycles), directed graph G whose nodes are logic functions (AND, OR, or NOT) or logical variables. The graph represents a combinational logic circuit with n inputs and 1 output.
Question: Is there any assignment to the n input variables that will cause the output to become True?
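To make the problem concrete, here is a minimal Python sketch that evaluates such a circuit and then searches for a satisfying assignment by brute force. The representation is an assumption made for this example: the circuit is a dictionary mapping each node name to a tuple, ("VAR",) for an input or (gate, input names...) for a gate. Trying all 2^n assignments takes exponential time, so this shows what the question asks, not an efficient way to answer it.

```python
from itertools import product


def evaluate(circuit, output, assignment):
    """Evaluate node `output` of an acyclic circuit under `assignment`."""
    cache = {}

    def value(node):
        if node not in cache:
            op, *args = circuit[node]
            if op == "VAR":
                cache[node] = assignment[node]
            elif op == "NOT":
                cache[node] = not value(args[0])
            elif op == "AND":
                cache[node] = all(value(a) for a in args)
            elif op == "OR":
                cache[node] = any(value(a) for a in args)
            else:
                raise ValueError(f"unknown gate {op!r}")
        return cache[node]

    return value(output)


def satisfying_assignment(circuit, output):
    """Return an assignment making `output` True, or None if there is none."""
    variables = sorted(n for n, gate in circuit.items() if gate[0] == "VAR")
    for bits in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, bits))
        if evaluate(circuit, output, assignment):
            return assignment
    return None


# Example circuit computing (NOT x) AND y.
example = {
    "x": ("VAR",),
    "y": ("VAR",),
    "nx": ("NOT", "x"),
    "out": ("AND", "nx", "y"),
}
```

Here satisfying_assignment(example, "out") returns {"x": False, "y": True}. Note that evaluate alone is fast: checking one proposed assignment takes time polynomial in the size of the circuit.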

This problem was shown, in the early 1970s, to be NP-complete in a proof by Cook that became known as Cook's Theorem. It is somewhat involved, but the basic idea is this: any polynomial time certificate system can be transformed into a polynomial sized logic circuit (that's sort of what a computer does, anyway). The circuit encodes the certificate verification algorithm run on a particular instance. The inputs to the circuit are the bits of the certificate, and the output is True if the certificate verifies the instance, False otherwise. If there is an assignment that causes the circuit to output True (a satisfying assignment), then this assignment is a certificate verifying the instance. Since every problem in NP has a polynomial time proof system, this technique works for all of them, so CIRCUIT-SATISFIABILITY can be used to solve any problem in NP, i.e., it is NP-hard. It is also in NP itself (a satisfying assignment is a certificate that can be checked by evaluating the circuit), and is thus NP-complete.

From this foundation, we can build up a library of NP-complete problems that can be used to solve CIRCUIT-SATISFIABILITY or other previously proven NP-complete problems. It turns out that there are thousands of them.

Some fun facts about NP-complete problems:

No one has ever come up with an efficient (i.e., polynomial time) algorithm for any of them.
If anyone ever does, then that solution can be used to solve all of them in polynomial time through polynomial reducibility (and solve all the other problems in NP as well, like COMPOSITE-NUMBER), and then P would be the same as NP.
If anyone can ever prove that just one NP-complete problem can't be solved in polynomial time, then immediately all NP-complete problems are immune from polynomial time solutions and P != NP.
No one has ever been able to show that any non-trivial problem in NP isn't NP-complete; to do so would show that some problems in NP can't be reduced in polynomial time to some others, and that would mean some problems are in NP but not in P, so P!=NP.
Even if P=NP, there's no guarantee that the TSP could be solved quickly in practice; the best algorithm might still take time on the order of, say, n¹⁰⁰.
There are good algorithms that can solve many instances of NP-complete problems in polynomial time. However, for some instances (and it usually turns out that these are the important instances), the problem still blows up to exponential time.
Some problems, like GRAPH-ISOMORPHISM, are conjectured to be in a class called NP-incomplete, problems in NP that aren't NP-hard but that aren't in P, either. If P!=NP, then it's been shown that these problems must exist; if P=NP, then these problems can't exist.
Lots of research has focused on relativized worlds, i.e., worlds where algorithms are allowed to consult an oracle capable of solving problems for them. A TSP oracle, for example, would be able to solve an instance of the TSP problem in constant time. A P oracle would be able to solve any problem in P in constant time. It has been shown that, relative to a "random" oracle (the precise meaning of the term random oracle is beyond the scope of this class), P != NP with probability 1. However, there are relativized worlds where P = NP, and recently other classes of problems have turned out to be equal to each other even though they are unequal relative to a random oracle with probability 1 (namely, PSPACE and IP, IP being problems with "interactive proofs").
It has been said that most computer scientists now believe that P != NP, although it may be closer to the truth to say that most conjecture that P != NP but secretly hope that P = NP (since that would make life a lot more interesting!)
Kurt Gödel, the famous mathematician, believed that P = NP (before anyone called them P and NP).
What would really be bad would be if someone proved that P = NP is undecidable, i.e., that there exists no proof either way. That would mean that maybe there is a polynomial time algorithm for an NP-complete problem (i.e., P = NP), but no one could ever prove it. Some things in math have turned out to be undecidable (like the continuum hypothesis, i.e., that there is nothing more infinite than the integers but less infinite than the real numbers).