Topic :(2SAT & MAX3SAT) Naveen Gargnaveen/courses/CSL758/scribe_SAT.pdf · 2008-04-03 · Topic...

Randomized Algorithms

Lecturers: Kavitha Telikepalli, Naveen Garg
Topic: 2-SAT & MAX-3-SAT
Scribes: Sandhya S. Pillai (2007MCS3120), Suita Sharma (2007MCS2927)
Date: 14/02/2008

Agenda

1. Introduction
2. 2-SAT Problem
3. Monte Carlo vs. Las Vegas methods
4. Analysis of 2-SAT
5. Random Walks and Markov inequality
6. 3-SAT Problem
7. Why 3-SAT is NP-Hard?
8. Max 3-SAT Problem: randomization & de-randomization
9. References


Introduction

Randomized Algorithm

In addition to its input, a randomized algorithm takes a source of random numbers and makes random choices during execution. Its behavior can therefore vary even on a fixed input.

SAT

Satisfiability (SAT) is the problem of deciding whether a Boolean formula in propositional logic has an assignment that makes it evaluate to true. SAT arises both as a problem and as a tool in applications (e.g. artificial intelligence and circuit design), and it is considered a fundamental problem in theory: many problems reduce naturally to it, and it is the 'mother' of NP-complete problems.

2-SAT Problem

$\varphi = (x_1 \lor \lnot x_2) \land (x_1 \lor \lnot x_3) \land (\lnot x_1 \lor x_3) \land (\lnot x_1 \lor \lnot x_2)$

The k-SAT problem is the variant of SAT in which each clause consists of exactly k distinct literals. For k >= 3, k-SAT is NP-hard, but for k = 1 and k = 2 there are polynomial-time algorithms. For k = 1 the algorithm is trivial. For k = 2 the deterministic algorithm is slightly tricky, but there is a much simpler randomized algorithm.


Example: AND of clauses

$(A_1 \lor A_2 \lor A_3 \lor A_4) \land (\bar{A}_1 \lor \bar{A}_3 \lor \bar{A}_4) \land (A_1 \lor \bar{A}_2 \lor \bar{A}_3)$

The formula is satisfied when every clause has at least one literal set to true. If $\bar{A}_1$ = true and $A_2$ = true, then clauses 1 and 2 are satisfied, but clause 3 is satisfied only if $A_3$ is set to false.
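The example can be checked mechanically. The following is a minimal sketch, assuming a hypothetical encoding that is not part of the lecture: a literal is a signed integer (+i for A_i, -i for its negation), a clause is a tuple of literals, and a formula is a list of clauses. The value of A4 is chosen arbitrarily, since the text does not fix it.

```python
# Hypothetical encoding: +i stands for A_i, -i for its negation.
# A clause is a tuple of literals; a formula is a list of clauses (their AND).

def literal_is_true(literal, assignment):
    """Truth value of one literal under assignment (dict: index -> bool)."""
    value = assignment[abs(literal)]
    return value if literal > 0 else not value

def clause_is_satisfied(clause, assignment):
    """A clause (OR of literals) holds if at least one literal is true."""
    return any(literal_is_true(lit, assignment) for lit in clause)

def formula_is_satisfied(formula, assignment):
    """A CNF formula (AND of clauses) holds if every clause holds."""
    return all(clause_is_satisfied(c, assignment) for c in formula)

# (A1 v A2 v A3 v A4) ^ (~A1 v ~A3 v ~A4) ^ (A1 v ~A2 v ~A3)
example = [(1, 2, 3, 4), (-1, -3, -4), (1, -2, -3)]

# A1 = false (so ~A1 is true) and A2 = true; A3 then decides clause 3.
with_a3_true = {1: False, 2: True, 3: True, 4: False}
with_a3_false = {1: False, 2: True, 3: False, 4: False}

print(formula_is_satisfied(example, with_a3_true))   # clause 3 fails
print(formula_is_satisfied(example, with_a3_false))  # all clauses hold
```

With A3 = true the third clause has no true literal, so the formula evaluates to false; with A3 = false all three clauses hold.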

For a system with variables x_i, 0 <= i < n, and clauses C_j, 0 <= j < m, it goes as follows:

Algorithm:

Input: A Boolean formula φ in conjunctive normal form with exactly two distinct literals in every clause, e.g. $\varphi = (x_1 \lor \lnot x_2) \land (x_1 \lor \lnot x_3) \land (\lnot x_1 \lor x_3) \land (\lnot x_1 \lor \lnot x_2)$.

Start with an arbitrary initial assignment to the variables. The function number_satisfied() returns the number of satisfied clauses for the current assignment.

1) for (i = 0; i < n; i++) x_i = true; // Let us call this assignment A

2) for (t = 0; t < T && number_satisfied(x, m) < m; t++)

       select an arbitrary non-satisfied clause C_j;

       pick one of the two literals of C_j uniformly at random, and let x_i be its variable;

       x_i = (x_i + 1) mod 2; // toggle the truth value of x_i

3) if (number_satisfied(x, m) == m) then report that the set of clauses is satisfiable, else report that the set of clauses is not satisfiable
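The pseudocode above can be sketched in Python as follows. This is an illustration under stated assumptions, not the lecture's own code: literals are encoded as signed integers (+i for x_i, -i for its negation), variables are numbered from 1, the step bound is taken as 4 times the square of the number of variables (the value of T derived later in these notes), and the names `satisfied` and `randomized_2sat` are hypothetical.

```python
import random

def satisfied(clause, assignment):
    """True if at least one literal of the clause is true under assignment."""
    return any(assignment[abs(lit)] == (lit > 0) for lit in clause)

def randomized_2sat(num_vars, clauses, rng=None):
    """Return a satisfying assignment (dict: index -> bool), or None.

    None means "not satisfiable" was reported; for a satisfiable formula
    this answer can be wrong (Monte Carlo, one-sided error).
    """
    rng = rng or random.Random()
    # Step 1: start with every variable set to true.
    assignment = {i: True for i in range(1, num_vars + 1)}
    max_steps = 4 * num_vars ** 2  # assumed step bound T
    # Step 2: repeatedly repair an unsatisfied clause.
    for _ in range(max_steps):
        unsatisfied = [c for c in clauses if not satisfied(c, assignment)]
        if not unsatisfied:
            return assignment
        clause = unsatisfied[0]        # an arbitrary unsatisfied clause
        literal = rng.choice(clause)   # one of its two literals, uniformly
        var = abs(literal)
        assignment[var] = not assignment[var]  # toggle its truth value
    # Step 3: final check after the last flip.
    if all(satisfied(c, assignment) for c in clauses):
        return assignment
    return None

# The example formula: (x1 v ~x2) ^ (x1 v ~x3) ^ (~x1 v x3) ^ (~x1 v ~x2)
phi = [(1, -2), (1, -3), (-1, 3), (-1, -2)]
print(randomized_2sat(3, phi))
```

Recomputing the list of unsatisfied clauses in every round keeps the sketch short; it costs linear time per step, which matches the remark that number_satisfied() is linear.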


The function number_satisfied() can be computed in linear time, and some savings can be achieved by tracking only the changes between rounds. Computation time is not the main issue here, however. The main question is how large T must be for us to be reasonably sure that this Monte Carlo algorithm gives the correct answer on satisfiable instances. There is a chance of error, and we need to bound it.

Monte Carlo vs. Las Vegas methods

A Las Vegas algorithm is a randomized algorithm that always returns the correct result. The only thing that varies is its running time, which may change between executions.

Randomized QuickSort is an example of a Las Vegas algorithm.

A Monte Carlo algorithm is a randomized algorithm that might output an incorrect result. However, the probability of error can be reduced by repeated independent executions of the algorithm.

The randomized MinCut algorithm is an example of a Monte Carlo algorithm.

If φ is not satisfiable, the 2-SAT algorithm above always answers correctly, since it reports "satisfiable" only after finding a satisfying assignment. If φ is satisfiable, we need to fix the value of T so that the algorithm reports "not satisfiable" with probability <= 1/4.

If φ is satisfiable, there is a satisfying assignment to the x_i. Let us call it S. Let A be the current assignment of our algorithm, which differs from S in at least one variable. In this analysis n denotes the number of variables.

Let i = the number of variables that have the same truth value in A and in S.

Both literals of the chosen unsatisfied clause are false under A, while S makes at least one of them true, so A and S disagree on at least one of the clause's two variables. Flipping a variable chosen uniformly from the clause therefore makes i = i + 1 with probability at least 1/2 (and i = i - 1 otherwise).

The above process can be modeled as a random walk on a line graph with states 0, 1, ..., n: state y is connected to state y - 1 and state y + 1, as long as these indices are at least 0 and at most n.


Random Walk

Let G = (V, E) be a connected, undirected graph. A random walk on G, starting from vertex s ∈ V, is the random process defined as follows. It is simple to implement.

1. u := s

2. repeat for T steps

       choose a neighbor v of u uniformly at random

       u := v
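The two-step process above translates almost line for line into code. Below is a minimal sketch, assuming the graph is given as an adjacency dictionary; the helper `path_graph`, which builds the line graph on states 0..n used in the 2-SAT analysis, and the function names are illustrative assumptions.

```python
import random

def random_walk(adjacency, start, steps, rng=None):
    """Return the list of vertices visited by a random walk of `steps` steps."""
    rng = rng or random.Random()
    u = start
    visited = [u]
    for _ in range(steps):
        u = rng.choice(adjacency[u])  # uniformly random neighbor of u
        visited.append(u)
    return visited

def path_graph(n):
    """Adjacency dictionary of the line graph on states 0, 1, ..., n."""
    return {y: [v for v in (y - 1, y + 1) if 0 <= v <= n]
            for y in range(n + 1)}

walk = random_walk(path_graph(5), start=0, steps=20, rng=random.Random(1))
print(walk)
```

On the line graph the end states 0 and n have a single neighbor, so the walk is reflected there, which is the boundary behavior assumed in the analysis of the expected number of steps.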

Consider a particle moving on a one-dimensional line. At each point in time, the particle moves either 1 step to the right with probability p or 1 step to the left with probability 1 − p. Let $S \in \{0,1\}^n$ be any satisfying assignment, and let the distance to S be the number of variables on which the current assignment differs from S.

With probability at least 1/2 the distance to S is reduced.

With probability at most 1/2 the distance to S is increased.

Randomized 2-SAT Analysis

The distance can never be larger than n, whatever state 0 <= i <= n the walk starts from.

The expected time to reach distance 0 is therefore dominated by that of a walk where

with probability exactly 1/2 the distance to S is reduced,

with probability exactly 1/2 the distance to S is increased.

Let us define a random variable

$X_i$ = number of steps to reach state n starting from state i.

With probability 1/2 the first step leads to state i+1, and then $X_i = 1 + X_{i+1}$.

With probability 1/2 the first step leads to state i−1, and then $X_i = 1 + X_{i-1}$.

At state 0 the walk always moves to state 1, and $X_n = 0$.

Since the walk is memoryless, the walk from the current state onward is a fresh start.

[Figure: the line graph with states 0, 1, ..., n.]


$E[X_i] = \tfrac{1}{2} E[1 + X_{i+1}] + \tfrac{1}{2} E[1 + X_{i-1}]$

$= 1 + (E[X_{i-1}] + E[X_{i+1}])/2$

Let $S_i = E[X_i]$; then

$S_i = 1 + (S_{i-1} + S_{i+1})/2$

$S_{n-1} = 1 + (S_{n-2} + S_n)/2$, with $S_n = 0$

$S_0 = S_1 + 1$ ............ (1)

$2S_1 = S_0 + S_2 + 2$ ............ (2)

...

$2S_{n-1} = S_{n-2} + 2$ ............ (n)

Adding equations (1) to (n), we get

$S_{n-1} = 2(n-1) + 1 = 2n - 1$

$S_{n-2} = (2n-1) + (2n-3)$

...

$S_0 = (2n-1) + (2n-3) + \cdots + 3 + 1 = n^2$
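The closed form can be double-checked numerically. The sketch below is an illustration, not part of the lecture: it rearranges the recurrence into the gaps S_i − S_{i+1} = 2i + 1 (equation (1) gives gap 1 at i = 0, and each interior equation adds 2 to the previous gap), sums the gaps back from S_n = 0, and then verifies that the result satisfies equations (1) to (n). The function names are hypothetical.

```python
def expected_hitting_times(n):
    """Return [S_0, ..., S_n] for the reflecting walk on states 0..n."""
    # gaps[i] = S_i - S_{i+1}; the recurrence gives 1, 3, 5, ..., 2n - 1.
    gaps = [2 * i + 1 for i in range(n)]
    s = [0] * (n + 1)  # S_n = 0: the walk is already at state n
    for i in range(n - 1, -1, -1):
        s[i] = s[i + 1] + gaps[i]
    return s

def satisfies_recurrence(s):
    """Check S_n = 0, S_0 = S_1 + 1 and 2 S_i = S_{i-1} + S_{i+1} + 2."""
    n = len(s) - 1
    if s[n] != 0:
        return False
    if n >= 1 and s[0] != s[1] + 1:
        return False
    return all(2 * s[i] == s[i - 1] + s[i + 1] + 2 for i in range(1, n))

times = expected_hitting_times(4)
print(times, satisfies_recurrence(times))
```

For n = 4 the gaps are 1, 3, 5, 7, so the expected times from states 0 to 4 are 16, 15, 12, 7, 0, and S_0 = 16 = n^2 as derived above.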

Let us define another random variable Y:

Y = number of variables whose truth value is the same in the current assignment A and the satisfying assignment S.

The movement of Y is similar to the walk above: from state i it moves to state i−1 or to state i+1.

i-1 <-------- i -------> i+1

Now we have to fix T. For this we make use of the Markov inequality.

Markov Inequality

Markov's inequality gives an upper bound on the probability that a non-negative random variable is greater than or equal to some positive constant.

Proposition: For any non-negative random variable Y and any real number k > 0, $\Pr[Y \ge k] \le E[Y]/k$. As an example, let k = 2E[Y]. Then the above says $\Pr[Y \ge 2E[Y]] \le 1/2$. Namely, if you move out to twice the expectation, at most half the area under the curve can lie to your right. This is quite intuitive.

Equivalently, for any non-negative random variable Y with E[Y] > 0 and any k > 0, $\Pr[Y \ge k E[Y]] \le 1/k$.

Proof of the Markov Inequality

$E[Y] = \sum_y \Pr[Y = y] \cdot y$

$= \sum_{y < k} \Pr[Y = y] \cdot y + \sum_{y \ge k} \Pr[Y = y] \cdot y$

$\ge 0 + k \cdot \Pr[Y \ge k]$

$\Rightarrow \Pr[Y \ge k] \le E[Y]/k$
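A small numerical check can make the inequality concrete. The sketch below is only an illustration on two made-up distributions (a fair six-sided die, and a variable that is 4 with probability 1/4 and 0 otherwise); exact fractions are used so that the comparison is not affected by rounding. The function name `markov_holds` is hypothetical.

```python
from fractions import Fraction

def markov_holds(distribution, k):
    """Check Pr[Y >= k] <= E[Y] / k for a finite distribution {value: prob}."""
    expectation = sum(p * y for y, p in distribution.items())
    tail = sum(p for y, p in distribution.items() if y >= k)
    return tail <= expectation / k

# Fair die: E[Y] = 7/2.
die = {y: Fraction(1, 6) for y in range(1, 7)}
print(all(markov_holds(die, k) for k in range(1, 13)))

# The bound can hold with equality: E[Y] = 1 and Pr[Y >= 4] = 1/4 = E[Y]/4.
tight = {0: Fraction(3, 4), 4: Fraction(1, 4)}
print(markov_holds(tight, 4))
```

The second distribution shows that the bound cannot be improved in general without further assumptions on Y.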

Therefore we take $T = 4n^2$.

Let Z = number of steps to reach state n from our start state. From the analysis above, $E[Z] \le n^2$. Then

Pr[φ is satisfiable but the algorithm returns unsatisfiable]

$\le \Pr[\text{our algorithm does not reach state } n \text{ in } 4n^2 \text{ steps}]$

$\le \Pr[Z \ge 4E[Z]] \le 1/4$

The first inequality holds because the algorithm may also stop early at a satisfying assignment other than S; the second uses $4n^2 \ge 4E[Z]$, and the last is the Markov inequality with k = 4.
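The remark that repetition reduces the error of a Monte Carlo algorithm can be made concrete for this bound. As a sketch, assume the algorithm is run k times with independent random choices (k is a symbol introduced here, not in the lecture) and "satisfiable" is reported as soon as any run finds a satisfying assignment. All k runs must then fail for the answer to be wrong:

```latex
\Pr[\text{all } k \text{ runs fail on a satisfiable } \varphi]
  \;\le\; \left(\tfrac{1}{4}\right)^{k}
```

For instance, k = 5 runs already push the error below 1/1000, at a total cost of at most 20n^2 flips.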


3-CNF-SAT

A special case of SAT that is incredibly useful in proving NP-hardness results is 3SAT (or 3-CNF-SAT). A Boolean formula is in conjunctive normal form (CNF) if it is a conjunction (AND) of several clauses, each of which is the disjunction (OR) of several literals, each of which is either a variable or its negation. In 3SAT every clause has exactly three literals.

For example:

$(A_1 \lor A_2 \lor A_3) \land (\bar{A}_1 \lor \bar{A}_3 \lor A_4) \land (A_1 \lor \bar{A}_2 \lor A_3)$

Given such a Boolean formula, can we decide satisfiability with an algorithm that runs in polynomial time? No such algorithm is known, and none exists unless P = NP, because 3SAT is NP-hard. (The proof is given in the Appendix.)

Max 3-CNF

This problem is to find an assignment that maximizes the number of satisfied clauses.

Let us take for example $(A_1 \lor A_2 \lor \bar{A}_3) \land (A_1 \lor \bar{A}_2 \lor A_4) \land (A_2 \lor \bar{A}_3 \lor A_4)$

Randomized Algorithm for Max 3-CNF

Set each variable to true with probability 1/2, independently (for instance, toss a coin; on heads set the variable to true, on tails set it to false). A clause is not satisfied exactly when all 3 of its literals are false.

Prob. that a clause is not satisfied = 1/2 * 1/2 * 1/2 = 1/8 (since the three literals involve distinct variables and each is independently false with probability 1/2)

Prob. that a clause is satisfied = 1 - Prob. that a clause is not satisfied = 1 - 1/8 = 7/8


The clauses need not be independent of each other. E.g. $(A_1 \lor A_2 \lor A_3) \land (\bar{A}_1 \lor A_2 \lor A_3)$

These two clauses share variables: if $A_2$ = true or $A_3$ = true (or both), then both clauses are satisfied together. Even so, each clause on its own is satisfied with prob. 7/8.

Let m be the number of clauses. The expected number of satisfied clauses can be found as follows. For each clause define an indicator random variable $X_i$:

$X_i$ = 1 if the ith clause is satisfied, = 0 otherwise

$X = \sum_i X_i$

Expected value of X: $E[X] = E[X_1 + X_2 + \cdots + X_m]$

Linearity of expectation: Let r be any real number and let X and $X_1, \ldots, X_n$ be random variables on a discrete probability space Ω such that their expectations all exist. Then the expectations of rX and of $X_1 + \cdots + X_n$ exist and we have

$E[rX] = r E[X]$ and $E[X_1 + \cdots + X_n] = E[X_1] + \cdots + E[X_n]$.

Since linearity of expectation always holds, even for dependent random variables,

$E[X] = E[X_1] + E[X_2] + \cdots + E[X_m] = 7/8 + 7/8 + \cdots + 7/8 = \tfrac{7}{8} m$
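The value 7m/8 can be confirmed by brute force on a small instance. The sketch below is illustrative only: it uses the three-clause Max 3-CNF example (A1 v A2 v ~A3) ^ (A1 v ~A2 v A4) ^ (A2 v ~A3 v A4) with a hypothetical signed-integer encoding of literals, draws one random assignment as the randomized algorithm would, and computes the exact expectation by averaging over all 2^4 assignments.

```python
import random
from fractions import Fraction
from itertools import product

def count_satisfied(clauses, assignment):
    """Number of clauses with at least one true literal."""
    return sum(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses)

def random_assignment(num_vars, rng=None):
    """Set each variable to true with probability 1/2, independently."""
    rng = rng or random.Random()
    return {i: rng.random() < 0.5 for i in range(1, num_vars + 1)}

def exact_expectation(num_vars, clauses):
    """Average number of satisfied clauses over all 2^n assignments."""
    total = 0
    for values in product([False, True], repeat=num_vars):
        assignment = dict(zip(range(1, num_vars + 1), values))
        total += count_satisfied(clauses, assignment)
    return Fraction(total, 2 ** num_vars)

max3cnf_example = [(1, 2, -3), (1, -2, 4), (2, -3, 4)]
print(count_satisfied(max3cnf_example, random_assignment(4)))
print(exact_expectation(4, max3cnf_example))  # equals 7/8 * 3 = 21/8
```

The clauses share variables, yet the exact average is still 21/8, which is what linearity of expectation predicts.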


Approximation Algorithm

Approximation algorithms are algorithms used to find approximate solutions to optimization problems. They are often associated with NP-hard problems: since it is unlikely that there can ever be efficient polynomial-time exact algorithms for NP-hard problems, one settles for polynomial-time sub-optimal solutions.

This is termed a 7/8-approximation algorithm because no assignment can satisfy more than m clauses, so the expected number of clauses satisfied by the random assignment, 7m/8, is at least 7/8 of the optimum.

Derandomization: First devise a randomized algorithm, then argue that it can be derandomized to yield a deterministic algorithm.

Consider the following formula.

$(A_1 \lor \bar{A}_2 \lor A_3) \land (\bar{A}_2 \lor \bar{A}_3 \lor A_1) \land (\bar{A}_3 \lor A_2 \lor \bar{A}_1)$

The possible assignments to the variables of the above clauses can be drawn in the form of a binary tree.


Assignment of the Variables

Each leaf corresponds to the assignment given by the path from the root to that leaf, and is labeled with the number of clauses satisfied by that assignment. If there are n variables, the tree has n levels.

With n levels there are $2^n$ leaves, so $2^n$ assignments are possible. The average of the numbers on the leaves = 7/8 * m.

Each clause is satisfied in 7/8 of the leaves, so each clause contributes 1 to 7/8 * $2^n$ leaves.

Sum of the numbers on the leaves = 7/8 * $2^n$ * m = 21 (since n = 3 and m = 3 in the example here)

Avg. of the numbers on the leaves = (7/8 * $2^n$ * m) / $2^n$ = 21/8 = 2.625

[Figure: binary tree of assignments. The root branches on A1 = 1 / A1 = 0, the next level on A2 = 1 / A2 = 0, and the last level on A3 = 0 / A3 = 1.]


Starting from the root, compute the average at each child of the current node, then follow the path with the greater average. The average is computed as follows. Consider the two children of the root, A1 = 0 and A1 = 1.

If A1 = 1, compute the probability that each clause is satisfied:

$(A_1 \lor \bar{A}_2 \lor A_3)$ ... satisfied

$(\bar{A}_2 \lor \bar{A}_3 \lor A_1)$ ... satisfied

$(\bar{A}_3 \lor A_2 \lor \bar{A}_1)$ ... not yet satisfied

Discarding $\bar{A}_1$ from the last clause, we obtain $(A_2 \lor \bar{A}_3)$.

The probability that this clause is not satisfied is 1/4, so the probability that it is satisfied is 3/4. Thus the expected number of satisfied clauses is 1 + 1 + 3/4 = 11/4. This is the average when A1 = 1. Similarly compute the average for A1 = 0, which comes out to be 10/4. Choose the greater average and move along that path, so we move along the A1 = 1 path. Now compute the averages for A2 = 0 and A2 = 1 and move along the path with the greater average. Thus at each level i we compute the average with variable i set to true and with it set to false. The average at a node is the mean of the averages at its two children, so the greater of the two is never smaller than the average at the node itself. Hence the leaf we reach satisfies at least 7/8 * m clauses.

The above algorithm follows a greedy approach. At each level we check which subtree gives the better average and decide according to the current maximum. The resulting assignment may therefore be sub-optimal, although the 7/8 * m guarantee still holds.
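The walk down the tree can be sketched in code without building the tree. The following is a minimal illustration, assuming a hypothetical signed-integer encoding of literals (+i for A_i, -i for its negation) and clauses whose literals are on distinct variables; the function names are not from the lecture. The average at a node is computed clause by clause: a clause already satisfied counts 1, and a clause with j unset literals and no true literal counts 1 - 1/2^j.

```python
from fractions import Fraction

def conditional_expectation(clauses, partial):
    """Expected number of satisfied clauses when the variables in `partial`
    (dict: index -> bool) are fixed and the rest are uniformly random."""
    total = Fraction(0)
    for clause in clauses:
        unset = 0
        already_true = False
        for lit in clause:
            var = abs(lit)
            if var in partial:
                if partial[var] == (lit > 0):
                    already_true = True
                    break
            else:
                unset += 1
        # A clause with no true literal fails only if all unset literals fail.
        total += 1 if already_true else 1 - Fraction(1, 2 ** unset)
    return total

def derandomized_max3sat(num_vars, clauses):
    """Fix variables one by one, always keeping the larger average."""
    partial = {}
    for var in range(1, num_vars + 1):
        if_true = conditional_expectation(clauses, {**partial, var: True})
        if_false = conditional_expectation(clauses, {**partial, var: False})
        partial[var] = if_true >= if_false  # ties go to True (arbitrary)
    return partial

# (A1 v ~A2 v A3) ^ (~A2 v ~A3 v A1) ^ (~A3 v A2 v ~A1)
tree_example = [(1, -2, 3), (-2, -3, 1), (-3, 2, -1)]
print(conditional_expectation(tree_example, {1: True}))   # 11/4
print(conditional_expectation(tree_example, {1: False}))  # 5/2, i.e. 10/4
print(derandomized_max3sat(3, tree_example))
```

On this example the procedure picks A1 = 1 (11/4 against 10/4), then A2 = 1, and ends at an assignment satisfying all three clauses. Each step needs only two passes over the clauses, so the whole procedure is polynomial even though the tree has 2^n leaves.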


References

Randomized Algorithms: R. Motwani and P. Raghavan, Chapters 1, 5, 6, 10

Introduction to Algorithms: CLRS, Chapter 5

http://www.nada.kth.se/~viggo/problemlist/compendium.html


APPENDIX

Practical Example of Satisfiability

Circuit satisfiability is a good example of a problem that we don't know how to solve in polynomial time. In this problem, the input is a Boolean circuit: a collection of AND, OR, and NOT gates connected by wires. We will assume that there are no loops in the circuit (so no delay lines or flip-flops). The input to the circuit is a set of m Boolean (true/false) values $x_1, \ldots, x_m$. The output is a single Boolean value. Given specific input values, we can calculate the output in polynomial (actually, linear) time using depth-first search, evaluating the output of each gate in constant time.

Why NP-hard?

The circuit satisfiability problem asks, given a circuit, whether there is an input that makes the circuit output True, or conversely, whether the circuit always outputs False.

Nobody knows how to solve this problem faster than just trying all $2^m$ possible inputs to the circuit, but this requires exponential time. On the other hand, nobody has ever proved that this is the best we can do; maybe there's a clever algorithm that nobody has discovered yet! Circuit satisfiability is known to be NP-hard (this is the Cook-Levin theorem), which is the starting point of the reduction below.

We could prove that 3SAT is NP-hard by a reduction from the more general SAT problem, but it's easier just to start over from scratch, with a Boolean circuit. We perform the reduction in several stages.

1. Make sure every AND and OR gate has only two inputs. If any gate has k > 2 inputs, replace it with a binary tree of k-1 two-input gates.

2. Write down the circuit as a formula, with one clause per gate. This is the reduction from circuit satisfiability to SAT.

3. Change every gate clause into a CNF formula. There are only three types of clauses, one for each type of gate:

$A_1 = A_2 \land A_3 \;\to\; (A_1 \lor \bar{A}_2 \lor \bar{A}_3) \land (\bar{A}_1 \lor A_2) \land (\bar{A}_1 \lor A_3)$

$A_1 = A_2 \lor A_3 \;\to\; (\bar{A}_1 \lor A_2 \lor A_3) \land (A_1 \lor \bar{A}_2) \land (A_1 \lor \bar{A}_3)$

$A_1 = \bar{A}_2 \;\to\; (A_1 \lor A_2) \land (\bar{A}_1 \lor \bar{A}_2)$

4. Make sure every clause has exactly three literals. Introduce new variables into each one- and two-literal clause, and expand it into several three-literal clauses as follows:

$A_1 \;\to\; (A_1 \lor e \lor u) \land (A_1 \lor \bar{e} \lor u) \land (A_1 \lor e \lor \bar{u}) \land (A_1 \lor \bar{e} \lor \bar{u})$

$A_1 \lor A_2 \;\to\; (A_1 \lor A_2 \lor e) \land (A_1 \lor A_2 \lor \bar{e})$
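The clause transformations in stages 3 and 4 can be verified by truth table. The sketch below is an illustration under assumed conventions: literals are signed integers (+i for variable i, -i for its negation), and in the padding checks the new variables e and u are simply numbered as extra variables. It confirms that each CNF has the same truth value as the gate equation, or the short clause, it replaces, for every assignment.

```python
from itertools import product

def cnf_value(clauses, assignment):
    """Evaluate a CNF (list of tuples of signed ints) under assignment."""
    return all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses)

def equivalent(clauses, target, num_vars):
    """True if the CNF and the predicate `target` agree on all assignments."""
    for values in product([False, True], repeat=num_vars):
        a = dict(zip(range(1, num_vars + 1), values))
        if cnf_value(clauses, a) != target(a):
            return False
    return True

AND_GATE = [(1, -2, -3), (-1, 2), (-1, 3)]  # A1 = A2 AND A3
OR_GATE = [(-1, 2, 3), (1, -2), (1, -3)]    # A1 = A2 OR A3
NOT_GATE = [(1, 2), (-1, -2)]               # A1 = NOT A2
PAD_ONE = [(1, 2, 3), (1, -2, 3), (1, 2, -3), (1, -2, -3)]  # A1, e=2, u=3
PAD_TWO = [(1, 2, 3), (1, 2, -3)]                           # A1 v A2, e=3

print(equivalent(AND_GATE, lambda a: a[1] == (a[2] and a[3]), 3))
print(equivalent(OR_GATE, lambda a: a[1] == (a[2] or a[3]), 3))
print(equivalent(NOT_GATE, lambda a: a[1] == (not a[2]), 2))
print(equivalent(PAD_ONE, lambda a: a[1], 3))
print(equivalent(PAD_TWO, lambda a: a[1] or a[2], 3))
```

The padded clauses are equivalent to the originals whatever values e and u take, which is why adding them does not change satisfiability.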

If we start with the above example formula , we obtain the following 3CNF formula.


Although the 3CNF formula looks more complicated than the original one at first glance, it's actually only a constant factor larger. Even if the formula were larger than the circuit by a polynomial factor, like $n^{373}$, we would have a valid reduction.

The formula is satisfiable if and only if the original circuit is satisfiable. As with the more general SAT problem, the formula is only a constant factor larger than any reasonable description of the original circuit, and the reduction can be carried out in polynomial time. Thus, we have a polynomial-time reduction from circuit satisfiability to 3SAT:

$T_{\mathrm{CSAT}}(n) \le O(n) + T_{\mathrm{3SAT}}(O(n)) \;\Longrightarrow\; T_{\mathrm{3SAT}}(n) \ge T_{\mathrm{CSAT}}(\Omega(n)) - O(n)$

So 3SAT is NP-hard. Finally, since 3SAT is a special case of SAT, it is also in NP, so 3SAT is NP-complete.