When I started writing this post, I hadn’t found so many “useful links” yet. The content below and links 1 and 2 focus only on using the Lasserre hierarchy on LPs, while links 3, 4, 5, and 6 cover more general cases.
1
We want to solve a 0-1 integer program. Since this task is NP-hard in general, we usually consider its linear relaxation. Different LP formulations have different integrality gaps. For example, consider the following linear relaxation of the max matching IP in a non-bipartite graph $G=(V,E)$:
$$\max\sum_{e\in E}x_e\quad\text{s.t.}\quad\sum_{e\in\delta(v)}x_e\le 1\;\;\forall v\in V,\qquad x_e\ge 0\;\;\forall e\in E.$$
:(, this polytope is not integral. Edmonds proved that the following formulation is integral:
$$\sum_{e\in\delta(v)}x_e\le 1\;\;\forall v\in V,\qquad\sum_{e\in E[S]}x_e\le\frac{|S|-1}{2}\;\;\forall S\subseteq V,\ |S|\text{ odd},\qquad x_e\ge 0\;\;\forall e\in E.$$
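To see such a gap concretely, here is a minimal sketch (my own toy instance, not from any of the linked notes) that solves both LPs on the triangle $K_3$ with scipy: the degree constraints alone admit the fractional point $x_e=\frac12$ of value $\frac32$, while adding the single odd set constraint for $S=V$ recovers the integral optimum $1$.

```python
# Integrality gap demo on the triangle K3 (a toy example of mine):
# the degree-constrained LP has optimum 3/2, while the largest matching
# has size 1. Edmonds' odd set constraint for S = V closes the gap.
import numpy as np
from scipy.optimize import linprog

c = -np.ones(3)                      # linprog minimizes, so negate to maximize
A_deg = np.array([[1, 1, 0],         # edges incident to vertex 1: {1,2}, {1,3}
                  [1, 0, 1],         # vertex 2: {1,2}, {2,3}
                  [0, 1, 1]])        # vertex 3: {1,3}, {2,3}
b_deg = np.ones(3)

lp = linprog(c, A_ub=A_deg, b_ub=b_deg, bounds=[(0, 1)] * 3)
print(lp.fun)   # -1.5: the fractional point x_e = 1/2 everywhere

# Odd set constraint for S = V (|S| = 3): x(E[S]) <= (|S| - 1)/2 = 1.
A_odd = np.vstack([A_deg, np.ones(3)])
b_odd = np.append(b_deg, 1.0)
lp2 = linprog(c, A_ub=A_odd, b_ub=b_odd, bounds=[(0, 1)] * 3)
print(lp2.fun)  # -1.0: the integral optimum
```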
Schrijver [2] showed that those odd set constraints can be obtained by adding cutting planes to the previous polytope. Fortunately, for the matching polytope we have a polynomial-time separation oracle. However, for harder problems adding cutting planes may make the program NP-hard to solve. The Lasserre hierarchy is a method to strengthen the polytope toward the integer hull while providing provably good properties and keeping the program solvable in polynomial time (if applied a constant number of times).
2 Probability Perspective
There is a nice interpretation of the linear relaxation of 0-1 integer programs. Let $K=\{x\in[0,1]^n : Ax\ge b\}$ be the polytope of the linear relaxation. Solving the integer program amounts to describing all possible discrete distributions over $K\cap\{0,1\}^n$. Note that for a fixed distribution the expected position $\mathbb{E}[X]$ lies in $\operatorname{conv}(K\cap\{0,1\}^n)$, and iterating over all possible distributions gives us the integer hull. Hence we could find the integral optimal solution if we had access to all distributions over the integer points.
For any $x\in K$ and $i\in[n]$, $x_i$ can be seen as the probability of the event $X_i=1$.
We only care about discrete distributions on feasible integral points. However, each $x\in K$ only describes some marginal probabilities, and these marginals may not even be realizable. Consider the following 2D example: any point of $K$ outside the integer hull is not the marginal vector of any joint distribution over $X_1$ and $X_2$. The idea is to iteratively prune this area.
Now we need to think about how to represent all possible joint distributions. One natural way is to use a vector with one coordinate for the probability of every possible integer point in $K$. However, this method does not work well with our existing marginal probabilities. Let $X$ be a random vector such that $X\in\{0,1\}^n$ and $\Pr[X_i=1]=x_i$. Computing all feasible $x$ in the 2D example is the same as finding all possible bivariate discrete distributions on the integer points. To make $x$ a feasible marginal of some joint distribution, and to make $\mathbb{E}[X]$ land in the integer hull, we have to add more constraints.
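To make the pruning target concrete, here is a small sketch on a made-up 2D polytope (my own example, not from the post): a point $x\in K$ is a realizable marginal vector iff a small feasibility LP over the integer points has a solution, which happens exactly when $x$ lies in the integer hull.

```python
# Toy 2D polytope: K = { x in [0,1]^2 : x1 + x2 <= 3/2 }.
# Its integral points are (0,0), (1,0), (0,1). A point x in K is the
# marginal vector of a distribution over these points iff the LP below
# is feasible, i.e. iff x satisfies x1 + x2 <= 1 (the integer hull).
import numpy as np
from scipy.optimize import linprog

integer_points = np.array([[0, 0], [1, 0], [0, 1]])

def is_marginal(x):
    # Find p >= 0 with sum(p) = 1 and integer_points^T p = x.
    A_eq = np.vstack([integer_points.T, np.ones(len(integer_points))])
    b_eq = np.append(x, 1.0)
    res = linprog(np.zeros(len(integer_points)), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * len(integer_points))
    return res.status == 0  # status 0 means a feasible p was found

print(is_marginal(np.array([0.3, 0.5])))  # True:  inside the integer hull
print(is_marginal(np.array([0.7, 0.7])))  # False: in K, but not a marginal
```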
2.1 Feasible Probability
For now let’s forget about $K$ and consider $\{0,1\}^n$ and a discrete distribution $\mathcal{D}$ on $\{0,1\}^n$. We want to track, for every $I\subseteq[n]$, the value $y_I=\Pr_{\mathcal{D}}[\bigwedge_{i\in I}X_i=1]$. In fact, there is a one-to-one correspondence between $\mathcal{D}$ and $y$. If $\mathcal{D}$ is given, computing $y_I$ is easy for any $I\subseteq[n]$. If $y$ is given, recovering the distribution $\mathcal{D}$ is the same as solving a system of $2^n$ linear equations with $2^n$ variables ($2^n-1$ of the equations come from the entries $y_I$ with $I\neq\emptyset$, and the remaining one is $y_\emptyset=1$, i.e. the total probability is 1). Thus, with a slight abuse of notation, we will refer to $y$ as a distribution.
We work with the 2D example first. Let $y=(y_\emptyset,y_{\{1\}},y_{\{2\}},y_{\{1,2\}})$ be a marginal distribution. One can see that $y_\emptyset=1$, $y_{\{1\}}=x_1$ and $y_{\{2\}}=x_2$, and the last number is not arbitrary. In fact, $y_{\{1,2\}}$ must lie in the range $[\max(0,x_1+x_2-1),\min(x_1,x_2)]$. To make sure $y$ is indeed a probability distribution, the moment matrix is considered. The moment matrix $M(y)$ is of size $2^n\times 2^n$, with rows and columns indexed by subsets of $[n]$, and $M(y)_{I,J}$ is defined as the expectation $\mathbb{E}[\prod_{i\in I}X_i\prod_{j\in J}X_j]=y_{I\cup J}$ (the only non-zero term is the one where all the indicators equal 1). The expectation is taken over the distribution defined by $y$.
Lemma 1. For any probability distribution $y$, the moment matrix is psd.
Proof. We need to verify $z^\top M(y)z\ge 0$ for any $z\in\mathbb{R}^{2^{[n]}}$. Indeed,
$$z^\top M(y)z=\sum_{I,J}z_I z_J\,y_{I\cup J}=\mathbb{E}\Big[\Big(\sum_I z_I\prod_{i\in I}X_i\Big)^2\Big]\ge 0.$$
Note that a sum of squares appears in the proof (we used $X_i^2=X_i$). The Lasserre hierarchy has deep connections with SOS optimization and is also known as the sum-of-squares hierarchy.
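As a sanity check, the following sketch (my own code, $n=3$) samples a random distribution on $\{0,1\}^n$, builds the full moment vector $y$, and verifies numerically that $M(y)$ is psd, as Lemma 1 predicts.

```python
# Numerical check of Lemma 1: build y_I = Pr[ X_i = 1 for all i in I ]
# from a random distribution on {0,1}^n and verify that the moment matrix
# M(y)_{I,J} = y_{I u J} is psd.
import itertools
import numpy as np

n = 3
points = list(itertools.product([0, 1], repeat=n))
rng = np.random.default_rng(0)
p = rng.random(len(points))
p /= p.sum()                                   # a random distribution

subsets = [frozenset(s) for r in range(n + 1)
           for s in itertools.combinations(range(n), r)]
y = {I: sum(pi for pi, x in zip(p, points) if all(x[i] == 1 for i in I))
     for I in subsets}

M = np.array([[y[I | J] for J in subsets] for I in subsets])
print(np.linalg.eigvalsh(M).min() >= -1e-9)    # True: M(y) is psd
```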
Lemma 2. If $M(y)$ is psd then $y$ is a probability distribution.
It is easy to see that $0\le y_I\le 1$ for all $I$. Consider the following submatrix, indexed by $\emptyset$ and $I$:
$$\begin{pmatrix}1 & y_I\\ y_I & y_I\end{pmatrix}.$$
It is psd since $M(y)$ is psd. The determinant is $y_I(1-y_I)\ge 0$. Let $p_x$ be the probability of selecting $x$ in $\{0,1\}^n$. It remains to prove that the corresponding system of linear equations has a solution with $p_x\ge 0$ for all $x$. I believe this can be proven with the idea of Lemma 2 here.
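For the 2D case the linear system can be written down explicitly; the following sketch (my own helper, not from the post) recovers the four point probabilities from $y$ by inclusion-exclusion and checks non-negativity.

```python
# Recovering the distribution from y in the 2D example. By
# inclusion-exclusion (Moebius inversion), the point probabilities are
#   p(1,1) = y_{12},          p(1,0) = y_1 - y_{12},
#   p(0,1) = y_2 - y_{12},    p(0,0) = 1 - y_1 - y_2 + y_{12}.
# y is a valid distribution iff all four values are nonnegative.
def recover_distribution(y1, y2, y12):
    p = {(1, 1): y12,
         (1, 0): y1 - y12,
         (0, 1): y2 - y12,
         (0, 0): 1 - y1 - y2 + y12}
    assert all(v >= -1e-12 for v in p.values()), "y is not a distribution"
    return p

print(recover_distribution(0.5, 0.5, 0.25))  # two independent fair coins
```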
2.2 Projection in $K$
Let the projection of $y$ be $x=(y_{\{1\}},\dots,y_{\{n\}})$. For any $y$ the projection should always lie in $K$. One may want to define moment matrices for the constraints $a^\top x\ge b$ of $K$. This is called the moment matrix of slacks. For simplicity we only consider one linear constraint $a^\top x\ge b$. The moment matrix for this constraint is
$$M^{a^\top x\ge b}(y)_{I,J}=\sum_{i=1}^n a_i\,y_{I\cup J\cup\{i\}}-b\,y_{I\cup J}=\mathbb{E}\Big[(a^\top X-b)\prod_{i\in I\cup J}X_i\Big].$$
Then we can make similar arguments. Note that we can combine the expectations since they are taken over the same probability distribution. Now assume that we have such a $y$. If $a^\top X\ge b$ is satisfied by every point in the support of the distribution, then the corresponding slack moment matrix is psd.
Finally, here is a more formal definition.
Definition 3 ($t$-th level of the Lasserre hierarchy). The $t$-th level of the Lasserre hierarchy $\text{Las}_t(K)$ of a convex polytope $K=\{x\in[0,1]^n : Ax\ge b\}$ is the set of vectors $y$ with $y_\emptyset=1$ that make the following matrices psd:
moment matrix: $M_t(y)=\big(y_{I\cup J}\big)_{|I|,|J|\le t}$;
moment matrix of slacks: $M_t^{\ell}(y)=\big(\sum_{i=1}^n A_{\ell i}\,y_{I\cup J\cup\{i\}}-b_\ell\,y_{I\cup J}\big)_{|I|,|J|\le t}$ for each constraint $\ell\in[m]$.
Note that the $t$-th level of the Lasserre hierarchy only involves entries $y_I$ with $|I|\le 2t+1$ (the $+1$ comes from the moment matrix of slacks). The matrices have dimension $O(n^t)$ and there are only $m+1$ matrices. Thus optimizing some objective over the $t$-th level of the Lasserre hierarchy takes $n^{O(t)}$ time, which is still polynomial in the input size for constant $t$. (The separation oracle computes eigenvalues and eigenvectors: if there is a negative eigenvalue, we find the corresponding eigenvector $z$, and the separating hyperplane is $z^\top Mz\ge 0$, viewed as a linear inequality in the entries of $y$. See Example 43.)
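Here is a sketch of that separation step (the function name is mine): compute an eigendecomposition, and if the smallest eigenvalue is negative, return its eigenvector $z$; the constraint $z^\top M(y')z\ge 0$ is linear in $y'$ and cuts off the current $y$.

```python
# Psd separation oracle sketch: given a symmetric matrix M(y), either
# confirm psdness or return an eigenvector z with z^T M(y) z < 0; the
# inequality z^T M(y') z >= 0 is then a separating hyperplane, linear
# in the entries of y'.
import numpy as np

def psd_separation(M, tol=1e-9):
    vals, vecs = np.linalg.eigh(M)   # eigenvalues in ascending order
    if vals[0] >= -tol:
        return None                  # M is psd; no cut needed
    return vecs[:, 0]                # cut: sum_{I,J} z_I z_J y'_{I u J} >= 0

M = np.array([[1.0, 0.9], [0.9, 0.5]])  # moment matrix of an invalid y
print(psd_separation(M))                # a separating direction z
```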
3 Properties
Almost everything in this part can be found here.
Suppose that we have the $t$-th level of the Lasserre hierarchy $\text{Las}_t(K)$. Denote by $\text{Las}_t^{\text{proj}}(K)$ the projection of the $t$-th level to the coordinates $(y_{\{1\}},\dots,y_{\{n\}})$.
1. $\text{Las}_t(K)$ is convex.
2. $0\le y_I\le 1$ for all $I$.
3. $y_J\le y_I$ for all $I\subseteq J$.
4. $\text{Las}_{t+1}(K)\subseteq\text{Las}_t(K)$ for all $t$.
5. $K\cap\{0,1\}^n\subseteq\text{Las}_t^{\text{proj}}(K)$ for all $t$.
6. $\text{Las}_t^{\text{proj}}(K)\subseteq K$ for all $t$.
Properties 1, 2, 3 show that $y$ behaves similarly to a real probability distribution. Properties 4, 5, 6 show that $K_I\subseteq\text{Las}_t^{\text{proj}}(K)\subseteq K$. The goal of this section is to show that $\text{Las}_n^{\text{proj}}(K)=K_I$. When working on the Lasserre hierarchy, instead of considering the projection $\text{Las}_t^{\text{proj}}(K)$ alone, we usually perform the analysis on $\text{Las}_t(K)$.
3.1 Convex Hull and Conditional Probability
Lemma 4. For $y\in\text{Las}_t(K)$, let $S$ be any subset of variables of size at most $t$. Then $y$ is a convex combination of vectors $z\in\text{Las}_{t-|S|}(K)$ with $z_{\{i\}}\in\{0,1\}$ for all $i\in S$.
For $t\ge n$ and $S=[n]$, the previous lemma implies the projection of $y$ is a convex combination of integral vectors in $K$. Then it follows that $\text{Las}_n^{\text{proj}}(K)=K_I$. This also provides proofs for the facts that if the moment matrix and all slack moment matrices are psd, then $y$ is indeed a probability distribution and the projection is in $K_I$.
Proof
The proof is constructive and is by induction on the size of $S$. Assume that $S=\{i\}$ and $0<y_{\{i\}}<1$ (otherwise there is nothing to split). For simplicity I use $y_i$ for $y_{\{i\}}$. Define two vectors $y^{(0)},y^{(1)}$ as
$$y^{(1)}_I=\frac{y_{I\cup\{i\}}}{y_i}\qquad\text{and}\qquad y^{(0)}_I=\frac{y_I-y_{I\cup\{i\}}}{1-y_i}.$$
One can easily verify that $y=y_i\,y^{(1)}+(1-y_i)\,y^{(0)}$, $y^{(1)}_{\{i\}}=1$ and $y^{(0)}_{\{i\}}=0$. It remains to verify $y^{(0)},y^{(1)}\in\text{Las}_{t-1}(K)$. Since $M_t(y)$ is psd, there must be vectors $(v_I)_{|I|\le t}$ such that $y_{I\cup J}=\langle v_I,v_J\rangle$ for all $I,J$. Take $w_I=v_{I\cup\{i\}}/\sqrt{y_i}$. We have $\langle w_I,w_J\rangle=y_{I\cup J\cup\{i\}}/y_i=y^{(1)}_{I\cup J}$ for all $|I|,|J|\le t-1$. Thus $M_{t-1}(y^{(1)})$ is psd. Similarly, one can take $w_I=(v_I-v_{I\cup\{i\}})/\sqrt{1-y_i}$ and show that $M_{t-1}(y^{(0)})$ is psd. For each moment matrix of slacks one can use exactly the same arguments to show $M^{\ell}_{t-1}(y^{(1)})\succeq 0$ and $M^{\ell}_{t-1}(y^{(0)})\succeq 0$. For the inductive step one can see that our arguments for the base case can be applied recursively on $y^{(0)}$ and $y^{(1)}$.
$y^{(1)}$ is a probability distribution if we condition on $X_i=1$, and $y^{(0)}$ if we condition on $X_i=0$. The vectors $y^{(0)},y^{(1)}$ we constructed in the previous proof can be understood as conditional probabilities.
The proof is basically showing the following: for any partially feasible probability distribution $y$, $0<y_i<1$ implies that both $X_i=1$ and $X_i=0$ happen with non-zero probability, which in turn implies that we can condition on either event. One can also explicitly express $y$ as a convex combination and see the relation with Möbius inversion; see p. 9 in these notes.
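The conditioning step itself is easy to implement on a full moment vector; the following sketch (my own toy code) splits $y$ into $y^{(0)}=(y\mid X_i=0)$ and $y^{(1)}=(y\mid X_i=1)$ exactly as in the proof.

```python
# Conditioning step from the proof of Lemma 4: given the full moment
# vector y (indexed by frozensets), split it into y1 = (y | X_i = 1)
# and y0 = (y | X_i = 0). Assumes 0 < y_{i} < 1.
def condition(y, i):
    yi = y[frozenset([i])]
    y1 = {I: y[I | {i}] / yi for I in y}
    y0 = {I: (y[I] - y[I | {i}]) / (1 - yi) for I in y}
    return y0, y1

# Example: two independent fair coins; conditioning on X_0 = 1 sets y_{0} to 1.
y = {frozenset(): 1.0, frozenset([0]): 0.5, frozenset([1]): 0.5,
     frozenset([0, 1]): 0.25}
y0, y1 = condition(y, 0)
print(y1[frozenset([0])], y1[frozenset([1])])  # 1.0 0.5
```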
In Lemma 4, each vector in the convex combination (those with integer values on $S$, such as $y^{(1)}$) can be understood as a partial probability distribution under the condition $X_i=b$ for $i\in S$, and the probability assigned to it is exactly the chance that its condition happens. More formally, Lemma 4 implies the following:
Corollary 5. Let $y\in\text{Las}_t(K)$. For any subset $S$ of size at most $t$, there is a distribution $\mathcal{D}$ over $\{0,1\}^S$ such that the marginals of $\mathcal{D}$ agree with $y$ on subsets of $S$. Moreover, this distribution is “locally consistent”, since the probability assigned to each vector only depends on its condition.
Since the constraints in $\text{Las}_t(K)$ only concern the psdness of certain matrices, one may naturally think about their Gram decompositions. This leads to a vector representation $v_I$ of $y_I$ for all $|I|\le t$ and may be helpful in rounding algorithms. For any $I$, $v_I$ lies on the sphere of radius $\frac12$ and center $\frac12 v_\emptyset$, since $\|v_I-\tfrac12 v_\emptyset\|^2=y_I-y_I+\tfrac14\|v_\emptyset\|^2=\tfrac14$.
3.2 Decomposition Theorem
We have seen that $\text{Las}_n^{\text{proj}}(K)$ is the integer hull. Can we get better upper bounds on the required level based on properties of $K$? Another easy upper bound is $t\ge k$, where $k=\max\{\mathbf{1}^\top x : x\in K\cap\{0,1\}^n\}$. This is because $y$ is a partial distribution that can be realized as the marginal distribution of some distribution on $K\cap\{0,1\}^n$; if $K$ does not contain a point with at least $k+1$ ones, we certainly have $y_I=0$ for $|I|>k$. This fact implies that for most hard problems we should not expect $\text{Las}_t(K)$ to give us an integral solution for constant $t$.
Karlin, Mathieu and Nguyen [3] proved a more general form of Lemma 4 using similar arguments.
Theorem 6 (Decomposition Theorem). Let $y\in\text{Las}_t(K)$, and let … be such that … for all …. Then … (see [3] for the precise statement).
4 Moment Relaxation
In this section we briefly show the non-probabilistic view of the Lasserre hierarchy and how this idea is used in polynomial optimization problems.
Everything in this section can be found in useful_link[6].
Consider the following polynomial optimization problem, where $f,g_1,\dots,g_m$ are polynomials:
$$\min f(x)\quad\text{s.t.}\quad g_\ell(x)\ge 0\;\;\forall\ell\in[m],\qquad x\in\{0,1\}^n.$$
We want to formulate this problem with an SDP. We can consider all polynomials as multilinear polynomials: since $x_i\in\{0,1\}$, we have $x_i^k=x_i$ for every $k\ge 1$. Now we can consider enumerating the monomials (i.e., the subsets of $[n]$) and writing these polynomials as linear functions. For example, we can rewrite $f(x)=\sum_{I\subseteq[n]}f_I\prod_{i\in I}x_i$ as $\sum_{I\subseteq[n]}f_I\,y_I$, which is linear in the moment sequence $y$.
Recall that our goal is to find an SDP formulation. A common technique is to replace each variable with a vector. We consider the moment vectors $(v_I)_{I\subseteq[n]}$. Similar to the LP case, we want $\langle v_I,v_J\rangle=y_{I\cup J}$. This is exactly the Gram decomposition of the moment matrix. There exist such moment vectors iff the moment matrix is psd. For each constraint $g_\ell(x)\ge 0$, we consider the slack moment matrix
$$M^{g_\ell}(y)_{I,J}=\sum_{H\subseteq[n]}(g_\ell)_H\,y_{I\cup J\cup H}.$$
Then the program becomes the following SDP:
$$\min\sum_{I}f_I\,y_I\quad\text{s.t.}\quad y_\emptyset=1,\;\;M(y)\succeq 0,\;\;M^{g_\ell}(y)\succeq 0\;\;\forall\ell\in[m].$$
Note that if the max degree of the polynomials $f,g_1,\dots,g_m$ is at most $2t$, then the following truncated program is a relaxation of the original polynomial optimization problem (cf. Corollary 3.2.2.):
$$\min\sum_{|I|\le 2t}f_I\,y_I\quad\text{s.t.}\quad y_\emptyset=1,\;\;M_t(y)\succeq 0,\;\;M_t^{g_\ell}(y)\succeq 0\;\;\forall\ell\in[m],$$
where the rows and columns are indexed by subsets of size at most $t$, $\cup$ is the element-wise union, and $M_t(y)$ is the submatrix of $M(y)$ on entries $y_{I\cup J}$ with $|I|,|J|\le t$. Taking $t=n$ gives us the exact program above.
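As an illustration, here is a sketch of such a moment SDP on a tiny instance (my own example, using the cvxpy library with its bundled SDP solver): maximize $x_1+x_2$ subject to $x_1+x_2\le\frac32$ over $\{0,1\}^2$. The psd constraints push the optimum down to the integral value $1$ rather than the LP value $\frac32$.

```python
# Moment SDP sketch for: max x1 + x2 s.t. x1 + x2 <= 3/2, x in {0,1}^2.
# Moment variables y_I over I in {{}, {1}, {2}, {1,2}}; the moment matrix
# and the slack moment matrix of 3/2 - x1 - x2 >= 0 must be psd.
# Requires cvxpy with an SDP-capable solver (e.g. the bundled SCS).
import cvxpy as cp

y1, y2, y12 = cp.Variable(), cp.Variable(), cp.Variable()
y = {(): 1.0, (1,): y1, (2,): y2, (1, 2): y12}
subsets = [(), (1,), (2,), (1, 2)]
union = lambda I, J: tuple(sorted(set(I) | set(J)))

M = cp.bmat([[y[union(I, J)] for J in subsets] for I in subsets])
S = cp.bmat([[1.5 * y[union(I, J)]
              - y[union(union(I, J), (1,))]
              - y[union(union(I, J), (2,))] for J in subsets] for I in subsets])

prob = cp.Problem(cp.Maximize(y1 + y2), [M >> 0, S >> 0])
prob.solve()
print(round(prob.value, 3))  # ~1.0: the integral optimum, not the LP's 1.5
```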
5 Applications
5.1 Sparsest Cut
There are lots of applications in the useful links, but none of them
discusses sparsest cut [4].
Problem 7 (sparsest cut). Given a vertex set $V$ and two weight functions $c,d:\binom{V}{2}\to\mathbb{R}_{\ge 0}$, find $S\subseteq V$ minimizing the sparsity of $S$,
$$\frac{\sum_{u<v}c(u,v)\,|x_u-x_v|}{\sum_{u<v}d(u,v)\,|x_u-x_v|},$$
where $x\in\{0,1\}^V$ is the indicator vector of $S$.
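For intuition, here is a brute-force sketch (my own code, assuming the standard non-uniform sparsity: cut cost under $c$ divided by cut demand under $d$).

```python
# Brute-force sparsest cut on a tiny instance.
import itertools

def sparsity(S, V, c, d):
    cut = [(u, v) for u in S for v in V - S]
    num = sum(c.get((min(u, v), max(u, v)), 0) for u, v in cut)
    den = sum(d.get((min(u, v), max(u, v)), 0) for u, v in cut)
    return num / den if den > 0 else float("inf")

def sparsest_cut(V, c, d):
    best = min((frozenset(S) for r in range(1, len(V))
                for S in itertools.combinations(V, r)),
               key=lambda S: sparsity(set(S), V, c, d))
    return best, sparsity(set(best), V, c, d)

V = {0, 1, 2, 3}
c = {(0, 1): 1, (1, 2): 1, (2, 3): 1, (0, 3): 1}   # 4-cycle edge costs
d = {(u, v): 1 for u in V for v in V if u < v}     # all-pairs unit demands
print(sparsest_cut(V, c, d))                       # a balanced cut: 2/4 = 0.5
```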
In [4] Guruswami and Sinop describe the Lasserre hierarchy in a slightly different way. (Note that useful_link[6] is Sinop’s thesis.) We have seen that $y$ is sufficient for describing the joint distribution. However, the total number of events is $3^n$, since for each variable in an event there are 3 possible states: $X_i=1$, $X_i=0$, and $X_i$ is absent. Instead of using $y$, they enumerate the events and consider the vectors in the Gram decomposition. For each set $S$ of size at most $t$, and for each 0-1 labeling $f$ on the elements of $S$, they define a vector $v_S(f)$. Note that $(S,f)$ enumerates all events, and one should understand $v_S(f)$ as the vector corresponding to the event $X_S=f$ in the Gram decomposition, with $\langle v_S(f),v_T(g)\rangle=\Pr[X_S=f\wedge X_T=g]$.
Then the vectors $v_S(f)$ should have the following properties (a concrete realization is sketched after this list):
1. If $(S,f)$ and $(T,g)$ are inconsistent, i.e. there is an element $i\in S\cap T$ with $f(i)\neq g(i)$, then one should have $\langle v_S(f),v_T(g)\rangle=0$.
2. If $(S,f)$ and $(T,g)$ are the same event, i.e. $S=T$ and the labels are the same, then $v_S(f)=v_T(g)$; here $v_\emptyset$ corresponds to the union of all events and has $\|v_\emptyset\|^2=1$.
3. For all events $(S,f)$ and $(T,g)$, $\langle v_S(f),v_T(g)\rangle\ge 0$.
4. For $i\notin S$ and any labeling $f$ of $S$, $v_{S\cup\{i\}}(f\cup\{i\mapsto 0\})+v_{S\cup\{i\}}(f\cup\{i\mapsto 1\})=v_S(f)$. (Note that the two lhs vectors are orthogonal.)
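One way to convince yourself these properties are consistent is to realize the vectors explicitly from an actual distribution $p$ on $\{0,1\}^n$, as in the sketch below (my own construction): set $v_S(f)[x]=\sqrt{p(x)}\cdot\mathbb{1}[x\text{ agrees with }f\text{ on }S]$, so inner products become event probabilities.

```python
# Explicit realization of the event vectors from a distribution p on {0,1}^n:
# v_S(f)[x] = sqrt(p(x)) if x agrees with f on S, else 0. Inner products are
# then joint event probabilities, so the listed properties hold.
import itertools
import numpy as np

n = 3
points = list(itertools.product([0, 1], repeat=n))
rng = np.random.default_rng(1)
p = rng.random(len(points)); p /= p.sum()

def v(S, f):  # S: tuple of indices, f: tuple of labels on S
    return np.array([np.sqrt(px) * all(x[i] == b for i, b in zip(S, f))
                     for px, x in zip(p, points)])

# Inconsistent events are orthogonal; consistent ones give the joint event.
print(np.isclose(v((0,), (1,)) @ v((0, 1), (0, 1)), 0.0))        # True
print(np.isclose(v((0,), (1,)) @ v((1,), (1,)),
                 v((0, 1), (1, 1)) @ v((0, 1), (1, 1))))          # True
```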
Lemma 8 (pseudo probability). Let $\mu_S(f)=\|v_S(f)\|^2$ for every event $(S,f)$. Then the following holds:
1. $\mu_S(f)\in[0,1]$ for all events $(S,f)$.
2. $\mu_S(f)\le\mu_T(g)$ if $T\subseteq S$ and $f$ agrees with $g$ on $T$, for all such pairs of events.
3. $\mu_S(f)=\mu_{S\cup\{i\}}(f\cup\{i\mapsto 0\})+\mu_{S\cup\{i\}}(f\cup\{i\mapsto 1\})$ if $i\notin S$.
4. If …, and …, then ….
Proof
Let $N$ be the number of vectors in $\{v_S(f)\}$. Consider the moment matrix $M\in\mathbb{R}^{N\times N}$, where each entry $M_{(S,f),(T,g)}$ is $\langle v_S(f),v_T(g)\rangle$. The moment matrix is positive semidefinite since the vectors in $\{v_S(f)\}$ form a Gram decomposition of $M$. Consider the following submatrix of $M$, indexed by $(\emptyset,\emptyset)$ and $(S,f)$:
$$\begin{pmatrix}1 & \mu_S(f)\\ \mu_S(f) & \mu_S(f)\end{pmatrix}.$$
Computing the determinant gives us $\mu_S(f)(1-\mu_S(f))\ge 0$, which proves 1. Again, considering a certain submatrix of $M$ and computing its determinant proves 2. For 3, we only need to show the single-element case and the rest follows by induction. Note that $v_S(f)=v_{S\cup\{i\}}(f\cup\{i\mapsto 0\})+v_{S\cup\{i\}}(f\cup\{i\mapsto 1\})$, since we have property 4 of the vectors and the two summands are orthogonal; taking squared norms proves 3. For 4, notice that … and … are orthogonal. Denote by … the projection of … on the hyperplane spanned by … and …; one can verify that …, and it remains to show …, which immediately follows from property 3 of the vectors. Then write $|x_u-x_v|=\|v_{\{u\}}(1)-v_{\{v\}}(1)\|^2$.
The following “SDP” is a relaxation of sparsest cut.
Scaling every vector by the square root of the objective’s denominator gives us a real SDP.
The rounding method is too complicated, so it won’t be covered here.
5.2 Matching
This application can be found in section 3.3 of useful_link[1].
We consider the maximum matching IP in non-bipartite graphs. Let $P=\{x\ge 0 : \sum_{e\in\delta(v)}x_e\le 1\ \forall v\in V\}$ be the polytope of the degree constraints and consider $\text{Las}_t(P)$. In the notes Rothvoss shows the following lemma.
Lemma 9. …
Proof
Let $y\in\text{Las}_t(P)$ and let $x$ be its projection. It suffices to show that $x(E[U])\le\frac{|U|-1}{2}$ for all odd …, since the degree constraints together with the odd set constraints describe the matching polytope. When …, the degree constraints imply that …. Now consider the case …. Note that for fixed …, any … of size … has …, since it is impossible to find a matching in $E[U]$ covering more than $|U|-1$ vertices. Then by Lemma 4, … can be represented as a convex combination of solutions in which … for all …. The convex combination implies that … when ….
However, one can see that Lemma 9 is not tight: $\text{Las}_{t+1}^{\text{proj}}(P)$ should be contained in $\text{Las}_t^{\text{proj}}(P)$, and $\text{Las}_n^{\text{proj}}(P)$ should be exactly the integer hull. Can we prove a slightly better gap that matches these observations at both ends? The later part of the proof in fact shows that $x$ satisfies all odd set constraints with $|U|\le$ …. Consider an odd cycle with … vertices: … is a feasible solution in … and proves a tight lower bound of ….
6 Questions
6.1 Replace … with …
I don’t see any proof relying on the psdness of the slack moment matrices…
It turns out that problems occur in the proof of Lemma 4: if $y^{(1)}$ is defined as …, then we cannot guarantee …. Without Lemma 4, … may not be exactly … and the hierarchy seems less interesting? But an alternative formulation (see sparsest cut, which entirely ignores the slack moment matrices) still allows good rounding even without Lemma 4. Generally speaking, if the psdness of the slack moment matrices is neglected, then we won’t have the law of total probability (Lemma 4); however, we still have the “finite additivity property of probability measures” (Lemma 8 (3)).
6.2 Separation Oracle for Implicit $K$
Sometimes $K$ is given in a compact form. For example, consider finding the matroid cogirth. If $K$ is only accessible through a separation oracle, is it possible to optimize over $\text{Las}_t(K)$ in polynomial time for constant $t$?
References
[1] M. Laurent, A Comparison of the Sherali-Adams, Lovász-Schrijver, and Lasserre Relaxations for 0–1 Programming, Mathematics of Operations Research 28 (2003) 470–496. doi:10.1287/moor.28.3.470.16391.
[2] A. Schrijver, Polyhedral proof methods in combinatorial optimization, Discrete Applied Mathematics 14 (1986) 111–133. doi:10.1016/0166-218X(86)90056-9.
[3] A.R. Karlin, C. Mathieu, C.T. Nguyen, Integrality Gaps of Linear and Semi-Definite Programming Relaxations for Knapsack, in: Integer Programming and Combinatorial Optimization, Springer, Berlin, Heidelberg, 2011, pp. 301–314. doi:10.1007/978-3-642-20807-2_24.
[4] V. Guruswami, A.K. Sinop, Approximating non-uniform sparsest cut via generalized spectra, in: Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, Society for Industrial and Applied Mathematics, USA, 2013, pp. 295–305.