By adamant, history, 7 months ago,

Hi everyone!

Previously I wrote about theoretical grounds of "aliens trick". Specifically, I introduced the concept of the Lagrangian duality and explained its connection with the trick. Today I want to elaborate a bit more on the concept of dual problems and their applications to linear programming as well as to common problems in competitive programming.

I initially wanted to also write about some applications in competitive programming problems, but given that the introduction is already quite lengthy, I decided to leave it for another blog, while using most common and well-known theoretical examples here, focusing more on how to construct and interpret dual problems to begin with, rather than how to use it in contests.

I think, it is a crucial first step towards using the duality in actual programming challenges.

#### Prerequisites

It is highly recommended to have some general understanding of basic mathematical optimization concepts (e.g. convexity, Lagrange multipliers), as well as basic understanding of flow algorithms and most well-known results around them, such as cycle-path decomposition of flows, maximum flow / minimum cut duality, minimum cost flows (ideally, the algorithm that computes it using Dijkstra with potentials). The article should hopefully shed some further light on these topics.

#### General Lagrangian duality

I wrote a long and elaborate story about general Lagrangian duality. It should provide you with additional insights and interpretation on what's going on if you, like me, like to start learning new concepts from abstract nonsense rather than specific examples. However according to some feedback such approach would rather scare anyone else, so I put the scary stuff under the spoiler tag.

Among all the stuff above, I'll just briefly mention the most important things. For an optimization problem

$\begin{gather} \min\limits_{x \in X} & f(x) \\ s.t. & g(x) \geq 0, \end{gather}$

where $f: X \mapsto \mathbb R$ is the objective function and $g: X \mapsto \mathbb R^c$ is the constraint function, the dual problem is

$\begin{gather} \max\limits_{\lambda \in \mathbb R^c} & t(\lambda) \\ s.t. & \lambda \geq 0, \end{gather}$

where $t(\lambda) = \min\limits_{x \in X} [f(x)-\lambda^\top g(x)]$ is the Lagrangian dual function and $\lambda^\top g(x)$ denotes the dot product of $\lambda$ and $g(x)$.

Correspondingly, for the maximization problem

$\begin{gather} \max\limits_{x \in X} & f(x) \\ s.t. & g(x) \geq 0, \end{gather}$

its dual is

$\begin{gather} \min\limits_{\lambda \in \mathbb R^c} & t(\lambda) \\ s.t. & \lambda \geq 0, \end{gather}$

where $t(\lambda) = \max\limits_{x \in X} [f(x)+\lambda^\top g(x)]$. Note that the sign has changed.

Let $x^*$ be the solution for the primal problem and $\lambda^*$ be the solution for the dual problem. Then for the minimization problem it holds that $t(\lambda^*) \leq f(x^*)$ and for the maximization problem it holds that $t(\lambda^*) \geq f(x^*)$.

This connection between the solution to the primal and dual problem is called the weak duality. In linear programming specifically, as we will see, in a significant number of cases the strong duality $t(\lambda^*) = f(x^*)$ holds.

#### In linear programming

Def. 12. Let $c \in \mathbb R^d$, $b \in \mathbb R^c$ and $A \in \mathbb R^{c \times d}$. The primal linear programming problem is defined in canonical form as

$\begin{gather} \max\limits_{x \in \mathbb R^d} & c^\top x,\\ s.t. & Ax \leq b,\\ & x \geq 0. \end{gather}$

In other words, the objective function and the constraint functions in a linear programming problem are all linear functionals.

If we follow the approach above, we'd need to introduce the Lagrange multipliers $\lambda_1 \in \mathbb R^c$ and $\lambda_2 \in \mathbb R^d$. Then the Lagrangian is

$L(x,\lambda_1, \lambda_2) = c^\top x + \lambda_1^\top (b-Ax) + \lambda_2^\top x$

and the Lagrangian dual is

$t(\lambda_1, \lambda_2) = \max\limits_{x \in \mathbb R^d} [c^\top x + \lambda_1^\top (b-Ax) + \lambda_2^\top x] = \max\limits_{x \in \mathbb R^d}[\lambda_1^\top b + (c-A^\top\lambda_1 +\lambda_2)^\top x].$

When $A^\top\lambda_1 - c \neq \lambda_2$, this value can be arbitrarily large. Otherwise $t(\lambda_1, \lambda_2) = \lambda_1^\top b$. To make a problem a bit more regular and avoid dealing with $+\infty$, we place the $A^\top\lambda_1 - c = \lambda_2$ equality into constraints section.

After that $t(\lambda_1,\lambda_2)$ does not depend on $\lambda_2$ and we can completely remove it from the dual problem formulation replacing $A^\top\lambda_1 - c = \lambda_2$ equality with $A^\top\lambda_1 - c \geq 0$ inequality. Then, we can formulate the dual problem.

Def. 13. The dual linear programming problem is defined as

$\begin{gather} \min\limits_{\lambda \in \mathbb R^c} & b^\top \lambda,\\ s.t. & A^\top\lambda \geq c,\\ & \lambda \geq 0. \end{gather}$

In a similar way dual problems can be constructed for minimization problems and with different kinds of constraints. There is an alternative way to interpret the duality for LP problems specifically. If we multiply both parts of $A x \leq b$ with $\lambda^\top$, we get

$\lambda^\top A x \leq \lambda^\top b.$

At the same time $\lambda^\top A = (A^\top \lambda)^\top$, hence for $A^\top \lambda \geq c$ it holds that

$c^\top x \leq \lambda^\top A x \leq \lambda^\top b.$

This immediately shows that $\lambda^\top b$ provides an upper bound for $c^\top x$ and by minimizing $\lambda^\top b$, we make the bound as tight as possible.

##### Mnemonics to construct dual problems

On practice, it is not always possible to easily write LP problem in the needed form.

To mitigate this, one should keep in mind that there is generally a correspondence between:

1. Primal variables and dual constraints;
2. Primal constraints and dual variables.

This correspondence can be summed up in a table:

Maximization Minimization
Inequality constraint $\leq$ Nonnegative variable $\geq$
Inequality constraint $\geq$ Nonpositive variable $\leq$
Equality constraint $=$ Free variable
Nonnegative variable $\geq$ Inequality constraint $\geq$
Nonpositive variable $\leq$ Inequality constraint $\leq$
Free variable Equality constraint $=$

With the standard definition of LP duality it looks like

$\begin{gather} \max\limits_x & c^\top x & \color{red}{\text{(maximization)}} & \overset{\text{dual}}{\iff} & \min\limits_\lambda & b^\top \lambda & \color{red}{\text{(minimization)}} \\ s.t. & Ax \leq b & \color{red}{\text{(constraint }\leq\text{)}} & & s. t. & \lambda \geq 0 & \color{red}{\text{(variable }\geq\text{)}} \\ & x \geq 0 & \color{red}{\text{(variable }\geq\text{)}} & & & A^\top \lambda \geq c & \color{red}{\text{(constraint} \geq \text{)}} \end{gather}$

Here's an example that highlights all possible variations (source):

$\begin{gather} \max\limits_{x,y,z} & c^\top x + d^\top y + f^\top z & \overset{\text{dual}}{\iff} & \min\limits_{\lambda,\eta,\mu} & p^\top \lambda + q^\top \eta + r^\top \mu \\ s.t. & Ax+By+Cz \leq p & & s.t. & \lambda \geq 0 \\ & Dx + Ey + Fz \geq q & & & \eta \leq 0 \\ & Gx + Hy + Jz = r & & & \mu\text{ free} \\ & x \geq 0 & & & A^\top \lambda + D^\top \eta + G^\top \mu \geq c \\ & y \leq 0 & & & B^\top \lambda + E^\top \eta + H^\top \mu \leq d \\ & z\text{ free} & & & C^\top \lambda + F^\top \eta + J^\top \mu = f \end{gather}$

Finally, if the primal problem has explicit constraint on the variable (e.g. $x=0$), it doesn't produce a constraint in the dual problem.

##### Weirdly indexed variables and constraints

It often happens so that e.g. variables are indexed by two numbers (e.g. row and column) rather than just one, which might make the construction of the dual somewhat puzzling. What is the actual meaning behind $Ax \leq b$ and $A^\top \lambda \geq c$?

The $Ax \leq b$ inequality says that there are $n$ variables $x_1, \dots, x_n$ and $m$ constraints defined by rows of $A$ and $b_1, \dots, b_m$. In such terms, $i$-th constraints is a linear combination of variables which should be less or equal than $b_i$. Let the coefficient near $j$-th variable in $i$-th constraint be $a_{ij}$.

Now, $A^\top \lambda \geq c$ inequality says that there are $m$ dual variable $\lambda_1, \dots, \lambda_m$ and $n$ constraints defined by columns of $A$ and $c_1, \dots, c_n$. Specifically, $j$-th constraint collects $a_{ij}$ and places it near $\lambda_i$. In other words, the $j$-th constraint in the dual problem collects together all the coefficients that were near $j$-th variable in each of primal constraints.

Let's practice it on some concrete examples.

#### Shortest paths as linear programming

As a simpler example, we start off with the shortest path problem. Given a weighted directed graph, one needs to find the length of the shortest path from $s$ to all the other vertices. We assume that there are $n$ vertices and the weight of the edge from $i$ to $j$ is denoted by $c_{ij}$. If there is no such edge, we set $c_{ij} = +\infty$.

Let $d_i$ symbolize the length of the shortest path from $s$ to $i$. Then the primal problem could be formulated as:

$\boxed{\begin{gather} \max\limits_{d_1,\dots,d_n \in \mathbb R} & d_1 + \dots + d_n,\\ s.t. & d_j \leq d_i + c_{ij}, & (\forall i, j) \\ & d_s = 0. \end{gather}}$

On one hand, each of $d_i$ can't be greater than the length of the shortest path from $s$ to $i$. Indeed, let $s = i_0 \to i_1 \to i_2 \to \dots \to i_k = i$ be the shortest path. Then the inequalities imply that $d_s \geq d_i - c_{i_0 i_1} - \dots - c_{i_{k-1} i_k}$. If $d_i$ is greater than the sum which is subtracted from it, it would mean that $d_s > 0$ and would violate $d_s = 0$ constraint.

On the other hand, setting all $d_i$ to the actual lengths of the shortest path would satisfy all constraints, hence deliver the maximum sum.

##### Constructing dual

For the dual problem we need to introduce dual variables $x_{ij}$ corresponding to the $d_j - d_i \leq c_{ij}$ constraint.

In this way, the target function would be

$\lambda^\top c = \sum\limits_{i,j} x_{ij} c_{ij}.$

The coefficient near $x_{ij}$ in the $k$-th constraint would be equal to the coefficient near $d_k$ in the $d_j - d_i \leq c_{ij}$ constraint in the primal problem. That being said, we take the coefficient $1$ near $x_{ik}$ for every $d_k - d_i \leq c_{ik}$ constraint and we take the coefficient $-1$ near $x_{kj}$ for every $d_j - d_k \leq c_{kj}$ constraint, and the coefficient $0$ from all other constraints. This results into the following dual problem:

$\boxed{\begin{gather} \min\limits_{x_{ij} \in \mathbb R} & \sum\limits_{i,j} x_{ij} c_{ij}, & \\ s.t. & \sum\limits_{i=1}^n x_{ik} - \sum\limits_{j=1}^n x_{kj} = 1 & (\forall k \neq s) \\ & x_{ij} \geq 0. &(\forall i,j) \end{gather}}$

Note that, because of $d_s=0$ constraint in the primal problem, we only add constraints for $k \neq s$ in the dual problem. At the same time, because all the other primal variables were free, all the constraints in the dual problem are equality-like.

To interpret this problem, we could say that $x_{ij}$ denotes the number of times the edge $i \to j$ was picked into a network formed by the combination of all shortest paths from $s$ to all vertices in the graph. The $= 1$ constraints mean that every vertex (except for $s$) must have one more incoming edge than outgoing ones.

In terms of flows it would mean that we used $s$ as a source vertex and every other vertex is connected to the sink with an edge that has zero cost and capacity $1$ while original graph edges have a cost of $c_{ij}$ for traversing them, but infinite capacities. Then, $x_{ij}$ is the minimum cost flow in such network.

#### Maximum flow as linear programming

Let's recall the definition of a flow. Let $(V, E)$ be a network with $s, t \in V$ being the source and the sink. Each edge $i \to j$ has a capacity $q_{ij} \geq 0$. Then the flow is a function on the edges $f_{ij}$ such that:

1. The flow on the edge is non-negative and doesn't exceed its capacity (capacity constraint): $0 \leq f_{ij} \leq q_{ij}$;
2. The sum of incoming flow is equal to the sum of outgoing flow for any vertex except $s$ and $t$ (flow conservation):
$\sum\limits_{i=1}^n f_{ik} = \sum\limits_{i=1}^n f_{ki}.$

In this terms, we can define a net flow $b_k$ on the vertex $k$ as the difference between outgoing and incoming flows in this vertex. The definition above essentially says that the net flow of all vertices except $s$ and $t$ should be $0$. Let $f$ be the net flow in the vertex $s$, from the flow conservation it follows that the net flow of $t$ is $-f$. In this terms the maximum flow problem is

$\boxed{\begin{gather} \max\limits_{f_{ij}, b_i \in \mathbb R} & b_s,&\\ s.t.& 0 \leq f_{ij} \leq q_{ij}, & (\forall i,j) \\ & \sum\limits_{j=1}^n f_{kj} - \sum\limits_{i=1}^n f_{ik} = b_k,&(\forall k) \\ & b_k = 0,& (\forall k \not\in \{s, t\}) \\ & b_t = -b_s. \end{gather}}$
##### Constructing dual

Each constraint in the primal problem will correspond to a variable in the dual problem. There are three types of variables, let $\lambda_{ij}$ be a variable corresponding to $f_{ij} \leq q_{ij}$ constraint, $\mu_k$ be the variables corresponding to flow conservation constraints and let $\alpha$ be the variable corresponding to $b_s=-b_t$ constraints.

Then, the target function is

$\sum\limits_{i, j} \lambda_{ij} q_{ij}.$

As for the dual constraints, the constraint corresponding to the primal variable $f_{ij}$ would look like

$\lambda_{ij} + \mu_i - \mu_j \geq 0,$

getting $\lambda_{ij}$ from $f_{ij} \leq q_{ij}$ and $\mu_i,-\mu_j$ from flow conservation constraints with $k=i$ and $k=j$.

As for the primal variable $b_k$, it will only generate dual constraints for $k=s$ and $k=t$, specifically:

$\begin{cases} \alpha - \mu_s = 1,\\ \alpha - \mu_t = 0. \end{cases}$

Here, $\alpha$ comes with coefficients from $b_t = -b_s$ and $\mu_s, \mu_t$ come with coefficients from flow conservation constraints. Finally, $1$ from the first equality comes from the target function $b_s$. Note that $\alpha$ and $\mu_k$ are free variables, as they come from equality constraints, while $\lambda_{ij}$ must be non-negative. This allows to get rid of $\alpha$ altogether substituting two constraints above with $\mu_t = \mu_s+1$.

Getting it all together, the dual to the maximum flow looks like

$\begin{gather} \min\limits_{\lambda_{ij}, \mu_i \in \mathbb R} & \sum\limits_{i,j} \lambda_{ij} q_{ij},&\\ s.t.& \lambda_{ij} \geq \max(0, \mu_j - \mu_i), & (\forall i,j) \\ & \mu_t = \mu_s + 1. \end{gather}$

Note that increasing all $\mu_k$ by the same value will not change anything, so we may stick to $\mu_s=0$ and $\mu_t=1$. Another useful observation is that, since $q_{ij}$ are by definition non-negative it is always good to make $\lambda_{ij}$ as low as possible, either saturating the inequality $\lambda_{ij} \geq \mu_j - \mu_i$ or making $\lambda_{ij} = 0$. With this observations, we may further simplify the dual problem and rewrite it as

$\boxed{\begin{gather} \min\limits_{\mu_i \in \mathbb R} & \sum\limits_{i,j} q_{ij} \max(0, \mu_j - \mu_i),&\\ & \mu_s = 0, \mu_t = 1, \end{gather}}$

Thus getting rid of $\lambda_{ij}$ altogether. Then, it can be proven that in this special case, the optimal solution only have $\mu_k \in \{0,1\}$, forming a cut between sets $S = \{k : \mu_k = 0\}$ and $T = \{k : \mu_k = 1\}$. Since $s \in S$ and $t \in T$, the dual problem indeed looks for the minimum cut between $s$ and $t$.

#### Min-cost flow

Now assume that alongside the capacity $q_{ij}$, each edge also has a cost $c_{ij}$ for one unit of flow on the edge. Then, you want to find a flow of value $b$, while maintaing its minimum cost. The primal problem for such case is formulated as

$\boxed{\begin{gather} \min\limits_{f_{ij} \in \mathbb R} & \sum\limits_{i,j} f_{ij} c_{ij}, \\ s.t. & 0 \leq f_{ij} \leq q_{ij}, &(\forall i,j) \\ & \sum\limits_{j=1}^n f_{kj} - \sum\limits_{i=1}^n f_{ik} = b_k, &(\forall k) \\ & b_s = b, b_t = -b, \\ & b_k = 0. & (\forall k \not \in \{s, t\}) \end{gather}}$
##### Constructing dual

We will need two types of dual variables. Specifically, $\lambda_{ij}$ for each $f_{ij} \leq q_{ij}$ constraint and $\pi_k$ for each flow conservation constraint.

The target function then will be

$\sum\limits_{i, j} q_{ij} \lambda_{ij} + \sum\limits_k \pi_k b_k.$

Variables $b_k$ are, in fact, constant, hence will not generate any constraints. Each variable $f_{ij}$ will generate a single constraint:

$\lambda_{ij} + \pi_i - \pi_j \leq c_{ij}.$

That being said, the dual problem is

$\boxed{\begin{gather} \max\limits_{\lambda_{ij}, \pi_k \in \mathbb R} & \sum\limits_{i,j} \lambda_{ij} q_{ij} - b(\pi_t - \pi_s), \\ s.t. & \lambda_{ij} \leq \min(0, c_{ij} - \pi_i + \pi_j). &(\forall i,j) \end{gather}}$

Note that here $\lambda_{ij} \leq 0$, as we're considering dual to the minimization problem and it is generated by $\leq$-type constraint.

Since $q_{ij} \geq 0$ and we want to maximize target function, we would like to make $\lambda_{ij}$ as large as possible. Thus, it is always optimal to take

$\lambda_{ij} = \min(0, c_{ij} - \pi_i + \pi_j),$

which allows us to get rid of $\lambda_{ij}$ altogether, leaving us with the simplest version:

$\begin{gather} \max\limits_{\pi_k \in \mathbb R} & \sum\limits_{i,j} \min(0, c_{ij} - \pi_i + \pi_j) q_{ij} - b(\pi_t - \pi_s) \end{gather}$
##### Interpreting dual, its connection to Johnson's potentials

Let $c_{ij}^\pi = c_{ij} - \pi_i + \pi_j$ be the adjusted cost of the edge $i \to j$. Consider a path

$i_0 \to i_1 \to \dots \to i_k$

and sum up the adjusted weights on the path:

$c_{i_0 i_1}^\pi + \dots + c_{i_{k-1} i_k}^\pi = (c_{i_0 i_1} + \dots + c_{i_{k-1} i_k}) + \pi_{i_k} - \pi_{i_0}.$

In particular, for $i_0 = i_k$, when the path is a cycle, the total adjusted cost is same as total initial cost.

From this and the cycle-path decomposition theorem follows that the cost of any $b$-flow on adjusted costs would exceed the cost of same $b$-flow on initial costs by exactly $b(\pi_t - \pi_s)$, so we get rid of this excess by subtracting this value from the target function.

As for the first summand, it is comprised as if all negative-cost edges are taken into the flow, bounding the minimum-cost flow from below, so maximizing it would take it as close to the actual minimum-cost flow as possible.

On the other hand, for any particular minimum-cost flow, we may define $\pi_k$ to be the shortest path to $t$ from $k$ in the residual network of the $b$-flow. Since the flow is minimum, residual network does not have negative cycles and $\pi_k$ are all well-defined. Moreover, defined this way, for any edge $i \to j$ the following important properties hold:

1. If $f_{ij} < q_{ij}$ (i.e. the edge is not saturated), then $c_{ij}^\pi \geq 0$;
2. If $f_{ij} > 0$ (i.e. the edge has non-zero flow), then $c_{ij}^\pi \leq 0$;
3. If $0 < f_{ij} < q_{ij}$ (i.e. the edge is not saturated, but has non-zero flow), then $c_{ij}^\pi = 0$.

The first property is derived from the fact that $i \to j$ may participate in augmenting paths, hence $\pi_i \leq \pi_j + c_{ij}$. The second property is derived from the fact that $j \to i$ may participate in augmenting paths, hence $\pi_j \leq \pi_i - c_{ij}$. Combined, they yield the third property.

Thus, among edges that participate in the minimum cost flow, only saturated edges will possibly have (negative) non-zero adjusted cost and only they would contribute to the target function, meaning that the target function, indeed, equates to the minimum cost of the flow.

That being said, $\pi_k$ is the Johnson potential computed in the residual network of the minimum flow, but taken as a shortest path to $t$, rather than shortest path from $s$. Note that the same reasoning applies to potentials defined as negated lengths of shortest path from $s$ to $k$.

Another important caveat to mention is that the reasoning above works when all vertices are reachable from $s$ (or $t$ is reachable from all vertices). If some vertices are not reachable, it is still possible to compute potentials satisfying the properties above by adding edges of infinite cost and non-zero capacity between all vertices and the starting vertex.

#### Practical example

605C - Freelancer's Dreams. Statement as it is boils down to the following LP problem, given $a_1, \dots, a_n, b_1, \dots, b_n, p, q$:

$\begin{gather} \min\limits_{t_1,\dots,t_n \in \mathbb R} & t_1 + \dots + t_n, \\ s.t. & t_1 a_1 + \dots + t_n a_n \geq p, \\ & t_1 b_1 + \dots + t_n b_n \geq q, \\ & t_i \geq 0. \end{gather}$

Its dual is

$\begin{gather} \max\limits_{\lambda,\mu \in \mathbb R} & \lambda p + \mu q, \\ s.t. &\lambda a_i + \mu b_i \leq 1, &(\forall i) \\ &\lambda, \mu \geq 0. \end{gather}$

The optimal solution to the dual problem can be found with nested binary search, as there are only two variables involved.

This approach was shared by amiya in his comment to the contest editorial.