Multivariate optimisation of scalar fields

Unconstrained multivariate optimisation

Introduction

Optimisation with linear equality constraints

Single equality constraint

Constrained optimisation

Rather than maximise \(f(x)\) freely, we want to maximise \(f(x)\) subject to constraints \(g_k(x)=c_k\) for \(k=1,...,m\).

To handle the constraints, we form the Lagrangian:

\(\mathcal{L}(x,\lambda )=f(x)-\sum_{k=1}^m\lambda_k [g_k(x)-c_k]\)

We look for points that are stationary with respect to both the vector \(x\) and \(\lambda\). Including \(\lambda\) ensures that these points are consistent with the constraints.

Solving the Lagrangian with one constraint

Our function is:

\(\mathcal{L}(x, \lambda )=f(x)-\lambda [g(x)-c]\)

The first-order conditions are:

\(\mathcal{L}_{\lambda }= -[g(x)-c]\)

\(\mathcal{L}_{x_i}=\dfrac{\partial f}{\partial x_i}-\lambda \dfrac{\partial g}{\partial x_i}\)

The solution is stationary so:

\(\mathcal{L}_{x_i}=\dfrac{\partial f}{\partial x_i}-\lambda \dfrac{\partial g}{\partial x_i}=0\)

\(\lambda \dfrac{\partial g}{\partial x_i}=\dfrac{\partial f}{\partial x_i}\)

\(\lambda =\dfrac{\partial f/\partial x_i}{\partial g/\partial x_i}\)

Since \(\lambda\) is the same for every \(i\), we can use the following in practical applications.

\(\dfrac{\partial f/\partial x_i}{\partial g/\partial x_i}=\dfrac{\partial f/\partial x_j}{\partial g/\partial x_j}\)
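As a concrete sketch, assuming SymPy is available, we can solve the first-order conditions symbolically for a hypothetical example: maximise \(f(x,y)=xy\) subject to \(x+y=10\).

```python
import sympy as sp

x, y, lam = sp.symbols("x y lambda", real=True)

# Hypothetical example: maximise f(x, y) = x*y subject to g(x, y) = x + y = 10
f = x * y
g = x + y - 10

# Lagrangian and its first-order conditions
L = f - lam * g
stationary = [sp.Eq(sp.diff(L, v), 0) for v in (x, y, lam)]

print(sp.solve(stationary, [x, y, lam], dict=True))
# -> [{lambda: 5, x: 5, y: 5}]
```

At the solution \(x=y=5\), both ratios \(\dfrac{\partial f/\partial x}{\partial g/\partial x}\) and \(\dfrac{\partial f/\partial y}{\partial g/\partial y}\) equal \(\lambda =5\), as expected.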

Multiple equality constraints

Solving the Lagrangian with many constraints

This time we have:

\(\mathcal{L}_{x_i}=\dfrac{\partial f}{\partial x_i}-\sum_{k=1}^m\lambda_k \dfrac{\partial g_k}{\partial x_i}=0\)

\(\mathcal{L}_{x_j}=\dfrac{\partial f}{\partial x_j}-\sum_{k=1}^m\lambda_k \dfrac{\partial g_k}{\partial x_j}=0\)

So, for any pair \(i\) and \(j\):

\(\dfrac{\partial f}{\partial x_i}-\sum_{k=1}^m\lambda_k \dfrac{\partial g_k}{\partial x_i}=\dfrac{\partial f}{\partial x_j}-\sum_{k=1}^m\lambda_k \dfrac{\partial g_k}{\partial x_j}\)

In vector form, the stationarity conditions say that \(\nabla f(x)=\sum_{k=1}^m\lambda_k \nabla g_k(x)\): the gradient of \(f\) is a linear combination of the constraint gradients.

Linear programming

Inequality constraints

A linear program in canonical form is: maximise \(c^Tx\) subject to \(Ax\le b\) and \(x\ge 0\).

Lagrangians with inequality constraints

We can add constraints to an optimisation problem. These constraints can be equality constraints or inequality constraints. We can write a constrained optimisation problem as:

Minimise \(f(x)\) subject to

\(g_i(x)\le 0\) for \(i=1,...,m\)

\(h_i(x)=0\) for \(i=1,...,p\)

We write the Lagrangian as:

\(\mathcal{L}(x, \lambda, \nu )=f(x)+\sum_{i=1}^m\lambda_i g_i(x)+\sum_{i=1}^p\nu_ih_i(x)\)

If we try to solve this like a standard Lagrangian, setting all of the partial derivatives to zero, then the inequality constraints are effectively treated as equality constraints.
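A minimal numerical sketch of such a problem, assuming SciPy is available and using a hypothetical objective and constraints. Note that SciPy's "ineq" convention is \(g(x)\ge 0\), so the \(g_i(x)\le 0\) constraints are negated:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical problem: minimise f(x) = (x0 - 1)^2 + (x1 - 2)^2
# subject to g(x) = x0 + x1 - 2 <= 0 and h(x) = x0 - x1 = 0
def f(x):
    return (x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2

constraints = [
    {"type": "ineq", "fun": lambda x: -(x[0] + x[1] - 2.0)},  # SciPy expects g(x) >= 0
    {"type": "eq", "fun": lambda x: x[0] - x[1]},
]

res = minimize(f, x0=np.zeros(2), constraints=constraints, method="SLSQP")
print(res.x)  # approximately [1, 1]
```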

Affinity of the Lagrangian

For fixed \(x\), the Lagrangian function is affine with respect to \(\lambda\) and \(\nu\).

\(\mathcal{L}(x, \lambda, \nu )=f(x)+\sum_{i=1}^m\lambda_i g_i(x)+\sum_{i=1}^p\nu_ih_i(x)\)

\(\mathcal{L}_{\lambda_i}(x, \lambda, \nu )=g_i(x)\)

\(\mathcal{L}_{\nu_i}(x, \lambda, \nu )=h_i(x)\)

As these partial derivatives do not depend on \(\lambda\) or \(\nu\), the Lagrangian is an affine function of \(\lambda\) and \(\nu\) (for fixed \(x\)).

Primal and dual problems

The primal problem

We already have this.

The dual problem

We can define the Lagrangian dual function:

\(g(\lambda, \nu ) = \inf_{x\in X} \mathcal{L}(x, \lambda ,\nu )\)

That is, we have a function which returns the value of the Lagrangian minimised over \(x\), given the values of \(\lambda\) and \(\nu\).

This is an unconstrained minimisation over \(x\).
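As a small worked illustration (a standard example, not tied to any particular problem above): minimise \(f(x)=x^2\) subject to \(1-x\le 0\). The Lagrangian is \(\mathcal{L}(x,\lambda )=x^2+\lambda (1-x)\). Minimising over \(x\) gives \(x=\lambda /2\), so the dual function is \(g(\lambda )=\lambda -\lambda^2/4\). Maximising this over \(\lambda \ge 0\) gives \(\lambda =2\) and a dual optimum of \(1\), which equals the primal optimum attained at \(x=1\).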

We can show this function is concave.

The pointwise infimum of a family of concave functions (which includes affine functions) is concave.

The pointwise supremum of a family of convex functions (which also includes affine functions) is convex.

For each fixed \(x\), \(\mathcal{L}(x,\lambda ,\nu )\) is affine in \((\lambda ,\nu )\), so the dual function is a pointwise infimum of affine functions and is therefore concave.

Given a function with inputs \(x\), what values of \(x\) maximise the function?

We explore constrained and unconstrained optimisation. The former is where restrictions are placed on vector \(x\), such as a budget constraint in economics.

The dual problem is concave

The dual problem is to maximise the concave function \(g(\lambda ,\nu )\) subject to \(\lambda \ge 0\). Since we are maximising a concave function over a convex set, this is a convex optimisation problem.

The duality gap

We refer to the optimal value of the primal problem as \(p^*\), and the optimal value of the dual problem as \(d^*\).

The duality gap is \(p^*-d^*\).

Complementary slackness for linear optimisation

Farkas’ lemma

We have matrix \(A\) and vector \(b\).

Exactly one of the following systems has a solution:

  • \(Ax=b\); \(x\ge 0\)

  • \(A^Ty\ge 0\); \(b^Ty<0\)
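A small numerical sketch, assuming SciPy is available (the matrix \(A\) and vector \(b\) below are arbitrary illustrations): the first alternative can be tested by solving a feasibility linear program.

```python
import numpy as np
from scipy.optimize import linprog

# Arbitrary illustrative data
A = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([1.0, 1.0])

# First alternative: is there some x >= 0 with Ax = b?
# Solve a feasibility LP with a zero objective.
res = linprog(c=np.zeros(A.shape[1]), A_eq=A, b_eq=b,
              bounds=[(0, None)] * A.shape[1])

# status 0 means feasible, so the first alternative holds;
# status 2 means infeasible, in which case Farkas' lemma guarantees
# a certificate y with A^T y >= 0 and b^T y < 0.
print(res.status)
```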

Quadratic optimisation

The quadratic optimisation problem

Constrained non-linear optimisation

Weak duality theorem

The duality gap, \(p^*-d^*\), is non-negative. That is, the optimal value of the dual problem never exceeds that of the primal problem: \(d^*\le p^*\).

Lagrange multipliers

The dual problem for non-linear optimisation

The weak duality theorem

Constrained convex optimisation

Slater’s condition

Strong duality

Strong duality is where the duality gap is \(0\).

Slater’s condition

Slater’s condition says that strong duality holds if there is a feasible point at which the inequality constraints are satisfied strictly.

That is, \(g_i(x)<0\) for all \(i\), not merely \(g_i(x)\le 0\).

This means that the inequality constraints are slack at that point.

This only applies if the problem is convex. That is, if Slater’s condition holds, and the problem is convex, then strong duality holds.

The strong duality theorem

Karush-Kuhn-Tucker conditions

If our problem is non-convex, or if Slater’s condition does not hold, how else can we find a solution?

An optimal solution can be characterised by the Karush-Kuhn-Tucker (KKT) conditions.
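For reference, for the problem above (minimise \(f(x)\) subject to \(g_i(x)\le 0\) and \(h_i(x)=0\)), the KKT conditions at a point \(x^*\) with multipliers \(\lambda\) and \(\nu\) are:

  • Stationarity: \(\nabla f(x^*)+\sum_{i=1}^m\lambda_i\nabla g_i(x^*)+\sum_{i=1}^p\nu_i\nabla h_i(x^*)=0\)

  • Primal feasibility: \(g_i(x^*)\le 0\) and \(h_i(x^*)=0\)

  • Dual feasibility: \(\lambda_i\ge 0\)

  • Complementary slackness: \(\lambda_i g_i(x^*)=0\)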

Sort

Unconstrained envelope theorem

Consider a function which takes two parameters:

\(f(x,\alpha )\)

We want to choose \(x\) to maximise \(f\), given \(\alpha\).

\(V(\alpha )=\sup_{x\in X}f(x,\alpha )\)

There is a subset of \(X\) where \(f(x,\alpha )=V(\alpha )\).

\(X^*(\alpha )=\{x\in X|f(x, \alpha )=V(\alpha )\}\)

This means that \(V(\alpha )=f(x^*,\alpha )\) for \(x^*\in X^*\).

Let’s assume that there is only one \(x^*\).

\(V(\alpha )=f(x^*,\alpha )\)

What happens to the value function as \(\alpha\) changes?

Since \(V(\alpha )=f(x^*(\alpha ),\alpha )\), the chain rule gives:

\(V_{\alpha_i}(\alpha )=f_x\dfrac{\partial x^*}{\partial \alpha_i}+f_{\alpha_i}\)

We know that \(f_x=0\) from the first-order conditions. So:

\(V_{\alpha_i}(\alpha )= f_{\alpha_i}(x^*(\alpha ),\alpha )\)

That is, at the optimum, as the parameter changes, we can treat \(x^*\) as fixed, because the first-order effect of the induced change in \(x^*\) is \(0\).
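As a quick check with a hypothetical function, let \(f(x,\alpha )=-(x-\alpha )^2+\alpha^2\). The first-order condition gives \(x^*(\alpha )=\alpha\), so \(V(\alpha )=\alpha^2\) and \(V'(\alpha )=2\alpha\). Differentiating \(f\) with respect to \(\alpha\) while holding \(x\) fixed gives \(f_{\alpha }=2(x-\alpha )+2\alpha\), which also equals \(2\alpha\) at \(x=x^*(\alpha )\), as the envelope theorem predicts.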

Identifying upper and lower bounds of linear programming

In a minimisation (maximisation) problem, any feasible solution gives an upper (lower) bound on the optimal value.

Can we get a bound on the other side? Yes, by taking linear combinations of the inequalities. For example, maximise \(30x + 100y\) subject to \(4x + 10y \le 40\) and \(x \ge 3\).

We can identify a lower bound by substituting in any feasible point, for example \(x=3\) and \(y=0\). This gives us a lower bound of \(90\).

To get an upper bound we can manipulate the constraints. Multiplying the first by \(10\) gives \(40x + 100y\le 400\), and multiplying the second by \(10\) gives \(10x\ge 30\). Then:

\(40x+100y\le 370+30\)

\(40x+100y\le 370+10x\)

\(30x+100y\le 370\)

So we have an upper bound of \(370\).

This upper bound is the result of one particular linear combination of the inequalities. A different combination could give a lower, and therefore tighter, upper bound.

This is the dual problem: how do we choose the linear combination of the inequalities such that the resulting upper bound is minimised?
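A minimal sketch using scipy.optimize.linprog, assuming SciPy is available, suggests that this bound is in fact tight. linprog minimises, so the objective is negated and \(x\ge 3\) is rewritten as \(-x\le -3\); non-negativity bounds are added for the canonical form and do not change the optimum here.

```python
from scipy.optimize import linprog

# Maximise 30x + 100y subject to 4x + 10y <= 40 and x >= 3 (with x, y >= 0).
# linprog minimises, so negate the objective; write x >= 3 as -x <= -3.
res = linprog(c=[-30, -100],
              A_ub=[[4, 10], [-1, 0]],
              b_ub=[40, -3],
              bounds=[(0, None), (0, None)])

print(-res.fun)  # 370.0: the maximum matches the upper bound derived above
print(res.x)     # [3.0, 2.8]
```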

Hessian matrix

We can take a function and make a matrix of its second order partial derivatives. This is the Hessian matrix, and it describes the local curvature of the function.

If the function \(f\) has \(n\) parameters, the Hessian matrix is \(n\times n\), and is defined as:

\(H_{ij}=\dfrac{\partial^2 f}{\partial x_i \partial x_j}\)

If the function is convex, then the Hessian matrix is positive semi-definite for all points, and vice versa.

If the function is concave, then the Hessian matrix is negative semi-definite for all points, and vice versa.

We can diagnose critical points by evaluating the Hessian matrix at those points.

If the Hessian is positive definite, the point is a local minimum. If it is negative definite, the point is a local maximum. If it has both positive and negative eigenvalues, the point is a saddle point.
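A minimal sketch, assuming NumPy is available, for the hypothetical function \(f(x,y)=x^2-y^2\), whose only critical point is the origin:

```python
import numpy as np

# Hessian of f(x, y) = x^2 - y^2, evaluated at the critical point (0, 0).
# For this particular f the Hessian is constant everywhere.
H = np.array([[2.0, 0.0],
              [0.0, -2.0]])

eigenvalues = np.linalg.eigvalsh(H)  # Hessians are symmetric, so eigvalsh applies
print(eigenvalues)  # [-2.  2.]: mixed signs, so the origin is a saddle point
```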