Rather than maximise \(f(x)\) freely, we want to maximise \(f(x)\) subject to constraints \(g_k(x)=c_k\).
We write the Lagrangian as:
\(\mathcal{L}(x,\lambda )=f(x)-\sum^m_{k=1}\lambda_k [g_k(x)-c_k]\)
We examine the stationary points with respect to both the vector \(x\) and the multipliers \(\lambda\). Including the latter ensures that these points are consistent with the constraints.
For a single constraint, the Lagrangian is:
\(\mathcal{L}(x, \lambda )=f(x)-\lambda [g(x)-c]\)
The first-order conditions are:
\(\mathcal{L}_{\lambda }= -[g(x)-c]\) (setting this to zero recovers the constraint \(g(x)=c\))
\(\mathcal{L}_{x_i}=\dfrac{\partial f}{\partial x_i}-\lambda \dfrac{\partial g}{\partial x_i}\)
The solution is stationary so:
\(\mathcal{L}_{x_i}=\dfrac{\partial f}{\partial x_i}-\lambda \dfrac{\partial g}{\partial x_i}=0\)
\(\lambda \dfrac{\partial g}{\partial x_i}=\dfrac{\partial f}{\partial x_i}\)
\(\lambda =\dfrac{\dfrac{\partial f}{\partial x_i}}{\dfrac{\partial g}{\partial x_i}}\)
Because \(\lambda\) is the same for every \(i\), this ratio is equal across all components, which is useful in practical applications:
\(\dfrac{\dfrac{\partial f}{\partial x_i}}{\dfrac{\partial g}{\partial x_i}}=\dfrac{\dfrac{\partial f}{\partial x_j}}{\dfrac{\partial g}{\partial x_j}}\)
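A minimal sketch of solving the first-order conditions symbolically; the objective \(f(x,y)=xy\) and constraint \(x+y=10\) are invented for illustration:

```python
import sympy as sp

# Hypothetical example: maximise f(x, y) = x*y subject to g(x, y) = x + y = 10.
x, y, lam = sp.symbols('x y lam', real=True)
f = x * y
g = x + y
c = 10

# Lagrangian: L = f - lam * (g - c)
L = f - lam * (g - c)

# Stationary points in x, y and lam; the lam equation enforces the constraint.
solution = sp.solve([sp.diff(L, x), sp.diff(L, y), sp.diff(L, lam)], [x, y, lam], dict=True)
print(solution)  # e.g. [{x: 5, y: 5, lam: 5}]
```

Here the ratio condition holds directly: \(\dfrac{\partial f/\partial x}{\partial g/\partial x}=y\) and \(\dfrac{\partial f/\partial y}{\partial g/\partial y}=x\), so the optimum has \(x=y\).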
With \(m\) constraints we have:
\(\mathcal{L}_{x_i}=\dfrac{\partial f}{\partial x_i}-\sum^m_{k=1}\lambda_k \dfrac{\partial g_k}{\partial x_i}=0\)
\(\mathcal{L}_{x_j}=\dfrac{\partial f}{\partial x_j}-\sum^m_{k=1}\lambda_k \dfrac{\partial g_k}{\partial x_j}=0\)
\(\dfrac{\partial f}{\partial x_i}-\sum^m_{k=1}\lambda_k \dfrac{\partial g_k}{\partial x_i}=\dfrac{\partial f}{\partial x_j}-\sum^m_{k=1}\lambda_k \dfrac{\partial g_k}{\partial x_j}\)
A linear program in canonical form is: maximise \(c^Tx\) subject to \(Ax\le b\) and \(x\ge 0\).
We can add constraints to an optimisation problem. These constraints can be equality constraints or inequality constraints. We can write a constrained optimisation problem as:
Minimise \(f(x)\) subject to
\(g_i(x)\le 0\) for \(i=1,...,m\)
\(h_i(x)=0\) for \(i=1,...,p\)
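As a hedged sketch of this general form, solved numerically (the objective and constraints below are invented for illustration); note that SciPy's "ineq" convention is \(\mathrm{fun}(x)\ge 0\), so a constraint \(g(x)\le 0\) is passed as \(-g\):

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical instance: minimise x0^2 + x1^2
# subject to g(x) = 1 - x0 - x1 <= 0  and  h(x) = x0 - x1 = 0.
f = lambda x: x[0] ** 2 + x[1] ** 2

constraints = [
    {"type": "ineq", "fun": lambda x: x[0] + x[1] - 1},  # -g(x) >= 0
    {"type": "eq", "fun": lambda x: x[0] - x[1]},        # h(x) = 0
]

res = minimize(f, x0=np.array([0.0, 0.0]), method="SLSQP", constraints=constraints)
print(res.x)  # approximately [0.5, 0.5]
```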
We write the Lagrangian as:
\(\mathcal{L}(x, \lambda, \nu )=f(x)+\sum_{i=1}^m\lambda_i g_i(x)+\sum_{i=1}^p\nu_ih_i(x)\)
If we try to solve this like a standard Lagrangian, then all of the inequality constraints will instead be treated as equality constraints.
The Lagrangian function is affine with respect to \(\lambda\) and \(\nu\).
\(\mathcal{L}(x, \lambda, \nu )=f(x)+\sum_{i=1}^m\lambda_i g_i(x)+\sum_{i=1}^p\nu_ih_i(x)\)
\(\mathcal{L}_{\lambda_i}(x, \lambda, \nu )=g_i(x)\)
\(\mathcal{L}_{\nu_i}(x, \lambda, \nu )=h_i(x)\)
As these partial derivatives do not depend on \(\lambda\) or \(\nu\), the Lagrangian is affine in those variables.
We will use this affinity below, when showing that the dual function is concave.
We can define the Lagrangian dual function:
\(g(\lambda, \nu ) = \inf_{x\in X} \mathcal{L}(x, \lambda ,\nu )\)
That is, we have a function which returns the value of the Lagrangian minimised over \(x\), given the values of \(\lambda\) and \(\nu\).
This minimisation over \(x\) is unconstrained.
We can prove this function is concave.
The pointwise infimum of a set of concave functions (and therefore also of affine functions) is concave.
The pointwise supremum of a set of convex functions (and therefore also of affine functions) is convex.
For each fixed \(x\), \(\mathcal{L}(x,\lambda ,\nu )\) is affine in \((\lambda ,\nu )\); the dual function is the pointwise infimum of these affine functions over \(x\), so it is concave.
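As a small illustrative example (not from the notes above): minimise \(x^2\) subject to \(1-x\le 0\). Then:
\(\mathcal{L}(x,\lambda )=x^2+\lambda (1-x)\)
The infimum over \(x\) is attained at \(x=\lambda /2\), so
\(g(\lambda )=-\dfrac{\lambda^2}{4}+\lambda\)
which is concave in \(\lambda\), as expected.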
Given a function with inputs \(x\), what values of \(x\) maximise the function?
We explore constrained and unconstrained optimisation. The former is where restrictions are placed on vector \(x\), such as a budget constraint in economics.
We refer to the optimal value of the primal problem as \(p^*\), and the optimal value of the dual problem as \(d^*\).
The duality gap is \(p^*-d^*\).
We have matrix \(A\) and vector \(b\).
Exactly one of the following systems has a solution (this is Farkas' lemma):
\(Ax=b\); \(x\ge 0\)
\(A^Ty\ge 0\); \(b^Ty<0\)
The duality gap (\(p^*-d^*\)) is non-negative.
Strong duality is where the duality gap is \(0\).
Slater’s condition says that strong duality holds if there is a feasible point where the inequality constraints are satisfied strictly.
That is, \(g_i(x)<0\) rather than just \(g_i(x)\le 0\).
This means that the inequality constraints are slack at that point.
This only applies if the problem is convex. That is, if Slater’s condition holds, and the problem is convex, then strong duality holds.
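In the illustrative example above (minimise \(x^2\) subject to \(1-x\le 0\)), the point \(x=2\) satisfies the inequality strictly and the problem is convex, so Slater's condition holds; indeed \(d^*=\max_{\lambda \ge 0}\left(-\dfrac{\lambda^2}{4}+\lambda \right)=1=p^*\).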
If our problem is non-convex, or if Slater’s condition does not hold, how else can we find a solution?
A solution can still satisfy the KKT (Karush-Kuhn-Tucker) conditions.
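For reference, for the problem written earlier (minimise \(f(x)\) subject to \(g_i(x)\le 0\) and \(h_i(x)=0\)), the KKT conditions at a point \(x^*\) with multipliers \(\lambda\) and \(\nu\) are:
\(\nabla f(x^*)+\sum_{i=1}^m\lambda_i \nabla g_i(x^*)+\sum_{i=1}^p\nu_i\nabla h_i(x^*)=0\) (stationarity)
\(g_i(x^*)\le 0\), \(h_i(x^*)=0\) (primal feasibility)
\(\lambda_i\ge 0\) (dual feasibility)
\(\lambda_i g_i(x^*)=0\) (complementary slackness)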
Consider a function which takes two arguments:
\(f(x,\alpha )\)
We want to choose \(x\) to maximise \(f\), given \(\alpha\).
\(V(\alpha )=\sup_{x\in X}f(x,\alpha )\)
There is a subset of \(X\) where \(f(x,\alpha )=V(\alpha )\).
\(X^*(\alpha )=\{x\in X|f(x, \alpha )=V(\alpha )\}\)
This means that \(V(\alpha )=f(x^*,\alpha )\) for \(x^*\in X^*\).
Let’s assume that there is only one \(x^*\).
\(V(\alpha )=f(x^*,\alpha )\)
What happens to the value function as we relax \(\alpha\)?
This is the envelope theorem: \(V_{\alpha_i}(\alpha )=f_{\alpha_i}(x^*(\alpha ),\alpha )\).
To see why, differentiate \(V(\alpha )=f(x^*(\alpha ),\alpha )\) using the chain rule:
\(V_{\alpha_i}(\alpha )=f_x\dfrac{\partial x^*}{\partial \alpha_i}+f_{\alpha_i}\).
We know that \(f_x=0\) from first order conditions. So:
\(V_{\alpha_i}(\alpha )= f_{\alpha_i}\).
That is, at the optimum, as the parameter is relaxed, we can treat \(x^*\) as fixed, because the first-order effect of its movement is \(0\).
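A quick symbolic sanity check of this result; the function \(f(x,\alpha )=-(x-\alpha )^2+\alpha\) is invented for illustration:

```python
import sympy as sp

# Hypothetical check of the envelope result with f(x, a) = -(x - a)**2 + a.
x, a = sp.symbols('x a', real=True)
f = -(x - a) ** 2 + a

# Optimal x for fixed a: f_x = 0  =>  x* = a, so V(a) = f(x*, a) = a.
x_star = sp.solve(sp.diff(f, x), x)[0]
V = f.subs(x, x_star)

# dV/da equals f_a evaluated at x*: both are 1 here.
print(sp.diff(V, a))                    # 1
print(sp.diff(f, a).subs(x, x_star))    # 1
```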
In a minimisation (maximisation) problem, any feasible solution gives an upper (lower) bound on the optimum.
Can we get a bound on the other side? Yes, by taking linear combinations of the inequalities. For example: maximise \(30x + 100y\) subject to \(4x + 10y \le 40\) and \(x \ge 3\).
We can identify a lower bound by substituting any feasible point, for example \(x=3\) and \(y=0\). This gives us a lower bound of \(90\).
To get an upper bound we can manipulate the constraints. Multiplying the first constraint by \(10\) gives \(40x + 100y\le 400\), and the second by \(10\) gives \(10x\ge 30\). Then \(40x+100y\le 370+30\le 370+10x\), so \(30x+100y\le 370\).
So we have an upper bound of \(370\).
This upper bound is the result of one particular linear combination of the inequalities. A different combination could give a smaller (tighter) upper bound.
This is the dual problem: how do we choose the linear combination of inequalities such that the resulting upper bound is minimised?
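As a sketch, the primal problem above can be checked numerically with scipy.optimize.linprog (which minimises, so the objective is negated); its optimal value is \(370\), so the bound from the linear combination above turns out to be tight here:

```python
from scipy.optimize import linprog

# Maximise 30x + 100y subject to 4x + 10y <= 40 and x >= 3 (y unrestricted).
res = linprog(
    c=[-30, -100],                     # negate to turn max into min
    A_ub=[[4, 10], [-1, 0]],           # 4x + 10y <= 40,  -x <= -3 (i.e. x >= 3)
    b_ub=[40, -3],
    bounds=[(None, None), (None, None)],
    method="highs",
)
print(-res.fun, res.x)  # 370.0, [3.0, 2.8]
```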
We can take a function and make a matrix of its second order partial derivatives. This is the Hessian matrix, and it describes the local curvature of the function.
If the function \(f\) has \(n\) parameters, the Hessian matrix is \(n\times n\), and is defined as:
\(H_{ij}=\dfrac{\partial^2 f}{\partial x_i \partial x_j}\)
If the function is convex, then the Hessian matrix is positive semi-definite for all points, and vice versa.
If the function is concave, then the Hessian matrix is negative semi-definite for all points, and vice versa.
We can diagnose critical points by evaluating the Hessian matrix at those points.
If the Hessian at a critical point is positive definite, the point is a local minimum. If it is negative definite, the point is a local maximum. If it has both positive and negative eigenvalues, the point is a saddle point.
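A minimal sketch of this diagnosis; the function \(f(x,y)=x^2-y^2\), with its critical point at the origin, is chosen for illustration:

```python
import numpy as np

# Hessian of f(x, y) = x^2 - y^2; it is constant, so we write it down directly.
H = np.array([[2.0, 0.0],
              [0.0, -2.0]])  # H[i][j] = d^2 f / (dx_i dx_j)

eigvals = np.linalg.eigvalsh(H)  # symmetric matrix -> real eigenvalues
if np.all(eigvals > 0):
    print("local minimum")
elif np.all(eigvals < 0):
    print("local maximum")
elif eigvals.min() < 0 < eigvals.max():
    print("saddle point")            # printed here: eigenvalues are -2 and +2
else:
    print("inconclusive (semi-definite)")
```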