An introduction to robust optimization [... and a small shopping list that I forgot ...]
How many people does a new fulfillment center need to hire? What exactly should it stock, and where should a specific product be placed? The larger the business grows, the higher the uncertainty and the more expensive a mistake becomes. Defeating chaos and choosing the optimal solution is one of the tasks of a data science team. And since data analysis is grounded in mathematics, let's begin with that.
In this post, we will consider optimization problems with uncertainty in the data and their approximation by deterministic convex problems. This is one of the main tricks in robust optimization, a technique that lets you cope with optimization problems that are too sensitive to changes in the input data.
The issue of sensitivity is very important. For problems whose solution quality depends only weakly on changes in the data, it is easier to use ordinary stochastic optimization. In highly sensitive problems, however, that approach gives bad results. There are many such problems in finance, supply chain management, engineering design, and many other areas.
And yes, this is one of those posts where the complexity grows exponentially (sorry in advance)...
What does it mean to “solve” an optimization problem?
Let's start with a brief reminder.
The optimization problem in general form looks like this:

$$\min_x f(x) \quad \text{s.t.} \quad x \in X$$

Here $f$ is called the objective function and $X$ the admissible (feasible) set.

A solution of the optimization problem is a point $x^* \in X$ such that

$$f(x^*) \leq f(x) \quad \forall x \in X$$

This is the standard notion of a solution for an optimization problem without uncertainty.
What is an optimization problem with uncertainty?
It is time to ask where the objective function $f$ and the constraints $X$ come from.
It is very useful to distinguish between:
the structural logic of the problem (in other words, which classes of functions are used),
technical constraints (those that do not depend on human choices or on data),
parameters that are estimated from data.
For example, a person from the business side came to us and showed a linear programming problem:

$$\max_x \; 2.16\,x_1 + 3.7\,x_2 \quad \text{s.t.} \quad 0.973\,x_1 + 2.619\,x_2 \leq 3.32, \quad x \geq 0$$
You are seeing this problem for the first time. So is the person (well, maybe not, but people in blue jackets are all so abstract!). You do not know the meaning of the variables. But even now it is possible to say with a high degree of confidence that:

Most likely the problem is linear because someone decided so. Linearity is a structure chosen by a human.

Constraints such as non-negativity are technical. That is, they came from “physics”, not from the data (for example, sales cannot be negative).

The specific values $\{0.973,\ 2.619,\ 3.32\}$ in the constraint in our example were estimated from data. That is, first someone said that one variable is linked to another, then it was decided that the relationship is linear, and finally the coefficients of this linear relation were estimated from the data. The same is true of the coefficients $\{2.16,\ 3.7\}$ in the objective function.
When we talk about problems with uncertainty, we mean precisely the uncertainty in the parameters estimated from data. We do not touch the technical constraints or the initial choice of the problem structure.

Let's go back to our story. We have a linear problem whose coefficients someone somehow estimated. If we were right about the nature of the coefficients, then in fact we were asked to solve the problem for one scenario (one specific instance of the problem).
Sometimes this is enough for us, and we just solve it.
However, sometimes solving the problem for a single scenario is a bad idea (for example, if the solution is very sensitive to variation in the data).
What should we do in that case, and how do we model uncertainty in the data?
First, note that uncertainty in the data can always be moved from the objective function to the constraints or vice versa. Here is how it is done.
Transfer of uncertainty from the objective function to the constraints or vice versa
It is often more convenient to transfer all uncertainty into one part of the problem: the objective function or constraints.
Transfer of uncertainty from the objective function to the constraints
For any optimization problem with uncertain parameters $\zeta$ in the objective,

$$\min_x f(x, \zeta) \quad \text{s.t.} \quad x \in X,$$

you can build an equivalent one without uncertainty in the objective:

$$\min_{x, t} \; t \quad \text{s.t.} \quad f(x, \zeta) \leq t, \quad x \in X.$$

A solution $(x^*, t^*)$ of the equivalent problem contains a solution $x^*$ of the original one.
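To make this concrete, here is a minimal Python sketch under the simplifying assumption that the uncertainty set is a finite list of scenarios for the cost vector (all numbers except the first scenario are invented); the epigraph variable $t$ turns the worst case over scenarios into ordinary linear constraints:

```python
import numpy as np
from scipy.optimize import linprog

# Three hypothetical scenarios for the cost vector c (a finite uncertainty set);
# the first reuses the coefficients {2.16, 3.7} from the example above.
C = np.array([[2.16, 3.70],
              [2.00, 4.10],
              [2.50, 3.20]])
k = C.shape[0]

# min_x max_i c_i^T x over the simplex {x >= 0, x1 + x2 = 1}
# epigraph form: min t  s.t.  c_i^T x - t <= 0 for every scenario i
# variables: (x1, x2, t)
obj = np.array([0.0, 0.0, 1.0])
A_ub = np.hstack([C, -np.ones((k, 1))])   # c_i^T x - t <= 0
b_ub = np.zeros(k)
A_eq = np.array([[1.0, 1.0, 0.0]])        # x1 + x2 = 1
b_eq = np.array([1.0])

res = linprog(obj, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None), (0, None), (None, None)])
x_opt, t_opt = res.x[:2], res.x[2]  # t_opt equals the worst-case cost at x_opt
```

The solver never sees a "max" in the objective: the worst case is encoded entirely by the $k$ linear constraints on $t$.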
Transfer of uncertainty from the constraints to the objective function
Formally, for any optimization problem with constraints,

$$\min_x f(x) \quad \text{s.t.} \quad x \in X,$$

you can build an equivalent problem without constraints,

$$\min_x \; f(x) + I_X(x),$$

using the indicator function

$$I_X(x) = \begin{cases} 0, & x \in X, \\ +\infty, & x \notin X. \end{cases}$$
It is clear that no algorithm will digest such a function, but that is not required. The next logical step is to approximate the indicator function by something digestible. What exactly depends on the situation (more on that a little later). This is how, for example, interior point methods (a special case of penalty function methods) and many others are constructed.
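One digestible replacement is a logarithmic barrier. A small sketch, where the test problem ($\min x^2$ over $x \geq 1$) and the schedule of barrier weights are my own choices for illustration:

```python
import numpy as np
from scipy.optimize import minimize

# Approximate the indicator of the set {x >= 1} by a log-barrier:
# I(x) ~ -(1/t) * log(x - 1), which tends to the indicator as t -> infinity.
def barrier_objective(x, t):
    x = x[0]
    if x <= 1.0:
        return np.inf          # outside the set the "indicator" is infinite
    return x**2 - (1.0 / t) * np.log(x - 1.0)

solutions = []
for t in [1.0, 10.0, 100.0, 1000.0]:
    res = minimize(barrier_objective, x0=[2.0], args=(t,), method="Nelder-Mead")
    solutions.append(res.x[0])
# as t grows, the barrier solutions approach the true optimum x* = 1
```

Interior point methods do exactly this, tightening the barrier weight along a "central path" instead of solving each subproblem from scratch.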
Stochastic, online, and robust optimization, and a shopping list
We may face many scenarios of uncertainty, as well as many options for dealing with it. Let us illustrate several standard approaches with a simple example.
I do not know how things stand with the respected reader, but I am married (happily) and periodically go to the grocery store. With a shopping list, of course (it grants invulnerability to impulse purchases). Sometimes not just to any store but to the conditional Auchan, where things are cheaper but which is a long trip.

Let us model this situation: we arrived at Auchan with a shopping list in hand.
Attention, the first question: how do we model this?
Input data: information about the products to purchase and the required quantity.
For convenience, we can think of the list as some integer non-negative vector $y$.

As the decision variables take, correspondingly, an integer non-negative vector $x$: how many of which products we will eventually buy (our decision).

The rest is simple: take some objective function $f(x, y)$ that says how badly we erred in our choice of products.
Depending on the context, the type of function may vary, but there are some basic requirements for it:
The function should attain its minimum at $x = y$ (that is, at the optimum we buy exactly what is written on the list).
The function should be convex in $x$ (and preferably smooth), so that it can be minimized efficiently.

Thus we get the problem:

$$\min_{x \in Z^n_+} \; f(x, y)$$
Now imagine that the shopping list stayed at home...

So, with a single remark we landed in the world of problems with uncertainty.

What do we do if $y$ in the problem is unknown to us?
The answer, again, depends on the context.
Stochastic optimization
Stochastic optimization (usually) assumes that:
the uncertainty in the data is genuinely stochastic in nature, and we have complete knowledge of the probability distribution of the non-deterministic parameter values;
the constraints that involve uncertainty are soft.
In our example, if we modeled it using stochastic optimization, we would say:

Okay, I do not know what was written on the list, but I have been shopping with lists for 8 years already, and I know the distribution of the vector $y$ quite well.

Even if I choose wrongly (that is, $x \neq y$), then on returning home I will find out the real $y$ and, if things are really bad, I will go down to the Pyaterochka and buy the missing items there, albeit at a higher price.

Now I choose the $x$ that minimizes some aggregate of the initial objective function and the possible “penalties” for an error.
This leads us to the problem:

$$\min_{x \in Z^n_+} \; E_y\!\left[ f(x, y) + q(x, y) \right],$$

where $q(x, y)$ is the cost of correcting the error after $y$ becomes known.
Note that in this problem decisions are de facto made twice: first the main purchase decision at Auchan, then the “error correction” once the real list is revealed.
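Here is a sketch of this two-stage logic via a sample-average approximation in Python. The demand distribution, the penalty price, and the brute-force search over a small integer grid are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Monte Carlo samples of the unknown shopping list y for 3 products
# (stochastic optimization assumes we know this distribution)
samples = rng.poisson(lam=[3.0, 1.5, 5.0], size=(1000, 3))

penalty = 2.0  # cost per missing unit re-bought later at the expensive shop

def expected_cost(x):
    # first stage: surplus units are wasted (cost 1 each);
    # second stage ("error correction"): shortage is re-bought at `penalty`
    shortage = np.maximum(samples - x, 0).sum(axis=1)
    surplus = np.maximum(x - samples, 0).sum(axis=1)
    return (surplus + penalty * shortage).mean()

# toy problem, so brute force over a small integer grid is enough
grid = np.stack(np.meshgrid(*[np.arange(10)] * 3), axis=-1).reshape(-1, 3)
x_best = min(grid, key=expected_cost)
```

Because the shortage penalty exceeds the surplus cost, the minimizer is biased above the mean demand: the classic newsvendor effect.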
The main problems of this approach are:
There is often no information on the distribution of parameters.
Constraints can be hard (in problems where the stakes are high: death, ruin, nuclear or zombie apocalypse, etc.).

It is not always possible to “correct mistakes” (the decision is made only once); or, conversely, decisions are made many times (in which case many nested integrals appear and the computation becomes very hard).
Online optimization
Online optimization is a framework for studying sequential decision making. One of the standard modeling approaches in this framework is multi-armed bandits, which have been written about on Habr many times.
In the context of our toy example, we would:

never have had (and never used before) a shopping list,

be praised or scolded at home for the products we bought (while we could only guess at the desired set),

and the task would be to learn as quickly as possible to buy products like her ex, that imaginary prince, or the best of the sons of mom's friends.
Robust optimization
Robust optimization is a logical extension of the idea of a minimax solution.
Ideally, we want to make a decision right now that will remain valid regardless of the circumstances. The people who designed pots, irons, and refrigerators in the USSR worked in the spirit of robust optimization: the product had to work even after being used for 20 years as the main tool for exterminating the mutants that appeared after a nuclear war (and it had to survive that, too).

Moreover, we want to be able to feed the problem to an ordinary solver, and solvers do not understand constraints of the form “for any realization of a random variable” (unless there are only finitely many realizations).
In the problem with the shopping list, the decision should be made here and now and remain valid under any circumstances:

$$\min_{x \in Z^n_+} \; \max_{y} \; f(x, y)$$
It is clear that, even in this toy example, if nothing is required of $y$, then no meaningful decision will work.
So how does one work with such problems?
Construction of a robust version of a problem, using LP as an example
Consider a linear optimization problem with uncertainty:

$$\min_x \; c^T x + d \quad \text{s.t.} \quad Ax \leq b$$

The parameters $c, d, A, b$ were estimated from data and carry uncertainty.
Assumption 1: the set of values (realizations) of the data can be parameterized, i.e. there exist $\begin{pmatrix} c_i^T, d_i \\ A_i, b_i \end{pmatrix}$, $i = 0, 1, \dots, k$, such that any realization of the data lies in the set

$$\begin{pmatrix} c^T, d \\ A, b \end{pmatrix} \in U = \left\{ \begin{pmatrix} c_0^T, d_0 \\ A_0, b_0 \end{pmatrix} + \sum_{i=1}^k \zeta_i \begin{pmatrix} c_i^T, d_i \\ A_i, b_i \end{pmatrix} \;\middle|\; \zeta \in Q \subset R^k \right\}$$

Here $\begin{pmatrix} c_0^T, d_0 \\ A_0, b_0 \end{pmatrix}$ is called the “nominal” data, and the $\begin{pmatrix} c_i^T, d_i \\ A_i, b_i \end{pmatrix}$, $i \geq 1$, are the “shifts”.
Mini example
I want to illustrate their meaning with a model example from finance: the problem of choosing an optimal portfolio of securities. Say you want to invest. There are $n$ stocks available to you on the exchange, and you need to decide how to distribute (invest) your capital across these securities so as to maximize your return while limiting the risk. One of the first models for this problem (the Markowitz model) suggests doing the following:
Collect historical data on the return of each security: $r_t = \frac{p_t - p_{t-1}}{p_{t-1}}$, where $p_t$ is the price of the asset at time $t$.

Compute the empirical mean returns $\mu$ and the empirical covariance matrix of the returns $\Sigma$.

Solve the optimization problem

$$\max_x \; \mu^T x \quad \text{s.t.} \quad x^T \Sigma x \leq \sigma, \quad \sum_i x_i = 1, \quad x \geq 0$$
The solution of this problem is the optimal distribution (the shares) of capital across the securities.

In fact, we maximize the expected return; in other words, we look for the portfolio that is optimal for a single scenario: the case when the realization of the random (!) returns coincides with the empirical mean.

In the context of the parameterization above, $\mu$ serves exactly as the “nominal” data.
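A Python sketch of these estimation steps. The price history is simulated, the risk-aversion weight `gamma` is an assumption, and the final random search over portfolios is a crude stand-in for a real QP solver:

```python
import numpy as np

rng = np.random.default_rng(1)

# simulated price history: 100 time steps, 3 assets
prices = np.cumprod(1.0 + rng.normal(0.001, 0.02, size=(100, 3)), axis=0)

# per-period returns r_t = (p_t - p_{t-1}) / p_{t-1}
returns = prices[1:] / prices[:-1] - 1.0

mu = returns.mean(axis=0)               # empirical mean returns ("nominal" data)
Sigma = np.cov(returns, rowvar=False)   # empirical covariance (risk estimate)

# Markowitz-style trade-off: maximize mu^T w - gamma * w^T Sigma w over the
# simplex; random search keeps the sketch dependency-free
gamma = 5.0
W = rng.dirichlet(np.ones(3), size=20000)   # random feasible portfolios
scores = W @ mu - gamma * np.einsum("ij,jk,ik->i", W, Sigma, W)
w_best = W[np.argmax(scores)]
```

The point of the sketch is where the estimates enter: $\mu$ and $\Sigma$ are exactly the parameters that robust optimization would treat as uncertain.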
We already know that all the uncertainty in a problem can be pushed into the constraints. Let's do that.

We get the problem

$$\min_{x, t} \; t \quad \text{s.t.} \quad c^T x + d \leq t, \quad Ax \leq b \qquad \forall \begin{pmatrix} c^T, d \\ A, b \end{pmatrix} \in U$$
Robust version of the problem
Now it is time for one of the coolest tricks in robust optimization: how to pass from an infinite number of constraints to a finite set of good ones.
To begin, consider a simple example, when

$$Q = \{\zeta \in R^k \mid \|\zeta\|_2 \leq 1\}$$
All the constraints in the system are of the same type: just linear inequalities. Learn to work with one and you learn to work with all of them.
Therefore, we consider one constraint of the inequality type:
$$a^T x \leq b \quad \forall (a, b) \in U = \left\{(a_0, b_0) + \sum_{i=1}^k \zeta_i (a_i, b_i) \;\middle|\; \zeta \in Q\right\}$$
$$\left(a_0 + \sum_{i=1}^k \zeta_i a_i\right)^T x \leq b_0 + \sum_{i=1}^k \zeta_i b_i \quad \forall \zeta \in Q$$
$$\sum_{i=1}^k \zeta_i (a_i^T x - b_i) \leq b_0 - a_0^T x \quad \forall \zeta \in Q$$
$$\max_{\zeta \in Q} \sum_{i=1}^k \zeta_i (a_i^T x - b_i) \leq b_0 - a_0^T x$$
I will explain what happened.
First we moved everything involving the uncertainty to the left-hand side of the inequality ($\zeta$ is responsible for the uncertainty). Then we looked at the worst case (each $x$ has its own). As a result, we obtained the entry

$$\max_{\zeta \in Q} \sum_{i=1}^k \zeta_i (a_i^T x - b_i) \leq b_0 - a_0^T x.$$
The next step is to write out the left-hand side explicitly. To do this, it is enough to solve the inner optimization problem over $\zeta$ and substitute the optimal $\zeta^* = \frac{(a_i^T x - b_i)_{i=1}^k}{\|(a_i^T x - b_i)_{i=1}^k\|_2}$, which leads to the inequality:

$$a_0^T x + \sqrt{\sum_{i=1}^k (a_i^T x - b_i)^2} \leq b_0$$
Note that the resulting inequality is convex, and any $x$ that satisfies it satisfies the original constraint for any realization of the data.

The constraint $a_0^T x + \sqrt{\sum_{i=1}^k (a_i^T x - b_i)^2} \leq b_0$ is called the robust version of the constraint $a^T x \leq b$.

This is one of the main workhorses of robust optimization: the approximation of probabilistic constraints by a finite set of convex constraints.
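A quick numerical sanity check of this closed form (the data below is random and purely illustrative): for the unit Euclidean ball, the worst case of $\sum_i \zeta_i (a_i^T x - b_i)$ equals the Euclidean norm of the vector with components $a_i^T x - b_i$:

```python
import numpy as np

rng = np.random.default_rng(2)

k, n = 4, 3
A = rng.normal(size=(k, n))   # rows play the role of the shift vectors a_i
bs = rng.normal(size=k)       # the shifts b_i
x = rng.normal(size=n)        # an arbitrary candidate point

v = A @ x - bs                # components a_i^T x - b_i

# closed form: the maximum over the unit ball is the Euclidean norm of v
worst_closed = np.linalg.norm(v, 2)

# Monte Carlo: evaluate the objective at random unit-norm zeta
Z = rng.normal(size=(100_000, k))
Z /= np.linalg.norm(Z, axis=1, keepdims=True)
worst_mc = (Z @ v).max()      # approaches worst_closed from below
```

The sampled maximum never exceeds the closed form and approaches it as the number of samples grows, which is exactly what the substitution of $\zeta^*$ claims.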
What to do with more complex (non-linear) constraints?
Construction of robust versions of constraints using conic duality
A lot of standard nonlinear constraints can be represented in conic form (that is, as membership in some closed convex cone $K$):

non-negativity: $x \geq 0 \;\leftrightarrow\; x \in R^n_+$;

norm bounds: $\|x\|_p \leq t \;\leftrightarrow\; \begin{pmatrix} x \\ t \end{pmatrix} \in K_p^n = \left\{ (x, t) \in R^n \times R_+ \;\middle|\; \|x\|_p \leq t \right\}$;

positive semidefiniteness of a matrix: $X \succeq 0 \;\leftrightarrow\; X \in S^n_+$.
Let's return to the robust constraints.
Suppose that the inner optimization problem over $\zeta$ can be reduced to the conic form

$$\max_\zeta \; \sum_{i=1}^k \zeta_i (a_i^T x - b_i) \quad \text{s.t.} \quad C\zeta + d \in K$$
Let us construct the dual of this problem.

Some time ago I published a post about conic duality precisely so that I could spend less attention on the technique itself here.
Now only a small step remains, the weak duality theorem:

$$\max_{[\zeta:\; C\zeta + d \in K]} \sum_{i=1}^k \zeta_i (a_i^T x - b_i) \;\leq\; \min_{\lambda \in G} \lambda^T d$$

where

$$G = \left\{ \lambda \;\middle|\; C^T \lambda + \begin{pmatrix} a_1^T x - b_1 \\ \vdots \\ a_k^T x - b_k \end{pmatrix} = 0_k; \quad \lambda \in K^* \right\}$$
Consequently, as a robust approximation of the original constraint one can use the constraint

$$\lambda^T d \leq b_0 - a_0^T x, \quad \lambda \in G = \left\{ \lambda \;\middle|\; C^T \lambda + \begin{pmatrix} a_1^T x - b_1 \\ \vdots \\ a_k^T x - b_k \end{pmatrix} = 0_k; \quad \lambda \in K^* \right\}$$

Here $\lambda$ is a decision variable of the same kind as $x$.
So we built a robust constraint for the original inequality.
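As a small worked example of this machinery (box uncertainty is my choice of uncertainty set), take $Q = \{\zeta : \|\zeta\|_\infty \leq 1\}$. The inner maximum can be computed by hand, and the robust counterpart stays linear:

```latex
% box uncertainty: each zeta_i ranges over [-1, 1] independently
\max_{\|\zeta\|_\infty \le 1} \; \sum_{i=1}^k \zeta_i \,(a_i^T x - b_i)
  \;=\; \sum_{i=1}^k \bigl|\, a_i^T x - b_i \,\bigr|,
\qquad\text{so the robust constraint reads}\qquad
a_0^T x + \sum_{i=1}^k \bigl|\, a_i^T x - b_i \,\bigr| \;\le\; b_0 .
```

Each absolute value can be replaced by an auxiliary variable $u_i$ with $-u_i \leq a_i^T x - b_i \leq u_i$, so this robust counterpart can be fed to an ordinary LP solver.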
Conclusion
We have considered a technique for approximating bad (stochastic) constraints by a set of good convex ones. This can be useful, for example, when:

you do not want to write algorithms yourself, and the solver you use cannot handle probabilistic constraints;

the problem has stochastic parameters, and the optimum is very sensitive to fluctuations in the data;

and, of course, in problems with uncertainty where all the constraints are hard (the cost of an error is too high).