
An introduction to robust optimization [... and a small shopping list that I forgot ...]

How do you decide how many people a new fulfillment center needs to hire, what exactly it should stock, and where to place a specific product? The larger the business becomes, the higher the uncertainty and the more expensive a mistake is. Taming the chaos and choosing the optimal solution is one of the tasks of the data science team. And since data analysis is built on mathematics, let's begin with it.

In this post, we will consider optimization problems with uncertainty in the data and their approximation by deterministic convex problems. This is one of the main tricks of robust optimization, a technique that lets you cope with optimization problems that are overly sensitive to changes in the input data.

The issue of sensitivity matters a lot. For problems whose solution quality depends only weakly on changes in the data, it is easier to use ordinary stochastic optimization. For highly sensitive problems, however, that approach gives poor results. Such problems abound in finance, supply chain management, engineering design and many other areas.

And yes, this is one of those posts whose complexity grows exponentially as you read (sorry in advance)...

What does it mean to “solve” an optimization problem?


Let's start with a brief reminder.

The optimization task in general form looks like this:

\min_{x \in \mathbb{R}^n} f(x) \quad \text{s.t.} \quad x \in X



Here f(x) is called the objective function, and X is the feasible (admissible) set.

By a solution of the optimization problem we mean a point x^* \in X such that:

f(x) - f(x^*) \geq 0, \quad \forall x \in X


This is a standard concept for solving an optimization problem without uncertainty.
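As a tiny sanity check (my own illustration, not from the original post): for f(x) = (x - 1)^2 and X = [0, 3], the point x^* = 1 qualifies, since

f(x) - f(x^*) = (x - 1)^2 \geq 0, \quad \forall x \in [0, 3]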

What is an optimization problem with uncertainty?


It's time to ask where the function f(x) and the constraint set X come from.

It is very useful to distinguish between the different sources of the problem's ingredients.


For example, suppose someone from the business side comes to us with a linear programming problem:

\min_{x \in \mathbb{R}^2} 2.16 x_1 + 3.7 x_2 \\
\text{s.t.} \quad 0.973 x_1 + 2.619 x_2 \leq 3.32 \\
x_1 \geq 0, \quad x_2 \geq 0



You are seeing this problem for the first time. So, perhaps, is the person who brought it (maybe not, but people in blue jackets all look equally abstract!). You do not know what the variables mean. Even so, you can already say with a high degree of confidence that:

  1. The problem is most likely linear because someone decided it should be: linearity is a structure chosen by a person, not a property of the data.
  2. The constraints x_1 \geq 0, x_2 \geq 0 are technical. That is, they come from the "physics" of the problem, not from data (for example, sales cannot be negative).
  3. The specific values {0.973, 2.619, 3.32} in the constraint 0.973 x_1 + 2.619 x_2 \leq 3.32 were estimated from data. That is, first someone decided that the variable x_1 is related to the variable x_2, then that the relationship is linear, and finally the coefficients of this relationship were estimated from data. The same goes for the coefficients {2.16, 3.7} in the objective function.

When we talk about problems with uncertainty, we target precisely the uncertainty in the parameters estimated from data. We do not touch the technical constraints or the initial choice of the problem's structure.

Let's go back to our story. We have a linear problem whose coefficients someone somehow estimated. If we are right about the nature of the coefficients, then in fact we were asked to solve the problem for one scenario of events (one specific instance of the problem).

Sometimes this is enough for us, and we just solve it.
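For concreteness, here is a minimal sketch of "just solving it" for the single-scenario instance above (using scipy.optimize.linprog; the code is my own illustration, not from the original post):

```python
# A sketch of solving the single-scenario LP from the example above.
# The numbers are the "nominal" coefficients someone estimated from data.
from scipy.optimize import linprog

c = [2.16, 3.7]             # objective coefficients
A_ub = [[0.973, 2.619]]     # left-hand side of the inequality constraint
b_ub = [3.32]               # right-hand side

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x, res.fun)       # optimal point and optimal value for this one scenario
```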

However, sometimes solving the problem for a single scenario is a bad idea (for example, when the solution is very sensitive to variation in the data).

What should we do in this case, and how do we model uncertainty in the data?

First, note that uncertainty in the data can always be moved from the objective function into the constraints, or vice versa. How to do this is shown under the spoiler.

Transfer of uncertainty from the objective function to the constraints or vice versa
It is often more convenient to move all uncertainty into one part of the problem: either the objective function or the constraints.

Transfer of uncertainty from the objective function to the constraints

For any optimization problem

\min_{x \in \mathbb{R}^n} f_0(x, w) \\
\text{s.t.} \quad f_i(x, \theta_i) \leq 0, \quad 1 \leq i \leq k \\
h_i(x, \beta_i) = 0, \quad 1 \leq i \leq m \\
x \in X



one can build an equivalent problem with no uncertainty in the objective function:

\min_{x \in \mathbb{R}^n, \; t \in \mathbb{R}} t \\
\text{s.t.} \quad f_0(x, w) \leq t \\
f_i(x, \theta_i) \leq 0, \quad 1 \leq i \leq k \\
h_i(x, \beta_i) = 0, \quad 1 \leq i \leq m \\
x \in X



A solution (x^*, t^*) of the equivalent problem contains a solution x^* of the original one.
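A minimal sketch of this epigraph trick in code (my own toy example with cvxpy; the objective ||x - 1||_2 and the constraint sum(x) <= 2 are made up purely for illustration):

```python
# Minimizing f0(x) = ||x - 1||_2 directly vs. via the auxiliary variable t.
import cvxpy as cp

x = cp.Variable(3)
t = cp.Variable()

direct = cp.Problem(cp.Minimize(cp.norm(x - 1)), [cp.sum(x) <= 2])
epigraph = cp.Problem(cp.Minimize(t),
                      [cp.norm(x - 1) <= t, cp.sum(x) <= 2])

direct.solve()
epigraph.solve()
print(direct.value, epigraph.value)   # the two optimal values coincide
```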

Transfer of uncertainty from the constraints to the objective function

Formally, for any optimization problem with constraints

\min_{x \in \mathbb{R}^n} f(x) \quad \text{s.t.} \quad x \in X



one can build an equivalent unconstrained problem

\min_{x \in \mathbb{R}^n} f(x) + I_X(x)



using the indicator function

I_X(x) = \begin{cases} 0, & x \in X \\ +\infty, & x \notin X \end{cases}



It is clear that no algorithm will digest such a function, but that is not required. The next logical step is to approximate the indicator function with something digestible. What exactly depends on the situation (more on this a bit later). This is how, for example, interior-point methods (a special case of penalty function methods) and many others are constructed.
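As a minimal sketch of that idea (my own illustration on the toy LP from the beginning of the post): replace the indicator of X with a log-barrier, which is finite inside X and blows up at its boundary.

```python
# Approximating I_X with a log-barrier for X = {x >= 0, 0.973*x1 + 2.619*x2 <= 3.32}.
# The parameter t controls how closely the barrier mimics the indicator.
import numpy as np
from scipy.optimize import minimize

def barrier_objective(x, t=10.0):
    slack = np.array([3.32 - 0.973 * x[0] - 2.619 * x[1], x[0], x[1]])
    if np.any(slack <= 0):
        return np.inf                                  # outside X the "indicator" is +infinity
    return 2.16 * x[0] + 3.7 * x[1] - np.sum(np.log(slack)) / t

res = minimize(barrier_objective, x0=[1.0, 0.1], method="Nelder-Mead")
print(res.x)   # an approximate minimizer of f over X
```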


Stochastic, online and robust optimization, and the shopping list


We may have many scenarios of uncertainty, as well as options for what to do with it. We illustrate several standard approaches with a simple example.

I do not know how things are with the esteemed reader, but I am (happily) married and periodically go grocery shopping. With a shopping list, of course (it grants invulnerability to impulse purchases). Sometimes not just to the nearby store, but to something like Auchan, where everything is cheaper but which is far away.

Let's model this situation: we have arrived at Auchan with a shopping list in hand.

So, the first question: how do we model this?

Input data: information about the products to purchase and the required quantities.

For convenience, we can think of the shopping list as a non-negative integer vector y \in \mathbb{Z}_+^n.

As the decision variables, take, accordingly, a non-negative integer vector x \in \mathbb{Z}_+^n: how much of which products we end up buying (our decision).

The rest is simple: take some objective function f(x, y) that measures how badly we erred in our choice of products.

Depending on the context, the type of function may vary, but there are some basic requirements for it:


Thus we get the task:

\min_{x \in \mathbb{R}^n} f(x, y)
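As a purely illustrative (hypothetical) choice of such a function, not the author's specific one: pay for what you buy, plus a penalty per unit of every product you failed to buy,

f(x, y) = p^T x + \lambda \sum_{i=1}^n \max(y_i - x_i, 0)

where p is the price vector and \lambda > 0 is the cost of a missing item.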



Now imagine that the shopping list stayed at home...

And so, with one small twist, we have landed in the world of problems with uncertainty.

So, what do we do if y in the problem \min_{x \in \mathbb{R}^n} f(x, y) is unknown to us?

The answer, again, depends on the context.

Stochastic optimization

Stochastic optimization (usually) involves


In our example, if we modeled it using stochastic optimization, we would say


This leads us to the following problem:

\min_{x \in \mathbb{R}^n} \mathbb{E}_y \left[ f(x, y) + \psi(y, z) \right] \\
\text{s.t.} \quad x + z \geq y



Note that in this problem decisions are de facto made twice: first the main purchase decision at Auchan, represented by x, and then the "error correction" represented by z.
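Here is a minimal sample-average-approximation sketch of such a two-stage problem (my own illustration with cvxpy; the linear cost f(x, y) = p^T x, the correction cost \psi(y, z) = q^T z, all the numbers, and the relaxation of integrality are hypothetical simplifications):

```python
# Two-stage shopping under uncertainty, approximated over sampled scenarios.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n, S = 4, 200                                     # products, sampled scenarios
p = np.array([1.0, 2.0, 1.5, 3.0])                # in-store prices (made up)
q = 1.5 * p                                       # "error correction" prices (made up)
Y = rng.poisson(lam=[2, 1, 3, 1], size=(S, n))    # sampled shopping lists y

x = cp.Variable(n, nonneg=True)                   # buy now
Z = cp.Variable((S, n), nonneg=True)              # per-scenario correction z
constraints = [x + Z[s] >= Y[s] for s in range(S)]
expected_cost = p @ x + cp.sum(Z @ q) / S         # empirical E_y[f + psi]
cp.Problem(cp.Minimize(expected_cost), constraints).solve()
print(np.round(x.value, 2))
```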

The main problems of this approach are:


Online optimization

Online optimization is a framework for studying sequential decision making. One of the standard modeling approaches in this framework is multi-armed bandits, which have been written about on Habr many times.

In the context of our toy example, we would:


Robust optimization

Robust optimization is a logical extension of the idea of a minimax solution.

Ideally, we want to make a decision right now that will remain valid no matter what happens. The people who designed pots, irons and refrigerators in the USSR worked firmly in the spirit of robust optimization: the product had to keep working even after 20 years of being used as the main tool for exterminating the mutants that appeared after a nuclear war (which you would also have to survive).

Moreover, we want to feed the problem to an ordinary solver, and solvers do not understand constraints of the form "for any realization of a random variable" (unless there are only finitely many realizations).

In the shopping-list problem, the decision must be made here and now and must remain feasible under any circumstances:

\min_{x \in \mathbb{R}^n, \; t \in \mathbb{R}} t \\
\text{s.t.} \quad f(x, y) \leq t \quad \forall y \\
x \geq y \quad \forall y



Clearly, even in this toy example, if nothing at all is required of y, then no meaningful decision exists.

So how to work with such tasks?

Constructing a robust version of a problem: the LP example


Consider a linear optimization problem with uncertainty:

\min_{x \in \mathbb{R}^n} c^T x + d \quad \text{s.t.} \quad Ax \leq b



The parameters \begin{pmatrix} c^T, d \\ A, b \end{pmatrix} were estimated from data and carry uncertainty.

Assumption 1: the set of values (realizations) of \begin{pmatrix} c^T, d \\ A, b \end{pmatrix} can be parameterized, i.e. there exist \begin{pmatrix} c_0^T, d_0 \\ A_0, b_0 \end{pmatrix}, \begin{pmatrix} c_1^T, d_1 \\ A_1, b_1 \end{pmatrix}, \dots, \begin{pmatrix} c_k^T, d_k \\ A_k, b_k \end{pmatrix} such that any realization of the data \begin{pmatrix} c^T, d \\ A, b \end{pmatrix} lies in the set:

\begin{pmatrix} c^T, d \\ A, b \end{pmatrix} \in U = \left\{ \begin{pmatrix} c_0^T, d_0 \\ A_0, b_0 \end{pmatrix} + \sum_{i=1}^k \zeta_i \begin{pmatrix} c_i^T, d_i \\ A_i, b_i \end{pmatrix} \;\middle|\; \zeta \in Q \subset \mathbb{R}^k \right\}



Here \begin{pmatrix} c_0^T, d_0 \\ A_0, b_0 \end{pmatrix} is called the “nominal” data, and \begin{pmatrix} c_i^T, d_i \\ A_i, b_i \end{pmatrix}, \; 1 \leq i \leq k, are the “shifts.”

Mini example
Let me explain their meaning a little on a model example from finance: choosing an optimal portfolio of securities. Say you want to invest. There are n stocks available to you on the exchange, and you need to figure out how to distribute (invest) your capital among these securities so as to maximize your return while limiting the risk. One of the first models for this problem (the Markowitz model) suggests doing the following:

  1. Collect historical data on each security's returns: r_i^t = \frac{S_i^t - S_i^{t-1}}{S_i^{t-1}}, where S_i^t is the price of asset i at time t.
  2. Compute the empirical mean returns \hat{r}_i = \frac{1}{T} \sum_{t=1}^T r_i^t and the empirical covariance matrix of returns \Sigma = \| \mathrm{cov}(r_i, r_j) \|_{i,j}.
  3. Solve the optimization problem

     \max_{x \in \mathbb{R}_+^n} x^T \hat{r} \\
     \text{s.t.} \quad \frac{1}{2} x^T \Sigma x \leq \sigma \\
     \sum_{i=1}^n x_i \leq 1

The solution of this problem is the optimal distribution (shares) of capital among the securities.

In fact, we maximize the expected return, or, put differently, we look for the portfolio that is optimal for a single scenario: the case when the realization of the random (!) returns coincides with the empirical mean.

In the context of parameterizing r, it is exactly \hat{r} that plays the role of the “nominal” data.
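A minimal sketch of this nominal Markowitz problem in code (my own illustration with cvxpy; the return data and the risk budget are synthetic):

```python
# Nominal Markowitz portfolio: maximize expected return under a risk budget.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(1)
T, n = 250, 5
R = rng.normal(loc=0.0005, scale=0.01, size=(T, n))   # synthetic daily returns

r_hat = R.mean(axis=0)                 # empirical mean returns
Sigma = np.cov(R, rowvar=False)        # empirical covariance matrix
sigma = 1e-4                           # risk budget (illustrative)

x = cp.Variable(n, nonneg=True)
problem = cp.Problem(cp.Maximize(r_hat @ x),
                     [0.5 * cp.quad_form(x, Sigma) <= sigma,
                      cp.sum(x) <= 1])
problem.solve()
print(np.round(x.value, 3))            # capital shares per security
```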


We already know that all the uncertainty in a problem can be moved into the constraints. Let's do that.

We get the problem:

\min_{x \in \mathbb{R}^n, \; t \in \mathbb{R}} t \\
\text{s.t.} \quad c^T x + d \leq t, \quad \forall \begin{pmatrix} c^T, d \end{pmatrix} \in U \\
Ax \leq b, \quad \forall \begin{pmatrix} A, b \end{pmatrix} \in U



Robust version of the problem


Now it’s time for one of the coolest tricks in robust optimization — how to move from an infinite number of constraints to a finite set of good constraints.

To begin, consider a simple example when

Q = \{ \zeta \in \mathbb{R}^k \mid \|\zeta\|_2 \leq 1 \}



All constraints in the system

c^T x + d \leq t, \quad \forall \begin{pmatrix} c^T, d \end{pmatrix} \in U \\
Ax \leq b, \quad \forall \begin{pmatrix} A, b \end{pmatrix} \in U


are of the same type: each is just a linear inequality. Learn to handle one and you can handle them all.

Therefore, we consider a single inequality constraint:

a^T x \leq b \quad \forall (a, b) \in U = \left\{ (a_0, b_0) + \sum_{i=1}^k \zeta_i \cdot (a_i, b_i) \;\middle|\; \zeta \in Q \right\} \\
\left( a_0 + \sum_{i=1}^k \zeta_i a_i \right)^T x \leq b_0 + \sum_{i=1}^k \zeta_i b_i \quad \forall \zeta \in Q \\
\sum_{i=1}^k \zeta_i \cdot (a_i^T x - b_i) \leq b_0 - a_0^T x \quad \forall \zeta \in Q \\
\max_{\zeta \in Q} \sum_{i=1}^k \zeta_i \cdot (a_i^T x - b_i) \leq b_0 - a_0^T x



I will explain what happened.

First, we moved everything involving uncertainty to the left-hand side of the inequality; the uncertainty is captured by \zeta.
Then we looked at the worst case (each x has its own worst case).
As a result, we obtained the following condition:

g(x) = \max_{\zeta \in Q} f(x, \zeta) \leq b_0 - a_0^T x

The next step is to write out the function g(x) explicitly. To do this, it suffices to solve the maximization problem over \zeta and substitute the optimal \zeta (by the Cauchy-Schwarz inequality, the maximum over the unit ball is attained at \zeta proportional to the vector of the a_i^T x - b_i):

\max_{\|\zeta\|_2 \leq 1} \sum_{i=1}^k \zeta_i (a_i^T x - b_i) = \sqrt{\sum_{i=1}^k (a_i^T x - b_i)^2}


which leads to the inequality:

\sqrt{\sum_{i=1}^k (a_i^T x - b_i)^2} + a_0^T x \leq b_0



Note that the resulting inequality is convex, and any x that satisfies it also satisfies the original constraint a^T x \leq b for every realization (a, b) \in U.

The constraint \sqrt{\sum_{i=1}^k (a_i^T x - b_i)^2} + a_0^T x \leq b_0 is called the robust counterpart of the constraint a^T x \leq b \quad \forall (a, b) \in U.

This is one of the main workhorses in robust optimization — approximation of probabilistic constraints by a finite set of convex constraints.
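A minimal sketch of using this robust counterpart in practice (my own illustration with cvxpy; the nominal data a_0, b_0, the shifts a_i, b_i and the objective are all made up):

```python
# Robust counterpart of a^T x <= b under ellipsoidal uncertainty,
# written as the second-order cone constraint derived above.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(2)
n, k = 3, 4
a0, b0 = np.array([1.0, 2.0, 0.5]), 10.0
A_shift = 0.1 * rng.standard_normal((k, n))   # rows are the shifts a_1 ... a_k
b_shift = 0.1 * rng.standard_normal(k)        # b_1 ... b_k
c0 = np.array([-1.0, -1.0, -1.0])             # nominal objective: maximize sum(x)

x = cp.Variable(n, nonneg=True)
# sqrt(sum_i (a_i^T x - b_i)^2) + a0^T x <= b0
robust_lhs = cp.norm(A_shift @ x - b_shift, 2) + a0 @ x
problem = cp.Problem(cp.Minimize(c0 @ x), [robust_lhs <= b0])
problem.solve()
print(np.round(x.value, 3))
```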

What to do with more complex (non-linear) constraints?

Constructing robust counterparts of constraints using conic duality


Many standard nonlinear constraints can be represented in conic form (that is, as Ax + b \in K, where K is some closed convex cone):


Let's return to the robust constraints.

Suppose that the maximization problem over \zeta can be reduced to the conic form

\max_{\zeta} \sum_{i=1}^k \zeta_i (a_i^T x - b_i) \quad \text{s.t.} \quad C\zeta + d \in K



We construct the dual of this problem.

Some time ago I published a post about conic duality, precisely so that I would not have to dwell on the technique itself here.

\min_{\lambda} \lambda^T d \\
\text{s.t.} \quad C^T \lambda + \begin{pmatrix} a_1^T x - b_1 \\ \dots \\ a_k^T x - b_k \end{pmatrix} = 0_k \\
\lambda \in K^*



Now only a small step remains: the weak duality theorem:

\max_{[\zeta: \; C\zeta + d \in K]} \sum_{i=1}^k \zeta_i (a_i^T x - b_i) \leq \min_{\lambda \in G} \lambda^T d \\
\text{where} \quad G = \left\{ \lambda \;\middle|\; C^T \lambda + \begin{pmatrix} a_1^T x - b_1 \\ \dots \\ a_k^T x - b_k \end{pmatrix} = 0_k; \quad \lambda \in K^* \right\}



Consequently, as a robust approximation of the original constraint a^T x \leq b, \quad (a, b) \in U, one can use the constraint

\lambda^T d \leq b_0 - a_0^T x \\
G = \left\{ \lambda \;\middle|\; C^T \lambda + \begin{pmatrix} a_1^T x - b_1 \\ \dots \\ a_k^T x - b_k \end{pmatrix} = 0_k; \quad \lambda \in K^* \right\}


where \lambda is a decision variable, just like x.

So we built a robust constraint for the original inequality.
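As a minimal sketch of how this looks in code (my own illustration with cvxpy, not from the original post): take box uncertainty Q = \{\zeta : \|\zeta\|_\infty \leq 1\}, which in conic form means C = (I; -I), d = \mathbf{1}, K = K^* = \mathbb{R}_+^{2k}; the duality-based counterpart can then be cross-checked against the known closed form \sum_i |a_i^T x - b_i| \leq b_0 - a_0^T x. All problem data are made up.

```python
# Duality-based robust counterpart for box uncertainty, with a closed-form cross-check.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(3)
n, k = 3, 4
a0, b0 = np.array([1.0, 2.0, 0.5]), 10.0
A_shift = 0.1 * rng.standard_normal((k, n))   # shifts a_1 ... a_k
b_shift = 0.1 * rng.standard_normal(k)        # b_1 ... b_k
c0 = np.array([-1.0, -1.0, -1.0])             # objective: maximize sum(x)

# Duality-based counterpart: lambda is an extra decision variable, just like x.
x = cp.Variable(n, nonneg=True)
lam = cp.Variable(2 * k, nonneg=True)          # lambda in K^* = R_+^{2k}
resid = A_shift @ x - b_shift                  # vector of a_i^T x - b_i
dual_form = cp.Problem(cp.Minimize(c0 @ x),
                       [cp.sum(lam) <= b0 - a0 @ x,        # lambda^T d <= b0 - a0^T x
                        lam[:k] - lam[k:] + resid == 0])   # C^T lambda + resid = 0
dual_form.solve()

# Closed-form counterpart for the same box uncertainty, as a sanity check.
x2 = cp.Variable(n, nonneg=True)
closed_form = cp.Problem(cp.Minimize(c0 @ x2),
                         [cp.norm(A_shift @ x2 - b_shift, 1) + a0 @ x2 <= b0])
closed_form.solve()
print(dual_form.value, closed_form.value)      # the two optimal values coincide
```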

Conclusion


We have looked at a technique for approximating bad (stochastic) constraints by a set of good convex ones. This can be useful, for example, if:

Source: https://habr.com/ru/post/436342/