Suppose there is a table with some processes that can receive various statuses during execution:

process process_status ╔════╦════════╗ ╔════╦════════════╦════════════╦═════════╗ β•‘ id β•‘ name β•‘ β•‘ id β•‘ process_id β•‘ date β•‘ status β•‘ ╠════╬════════╣ ╠════╬════════════╬════════════╬═════════╣ β•‘ 1 β•‘ One β•‘ β•‘ 1 β•‘ 1 β•‘ 2018-10-10 β•‘ Start β•‘ β•‘ 2 β•‘ Two β•‘ β•‘ 2 β•‘ 1 β•‘ 2018-10-15 β•‘ Running β•‘ β•‘ 3 β•‘ Three β•‘ β•‘ 3 β•‘ 1 β•‘ 2018-10-30 β•‘ Running β•‘ β•šβ•β•β•β•β•©β•β•β•β•β•β•β•β•β• β•‘ 4 β•‘ 1 β•‘ 2018-11-01 β•‘ Running β•‘ β•‘ 5 β•‘ 1 β•‘ 2018-11-02 β•‘ Error β•‘ β•‘ 6 β•‘ 1 β•‘ 2018-11-03 β•‘ Error β•‘ β•šβ•β•β•β•β•©β•β•β•β•β•β•β•β•β•β•β•β•β•©β•β•β•β•β•β•β•β•β•β•β•β•β•©β•β•β•β•β•β•β•β•β•β• 

As you can see, statuses can be different, as well as each status can be obtained several times.

I am only interested in the Running and Error statuses. Now I get the entries like this:

 select p.id, p.name from process p join process_status ps on p.id = ps.process_id where ps.status = 'Running' or ps.status = 'Error'; 

As a result, 5 rows are returned. Three with the status of Running and two with the status of Error .

Is it possible to somehow return only 2 rows? Is the first received Running (If there is one at all) and / or the very first Error received (If there is one too)?

Ideally, return all processes for which the very first Running received no later than a certain date, for example, 2018-10-16 OR the very first Error received no later than some other date, for example, 2018-10-17 ?

Is it possible?

As a result, ideally, I need only processes that satisfy this condition:

 ╔══════╦════════╗ β•‘ p.id β•‘ p.name β•‘ ╠══════╬════════╣ β•‘ 1 β•‘ One β•‘ β•šβ•β•β•β•β•β•β•©β•β•β•β•β•β•β•β•β• 

UPD
I will explain my task a little:

Now all the processes and all their Running and Error statuses are recalled from the database (approximately by the query I gave), and already from the code side, those processes that received the first Running or Error status no later than a certain date are searched.

The fact. that about 8-10 thousand (sometimes more) processes are launched daily, their number is already in the tens of millions. Each process also receives about 10 thousand statuses.

Right now there are no performance issues, but I am performing a task that affects this code. In my opinion, too much unnecessary data travels from the database to the application. Therefore, since I still refactor this section of code, I want to make it more optimal, namely, not to subtract millions of processes and select several dozens of necessary ones in the code, and return only those 10 necessary ones at once, and work only with them.

  • Is it possible to somehow return only 2 rows? Yes. Ideally, return all processes. But first, think about it - are all fields really necessary? Probably not - and this will greatly simplify the request. And if you still need them, use FIRST_VALUE (). Or LATERAL JOIN to the subquery that will receive the dates of such records. - Akina
  • @Akina In principle, only all fields of the first table are needed - Oleg
  • Well, show the desired result on the model data shown, and without surplus ... - Akina
  • @Akina Not really understood. In the last paragraph, the desired result is described. - Oleg
  • No no no. In the same way as the source, show the result. Fields and values. And by all means, formulate ONE condition (even if it is a complex one), to which such a result corresponds, without any kind of fatty OR. - Akina

1 answer 1

return all processes for which the very first Running is received no later than a certain date, for example, 2018-10-16 OR the very first Error is received no later than some other date, for example, 2018-10-16

 SELECT DISTINCT t1.* FROM process t1, process_status t2 WHERE t1.id = t2.process_id AND t2.date < '2018-10-16' AND t2.status IN ('Running', 'Error') 

fiddle

If β€œno later than some other date” is really ANOTHER, and in the text is the same date by mistake, then

 WHERE t1.id = t2.process_id AND ( (t2.date < '2018-10-15' AND t2.status = 'Running' ) OR (t2.date < '2018-10-16' AND t2.status = 'Error') ) 
  • Thank. This is what I need. The question is really a mistake. I have a telephone meeting, in parallel I am writing code, in parallel here, therefore I make mistakes from inattention. - Oleg