The table has these columns: Name_Ch CHAR(15), Event_Time TIMESTAMP(3), Param_Value FLOAT. Physical meaning: there are a number of devices, each with its own Name_Ch; each writes rows to the table containing the value of the Param_Value parameter and the time at which that value was received, at about 20 records/sec. Splitting the data into separate tables is not allowed. Goal: to sample rows for a single Name_Ch with an arbitrary time-thinning interval, i.e. we do not take every fifth (for example) value, but fix the interval in advance (from 10 seconds up to several hours) and select rows spaced by that interval (10 seconds, say). Here is the procedure I wrote.
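First, for reference, a minimal sketch of the table as I understand it. Note the procedure body actually references backend_test.eventdata with columns CHName, EventTime and RMS rather than the names in the prose, so this DDL is an assumption reconstructed from both:

 -- assumed DDL; the prose calls the columns Name_Ch / Event_Time / Param_Value,
 -- while the procedure reads CHName / EventTime / RMS
 CREATE TABLE backend_test.eventdata (
     CHName    CHAR(15)     NOT NULL,  -- device name
     EventTime TIMESTAMP(3) NOT NULL,  -- time the value was recorded
     RMS       FLOAT        NOT NULL   -- parameter value
 );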

CREATE DEFINER=`root`@`localhost` PROCEDURE `Read_TIME_and_RMS`(
    IN ParamName1 text(15), IN ParamName2 text(15), IN DSTN VARBINARY(15),
    IN TSbegin DATETIME(3), IN TSend DATETIME(3),
    IN dtsec float, IN STR_to_read INT(11))
BEGIN
    DECLARE TIMEbuf DATETIME(3);
    DECLARE startpoint DATETIME(3);
    TRUNCATE TABLE temp_table_memory;
    SET startpoint := TSbegin + INTERVAL dtsec SECOND;
    L: WHILE (STR_to_read > 0) DO
        INSERT INTO temp_table_memory (TIME, RMS)
            SELECT EventTime, RMS
            FROM backend_test.eventdata
            WHERE (CHName = DSTN) AND (EventTime > @startpoint)
            LIMIT 1;
        SET @t = FOUND_ROWS();
        IF (@t = 1) THEN
            SET @STR_to_read := @STR_to_read - 1;
            SELECT MAX(TIME) INTO @timebuf FROM temp_table_memory;
            /* SELECT TIME INTO TIMEbuf FROM temp_table_memory ORDER BY `TIME` DESC LIMIT 1; */
            SET startpoint := timebuf + INTERVAL dtsec SECOND;
        ELSE
            LEAVE L;
        END IF;
    END WHILE L;
    SELECT * FROM temp_table_memory;
    /* TRUNCATE TABLE temp_table_memory; */
END
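For context, a hypothetical invocation (all argument values invented for illustration; note that ParamName1, ParamName2 and TSend are accepted but never used in the body):

 -- hypothetical call: channel 'CH01', 10-second thinning, up to 100 rows
 CALL Read_TIME_and_RMS('p1', 'p2', 'CH01',
                        '2016-04-14 21:00:00.000', '2016-04-15 21:00:00.000',
                        10, 100);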

Algorithm:

  1. The procedure receives the main parameters: the DSTN name, the TSbegin..TSend search interval, the dtsec time increment, and the number of rows STR_to_read.
  2. From the beginning of the interval we add dtsec and search for a single row satisfying the condition.
  3. The found timestamp-value pair is inserted into a temporary table (a sketch of its definition follows this list).
  4. From this temporary table we take the value with the maximum timestamp (the alternative of reading into variables and then inserting the variables into the temporary table was too slow), add dtsec to it, and perform the next search.
  5. We leave the loop when we have accumulated the required STR_to_read rows, or when the next sample comes back empty.
  6. A SELECT * from the temporary table yields the desired result.
  7. Clear the temporary table.
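The procedure assumes temp_table_memory already exists; a minimal sketch of it (types assumed to mirror the source columns):

 -- assumed definition; the column names come from the INSERT in the procedure
 CREATE TABLE temp_table_memory (
     TIME DATETIME(3),
     RMS  FLOAT
 ) ENGINE=MyISAM;  -- MyISAM, since MEMORY turned out slower here (see the comments below)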

Program structure:
We open a connection to the DB. We call the procedure in a loop (data is constantly being added to the database, and we read the new rows). Then we exit the loop and close the connection. There are problems with variables: somewhere something is not cleared, or is used incorrectly, and the procedure returns repeated values that do not correspond to the requested interval.

If, on each iteration of the main loop, we open a connection, run the procedure, and then close the connection, everything is read correctly. Where should I look for the error?

Perhaps there are better ways to implement this task, or ways to speed it up. The main program is written in LabVIEW; communication with the DB goes through LV SQL Connector, which talks natively over TCP.

UPD: Explored further. 90% of the time is spent in the "Sending data" phase. Everything runs locally. About data correctness: the current test table contains rows in ascending timestamp order and nothing else, and with it everything behaves well. As soon as that ordering is violated, it becomes necessary to take the MAX of the previous result as TSbegin for the next sample. And the problem described at the beginning remains. There seems to be nothing left to get wrong: the first query gives the correct result, and based on it we update TSbegin. On the next iteration the new query returns 0 rows (the query text is correct and the data is there, verified directly from MySQL Workbench), and this repeats three times. The fourth iteration finally returns the correct result, updating TSbegin. Then on the next iteration the sample comes from the previous result (the sample does not match the TSbegin..TSend passed to the procedure). A mess.

  • And by the way, why SET @t = FOUND_ROWS(); IF (@t=1) THEN? You can simply write IF FOUND_ROWS()=1 THEN. - Mike

1 answer

You use @-prefixed (user, session-scoped) variables inside the procedure. They are designed for something completely different and retain their values for as long as the connection to the database stays open, so they must be initialized. Pay particular attention: STR_to_read (a parameter of your procedure) and @STR_to_read are completely different variables that have nothing to do with each other!
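A minimal illustration of the difference (procedure name invented for the demo):

 CREATE PROCEDURE demo_scope(IN STR_to_read INT)
 BEGIN
     -- STR_to_read is a local parameter, reset on every call;
     -- @STR_to_read is a session variable that survives between calls
     -- for as long as the connection stays open
     SET @STR_to_read := IFNULL(@STR_to_read, 0) + 1;
     SELECT STR_to_read AS param_value, @STR_to_read AS calls_this_session;
 END

Called repeatedly over one connection, param_value is always whatever you pass in, while calls_this_session keeps growing; a fresh connection resets it, which is exactly why reconnecting on every iteration made your reads come out correct.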

I do not see any need for a procedure and a temporary table to solve your problem. I suggest a single query:

 select EventTime, RMS
 from (
     select EventTime, RMS,
            @yes := if(EventTime > @lt, 1, 0) YES,
            @lt  := if(@yes, EventTime + interval 10 second, @lt)
     from eventdata, (select @lt := '1970-01-01', @yes := 0) A
     where CHName = 'DSTN'
       and EventTime between '2016-04-14 21:00' + interval 10 second
                         and '2016-04-14 21:00' + interval 10*(20+1) second
     order by EventTime
 ) A
 where YES = 1
 limit 20

In place of the 2016 date, the 10 and the 20, substitute variables with the necessary parameters: the starting date, the thinning interval, and the number of records, respectively. The upper bound on the date will have to be removed if your data contains gaps in time. I included it as a speed optimization, but if there are gaps in the data it can capture too few rows.
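A hedged sketch of that substitution, wrapped in a routine so that LIMIT can take a parameter (routine and parameter names assumed; plain SQL would need PREPARE for a variable LIMIT):

 CREATE PROCEDURE Read_thinned(IN DSTN VARBINARY(15), IN TSbegin DATETIME(3),
                               IN dtsec FLOAT, IN STR_to_read INT)
 BEGIN
     SELECT EventTime, RMS
     FROM (
         SELECT EventTime, RMS,
                @yes := IF(EventTime > @lt, 1, 0) YES,
                @lt  := IF(@yes, EventTime + INTERVAL dtsec SECOND, @lt)
         FROM eventdata, (SELECT @lt := '1970-01-01', @yes := 0) A
         WHERE CHName = DSTN
           -- drop the upper bound below if the data has gaps in time
           AND EventTime BETWEEN TSbegin + INTERVAL dtsec SECOND
                             AND TSbegin + INTERVAL (dtsec * (STR_to_read + 1)) SECOND
         ORDER BY EventTime
     ) A
     WHERE YES = 1
     LIMIT STR_to_read;
 END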

UPD: Since the approach with the procedure is still faster on your data, I suggest the following improvement (@-variables are useful in cases like this):

 SET @startpoint := TSbegin;
 REPEAT
     insert into temp_table_memory (TIME, RMS)
         SELECT @startpoint := EventTime, RMS
         FROM backend_test.eventdata
         WHERE (CHName = DSTN) and (EventTime > @startpoint + INTERVAL dtsec SECOND)
         LIMIT 1;
     SET STR_to_read := STR_to_read - 1;
 UNTIL STR_to_read <= 0 or FOUND_ROWS() <= 0 END REPEAT;

The trick is that the @-variable is assigned right inside the INSERT ... SELECT, so there is no need for a separate SELECT MAX().
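The assignment-in-SELECT pattern in isolation, as a quick sketch:

 SET @last := NULL;
 -- the select list both returns EventTime and stores it into @last
 SELECT @last := EventTime FROM eventdata ORDER BY EventTime LIMIT 1;
 SELECT @last;  -- now holds the EventTime of the row just read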

  • Thanks for the clarification on the variables. I turned to a procedure to speed up query execution. A typical usage scenario: about 1k rows are requested over an interval of about a day (which is more than a million rows for one channel) with heavy decimation (tens of minutes). In the overwhelming majority of implementations such a request takes more than 10 seconds to execute, which is unacceptable. With the procedure and the same input data, the whole thing takes a couple of seconds, which is much better. Your sample code runs in 13 seconds regardless of the interval and number of entries. Thank you so much. - Vladislav
  • Yes, while debugging I created four indexes on the main table: two single-column (CHName, EventTime) and two composite (EventTime-CHName and vice versa). Performance is best with the composite index CHName-EventTime, which is expected. As an experiment I tried the MEMORY engine for the temporary table and got a twofold performance degradation (compared to MyISAM), which is unexpected. Anyway, thanks for the help. - Vladislav
  • Works. For some reason the combined INSERT ... SELECT takes a very long time, and the longer the sampling interval, the longer it runs. The same SELECT executes very quickly when run on its own. - Vladislav
  • Added an UPD to the question. The latest modification of the code returns one extra (very similar) copy of the last found row. - Vladislav
  • @Vladislav Still, add ORDER BY EventTime to the query; I don't know how it will affect performance, but it is worth checking. - Mike
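Putting the last two comments together: a hedged sketch of the UPD loop with the suggested ORDER BY applied (so LIMIT 1 deterministically picks the earliest matching row rather than an arbitrary one), plus the composite index reported as fastest (index name invented):

 -- composite index: equality on CHName, then a range scan on EventTime
 ALTER TABLE eventdata ADD INDEX idx_chname_eventtime (CHName, EventTime);

 SET @startpoint := TSbegin;
 REPEAT
     insert into temp_table_memory (TIME, RMS)
         SELECT @startpoint := EventTime, RMS
         FROM backend_test.eventdata
         WHERE (CHName = DSTN) and (EventTime > @startpoint + INTERVAL dtsec SECOND)
         ORDER BY EventTime  -- Mike's suggestion
         LIMIT 1;
     SET STR_to_read := STR_to_read - 1;
 UNTIL STR_to_read <= 0 or FOUND_ROWS() <= 0 END REPEAT;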