I'm faced with a task. There are two variables:

 $start = 015223; // initial value 

and

 $finish = 628311; // final value 

The numbers are always six-digit and can be arbitrary, but start is always less than finish. The table has two columns, id and number:

  • id — a column with the auto-increment attribute
  • number — stores the six-digit numbers

The task is to insert the numeric range between start and finish into the database, but in such a way that if the database already has a row with a matching number, it is not inserted again.

Of course, I could build a bulky construction with a for loop that first runs a SELECT and checks the number of selected rows, and performs an INSERT only if num_rows returns 0, but as you can imagine that would be extremely resource-intensive and cumbersome. Is there some simplifying function, something like DISTINCT in a query, that would make all this easier?

  • What is the dialect? generate_series in postgresql and ha_sequence in mariadb 10+ will let you do what you want with a fairly simple query. - Fine
  • @Small yes, please - the main thing is that it should be better than my idea with the for loop - dantelol
  • So what is the dialect? I'm not keen on describing every possible dialect just in case. - Fine
  • @Small oh, I'm weak in this, I don't even know how to determine the dialect. In general I have Denver with mysql 5.5 and phpmyadmin configured on it, and I'm making a website - can the dialect be determined from that? - dantelol
  • Yes, the mysql dialect. Dumb as a cork, it can't do anything useful. Now I'll think it over and write up an answer on how to do this task in a less painful way ... - Min

3 answers

As established in the comments, we are talking about mysql 5.5. It has no useful generate_series or equivalent, so we will have to improvise.

The not-quite-proper options

Six-digit numbers mean a maximum of 1 million records, which is actually not that much. You can read everything from the table with where number between :start and :end into a simple array of the form number => true; it will take a few megabytes of memory in the worst case. Then, in a for loop, use isset to check whether each needed number is already in the list; if it is not, accumulate it in a separate array and flush every, say, 1000 elements into the database with a single query. Not the most proper idea, but a workable one.
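
To be concrete, the flush of each accumulated batch is a single multi-row insert. A minimal sketch, assuming the table is called tablename as in the queries below; the values are just placeholders:

 INSERT INTO tablename (number) VALUES
 (15223), (15224), (15227), /* ... the rest of the accumulated batch, up to ~1000 values ... */ (16101);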

You can do exactly the same thing, but instead of reading everything at once, break the initial range into several blocks of, say, 50 thousand numbers each; a full pass will then take 20 read queries to the database.

Or, still the same approach, but in a variant that lets you control memory consumption in the worst case while keeping the number of queries minimal in the best case: read into an array with where number >= :start order by number limit 10000. If fewer rows than the limit come back, act according to the first scenario: you have read all the numbers in use, and everything else from start to finish needs to be added to the database. If exactly as many rows as specified in limit come back, the table may contain more data. In that case remember the last number from the query result, and when your for loop reaches that number, make another query to the database, substituting that last number: where number >= :last_number order by number limit 10000, and rebuild your array of numbers already present in the database. This works well if the table usually has few records and gaps are frequent.

More options

A stranger, but more productive option. If you need to perform this kind of task more or less regularly, then create a table, for example:

 create table int_series (number int(11) unsigned not null primary key); 

Fill it, in any way you like, with the numbers from 1 to 999999 (one possible way to do that is sketched right after the query below). And don't touch it after that: it is not meant to be touched in any way other than being read. Now your whole task comes down to running this query:

 insert into `tablename` (number) select number from int_series left join tablename using(number) where tablename.number is null and number between :start and :end 

That's all. Seriously.
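
As for the one-time fill of int_series, here is one possible way; just a sketch that builds the million numbers from a small helper table of digits cross-joined with itself (the helper table name digits is arbitrary, and any other filling method works just as well):

 -- temporary helper table with the digits 0..9
 CREATE TABLE digits (d TINYINT UNSIGNED NOT NULL PRIMARY KEY);
 INSERT INTO digits (d) VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

 -- cross join it with itself six times to produce 1..999999
 INSERT INTO int_series (number)
 SELECT d1.d + d2.d*10 + d3.d*100 + d4.d*1000 + d5.d*10000 + d6.d*100000
 FROM digits d1, digits d2, digits d3, digits d4, digits d5, digits d6
 WHERE d1.d + d2.d*10 + d3.d*100 + d4.d*1000 + d5.d*10000 + d6.d*100000 >= 1;

 DROP TABLE digits;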

For Postgresql, which was mentioned in the comments, this extra table is simply not needed; you can do it directly:

 insert into tablename (number) select number from generate_series(:start, :end) number left join tablename using(number) where tablename.number is null 

Of course, in all variants an index should be built on tablename.number.
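
For example, a minimal sketch (the index name is arbitrary; a UNIQUE index is a natural fit here, since the task forbids duplicate numbers anyway, but a plain index is enough for the join itself):

 ALTER TABLE tablename ADD UNIQUE INDEX idx_number (number);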

The variant that generates the sequence on the fly by the power of a huge join unfortunately works adequately only on small ranges. The range given in the question took about half a second to compute, and that is only generating the range.

If there is a large table at hand (absolutely any table, as long as it contains enough rows), then you can generate the sequence using a user variable:

 SELECT @i := @i + 1 AS number FROM any_big_table, (select @i:=$start) AS z limit $end - $start 

It will run faster, but it needs some such table to be around. For a finite and, moreover, rather small range of only 1 million rows, it is simpler to create the table with numbers in advance.
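
If you do go this route, the generator plugs into the same anti-join insert. A sketch with the numbers from the question taken as plain integers; note that the counter starts at start - 1 so the range stays inclusive, and that LIMIT only accepts a literal, so the row count (finish - start + 1) has to be computed on the application side. any_big_table must contain at least that many rows:

 INSERT INTO tablename (number)
 SELECT s.number
 FROM (
     SELECT @i := @i + 1 AS number
     FROM any_big_table, (SELECT @i := 15222) AS z  -- 15222 = start - 1
     LIMIT 613089                                   -- 628311 - 15223 + 1 rows
 ) AS s
 LEFT JOIN tablename USING (number)
 WHERE tablename.number IS NULL;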

Behind the scenes

Behind the scenes there remains, it seems, only the option with a stored procedure. You could write a stored procedure that performs exactly that naive for loop; thanks to its proximity to the data it does this more efficiently than the application would. But a single specialized query will still be more efficient, and stored logic in mysql is not the most trouble-free thing. Quite unpleasant, for example, is the silent automatic commit of the transaction when a stored object is called.
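
For completeness, a sketch of what such a procedure could look like (the name fill_range is arbitrary; it is still the naive loop, just executed next to the data):

 DELIMITER //
 CREATE PROCEDURE fill_range(IN p_start INT, IN p_finish INT)
 BEGIN
     DECLARE i INT DEFAULT p_start;
     WHILE i <= p_finish DO
         -- insert the number only if it is not in the table yet
         INSERT INTO tablename (number)
         SELECT i FROM DUAL
         WHERE NOT EXISTS (SELECT 1 FROM tablename WHERE number = i);
         SET i = i + 1;
     END WHILE;
 END//
 DELIMITER ;

 CALL fill_range(15223, 628311);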

  • "And don't touch it after that: it is not meant to be touched in any way other than being read." ....... so what, can't you create a temporary table of the MEMORY type, drive the values into it through a Cartesian join of the table with itself... then pour them into the destination table and drop the temporary one? Isn't that better? Then you don't have to keep this incomprehensible table around, and MEMORY works at the speed of light - Alexey Shimansky
  • The question is how often it is needed. A temporary table helps if you need to do several such operations within one session. If there is only one such operation per session, then the Cartesian product used directly will be even faster. A permanent MEMORY table would have to be refilled at startup and would still hang around in the table list. And an ordinary persistent table with a million ints takes only a couple of megabytes of space; it loads quickly with an elementary seqscan and then sits in the buffer until it gets pushed out. - Fine
  • @Small "Fill it, in any way you like, with the numbers from 1 to 999999" - and what if it's from 000001 to 999999, is that possible? - dantelol

In the worst case the sequence contains 1,000,000 rows. That is how many rows you need to generate.

We will generate them wisely. First, we calculate how many rows we actually need. Then we keep multiplying in small sets of 10 rows until we get the required amount.

It looks like this:

 SELECT *
 FROM (
   SELECT start, finish, cnt,
          start + COALESCE(a, 0) + COALESCE(b, 0)*10 + COALESCE(c, 0)*100 + COALESCE(d, 0)*1000 + COALESCE(e, 0)*10000 + COALESCE(f, 0)*100000 N
   FROM (SELECT $start start, $finish finish, $finish - $start cnt) T
   LEFT JOIN (SELECT 0 a UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) A ON cnt >= 0
   LEFT JOIN (SELECT 0 b UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) B ON cnt >= 10
   LEFT JOIN (SELECT 0 c UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) C ON cnt >= 100
   LEFT JOIN (SELECT 0 d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) D ON cnt >= 1000
   LEFT JOIN (SELECT 0 e UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) E ON cnt >= 10000
   LEFT JOIN (SELECT 0 f UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) F ON cnt >= 100000
 ) T
 WHERE N <= finish

Pay attention to the conditions of the form cnt >= 1000. The next multiplication by the set of digits 0..9 happens only if the required count is greater than or equal to 1000. So if we need, for example, 5000 rows, this query will generate 10,000 and filter out the extra 5,000, rather than generating a million. This is a very important point that greatly improves performance.

Thus, we obtained a sequence with the required number of elements from $start to $finish. What's more, for small ranges it will work quickly, i.e. from the optimization point of view everything is done sensibly.

Next, you need to insert the missing values into the table. For example, it might look like this:

 INSERT INTO UserTable (number)
 SELECT N
 FROM (
   SELECT start, finish, cnt,
          start + COALESCE(a, 0) + COALESCE(b, 0)*10 + COALESCE(c, 0)*100 + COALESCE(d, 0)*1000 + COALESCE(e, 0)*10000 + COALESCE(f, 0)*100000 N
   /* I removed the middle part of the query (the LEFT JOINs from the listing above) for brevity */
   ) F ON cnt >= 100000
 ) T
 WHERE N <= finish
   AND NOT EXISTS (SELECT 1 FROM UserTable WHERE UserTable.number = T.N)

How exactly you insert the missing values is a matter of taste: instead of the EXISTS in the example it could be NOT IN or a LEFT JOIN. What I wanted to show is how to quickly generate a sequence of a size not known in advance with a single query in MySQL.

    INSERT IGNORE

    Often, when adding a new row to a table with a UNIQUE index or PRIMARY KEY, the INSERT IGNORE syntax is very useful. It is convenient when the insert would accidentally duplicate a key: the insert itself is simply not performed, and execution is not aborted. The usual algorithm is:

    1. check whether a row with this key is already in the table (SELECT)
    2. insert the row if the key is not there yet (INSERT)

    First, find the object:

     $row = query('SELECT * FROM table WHERE id=1');
     // if there is no such object, insert a new record
     if (!$row) {
         query('INSERT INTO table …');
     }

    Now let's write it as a single INSERT IGNORE query, with no check on the php side:

     query('INSERT IGNORE INTO table …'); // insert
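
    Applied to the task from the question, this combines well with the ideas above: assuming number has a UNIQUE index and a prefilled int_series table like the one from the first answer, the whole job becomes one query (a sketch, with :start and :end bound on the php side):

     INSERT IGNORE INTO tablename (number)
     SELECT number FROM int_series
     WHERE number BETWEEN :start AND :end;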

    INSERT statement syntax
    http://phpclub.ru/mysql/doc/insert.html