Please tell me how to correctly implement a data parser. Description: I have a long list; all data in it is deleted once a month, so the file grows rather long. I haven't really worked with parsers before, so I'm interested in the logic. At the moment each line of the list contains: id, time, count, number.

Open the file. Create a connection to the database. Loop { read a line; check that it is not empty; query the table for existing data (an ordinary SELECT comparing by id and time); if the record is already in the database, move on to the next line; if not, insert the data from the line into the database. } Close the connection. Close the file.
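The loop described above might look roughly like this in Python. This is only a sketch: the table and column names are assumed from the question (id, time, count, number), and an in-memory SQLite connection stands in for the real MySQL one so the example is self-contained.

```python
import sqlite3

# Stand-in for the MySQL connection; table/column names are assumed
# from the question (id, time, count, number).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE votes (id INTEGER, time INTEGER, count INTEGER, number INTEGER)"
)

def import_lines(lines):
    """One SELECT plus a possible INSERT per line -- the approach from the question."""
    inserted = 0
    for line in lines:
        line = line.strip()
        if not line:  # skip empty lines
            continue
        id_, time_, count, number = (int(x) for x in line.split())
        # Check whether the record already exists (compare by id and time).
        row = conn.execute(
            "SELECT 1 FROM votes WHERE id = ? AND time = ?", (id_, time_)
        ).fetchone()
        if row is None:  # not in the DB yet -> insert it
            conn.execute(
                "INSERT INTO votes (id, time, count, number) VALUES (?, ?, ?, ?)",
                (id_, time_, count, number),
            )
            inserted += 1
    conn.commit()
    return inserted
```

The problem the question raises is visible here: every non-empty line costs at least one SELECT, and possibly an INSERT on top of it.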

The resulting heap of MySQL queries scares me... What do you think of this approach?

  • I would create a second table with the necessary indexes (a table holding what should be inserted), and on the database side update the first table by taking the data from the second, checking whether a row with the current id already exists in the first table. And your example is not what I would call a large volume. - Fantyk
  • My example is only 6 days' worth of data, and there are several such lists, about 5 of them. - avengerweb
  • Doesn't MySQL have an update-or-insert command? - pincher1519
  • @pincher1519, there is. It is called INSERT ... ON DUPLICATE KEY UPDATE - Denis Khvorostin

1 answer

Look at MERGE in SQL and its MySQL counterpart, INSERT ... ON DUPLICATE KEY UPDATE. At a minimum this already reduces the number of queries to the database: the existence check is performed on the database side.
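For illustration, a hedged sketch of such an upsert. The MySQL statement is shown in a comment; so that the example runs without a MySQL server, the executed query uses SQLite's equivalent ON CONFLICT clause (requires SQLite 3.24+; table and column names are assumed):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A unique key is what makes ON DUPLICATE KEY UPDATE (MySQL) or
# ON CONFLICT (SQLite) fire; here id is assumed to be that key.
conn.execute("CREATE TABLE votes (id INTEGER PRIMARY KEY, time INTEGER, count INTEGER)")

# MySQL spelling, for reference:
#   INSERT INTO votes (id, time, count) VALUES (%s, %s, %s)
#   ON DUPLICATE KEY UPDATE time = VALUES(time), count = VALUES(count);
UPSERT = """
INSERT INTO votes (id, time, count) VALUES (?, ?, ?)
ON CONFLICT(id) DO UPDATE SET time = excluded.time, count = excluded.count
"""

conn.execute(UPSERT, (1, 100, 5))  # inserts a new row
conn.execute(UPSERT, (1, 200, 9))  # same id: updates instead of failing
conn.commit()
```

One statement per line replaces the SELECT-then-INSERT pair, halving the round trips.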

The second option is to write all the data straight into an auxiliary (staging) table without worrying about duplicates, and weed the duplicates out later on the database side.
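A minimal sketch of that staging approach, again with in-memory SQLite standing in for MySQL and assumed table names. SQLite's INSERT OR IGNORE plays the role of MySQL's INSERT IGNORE here:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE votes (id INTEGER PRIMARY KEY, count INTEGER)")
conn.execute("CREATE TABLE staging (id INTEGER, count INTEGER)")

conn.execute("INSERT INTO votes VALUES (1, 5)")  # a row imported earlier

# Dump everything into staging, duplicates and all.
conn.executemany("INSERT INTO staging VALUES (?, ?)",
                 [(1, 5), (2, 7), (1, 5)])

# One statement moves only the new rows into the main table.
# MySQL spelling: INSERT IGNORE INTO votes SELECT DISTINCT ... FROM staging;
conn.execute("INSERT OR IGNORE INTO votes SELECT DISTINCT id, count FROM staging")
conn.execute("DELETE FROM staging")  # ready for the next run
conn.commit()
```

However many lines the file has, the duplicate handling costs a fixed, small number of queries.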

The third option is to process the file, build an array while filtering out duplicates, and then write it to the database.
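The in-memory filtering step might be sketched like this (the line format "id time count number" is assumed from the question; the resulting list can then be written in one batch, e.g. with `executemany`):

```python
def parse_unique(lines):
    """Parse 'id time count number' lines, keeping the first row per (id, time)."""
    seen = set()
    rows = []
    for line in lines:
        line = line.strip()
        if not line:  # skip empty lines
            continue
        id_, time_, count, number = (int(x) for x in line.split())
        if (id_, time_) in seen:  # duplicate within the file -> drop
            continue
        seen.add((id_, time_))
        rows.append((id_, time_, count, number))
    return rows
```

Note this only removes duplicates within the file; duplicates against rows already in the database still need one of the other two options.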

It all depends on the specific conditions: how often the script runs and how large the file is.

UPD. LOAD DATA INFILE quickly loads data into the database from a text file (in your case you can do without your own parser, since the boundaries between fields and records are rigidly defined). The algorithm is then:

  1. Fill in the auxiliary table
  2. Using INSERT ... SELECT, write the new data to the main table (select all records whose id is greater than the maximum id in the main table)
  3. Clear auxiliary table
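The three steps above can be sketched as follows. This is a hedged illustration: in-memory SQLite stands in for MySQL, `executemany` stands in for LOAD DATA INFILE (which SQLite lacks), and the table names are assumed:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE votes (id INTEGER PRIMARY KEY, count INTEGER)")
conn.execute("CREATE TABLE staging (id INTEGER, count INTEGER)")
conn.execute("INSERT INTO votes VALUES (10, 3)")  # data imported on a previous run

def sync(rows):
    # Step 1: fill the staging table (LOAD DATA INFILE in real MySQL).
    conn.executemany("INSERT INTO staging VALUES (?, ?)", rows)
    # Step 2: copy only the rows newer than everything already imported,
    # relying on ids never decreasing.
    (last_id,) = conn.execute("SELECT COALESCE(MAX(id), 0) FROM votes").fetchone()
    conn.execute(
        "INSERT INTO votes SELECT id, count FROM staging WHERE id > ?", (last_id,)
    )
    # Step 3: clear the staging table for the next run.
    conn.execute("DELETE FROM staging")
    conn.commit()

sync([(9, 1), (10, 3), (11, 4), (12, 6)])  # only 11 and 12 are new
```

As the comments below note, this relies on ids never going backwards between runs.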
  • Once an hour. mmotop.ru/votes/3a8fdd1fe483004369f0638a0e909dbe0f38e53a.txt is a sample file. The file is reset once a month, in the first days of the month, so the volume becomes quite large. - avengerweb
  • The id appears to be unique there: read the last id from the database and then write to the database all the lines whose id is greater. - Denis Khvorostin
  • The ids are scattered. - avengerweb
  • The gaps between ids do not matter. What matters is that you never get a situation where a later id is smaller than an earlier one, and in the sample, at least, that holds. In other words, you can always find the fragment from which the data should be taken. - Denis Khvorostin