Hello! There is a script that should process a large csv file and check if there is an entry in the database. If there is no entry - add, there is - update. It's simple.

Table structure:

id | number | brand | price 1 | 23 | 1 | 22.3 2 | 24 | 2 | 12.3 

The essence of the question: now the script in the loop makes requests to the database to check for existence and further decides to add or update. It turns out a lot of requests and the script is very slow. On duplicate key update - works only with the id field primary field. I have all the fields except id in the loop.

Is there any way to increase performance with such a task or only in the loop do 2 requests? Of course, you can save everything in two arrays and continue to do one insert and update, but the amount of data is about 10M lines ...

UPD (example of a query in a loop)

  if(($part_id = $this->db->query('SELECT id FROM parts WHERE `part_number` = "'.$item['part_number'].'" AND `brand_id` = "'.$item['brand_id'].'" LIMIT 1')->row_array()['id']) != FALSE) $this->db->update('parts', $item, array('id' => $part_id)); else $this->db->insert('parts', $item); 
  • Are there unique keys in the database? - xAqweRx
  • @xAqweRx id - the primary field, number and brand - index. Those. brands can repeat number - ka5itoshka
  • Most likely it will have a cycle. here the question is how exactly do you make requests. If you do it once before the prepare loop and in the execute loop of a prepared query with parameters, then it is much faster than if you, for example, make a whole query inside the loop. Next, look at the query execution plan, are there any effective indexes for record search - Mike
  • one
    as an option, loading the incoming file into a temporary table and comparing the tables with a single SQL query, it may be faster. Especially if the file load command load data MySQL - Mike
  • 2
    Here is every second question and substitute variables directly into the text. and then wonder why it slows down. NEVER substitute variable values ​​directly into the query text. Before the loop, once: $stmt=$db->prapare('SELECT id FROM parts WHERE part_number=? AND brand_id=?'); . In the loop $stmt->bind_param('ii',$item['part_number'],$item['brand_id']); $result = $stmt->get_result(); $result->fetch.... $stmt->bind_param('ii',$item['part_number'],$item['brand_id']); $result = $stmt->get_result(); $result->fetch.... $stmt->bind_param('ii',$item['part_number'],$item['brand_id']); $result = $stmt->get_result(); $result->fetch.... For good, if the necessary parameters are in ordinary variables and not in array elements, then bind_param is also done only once before the loop. inside only get_result. - Mike

1 answer 1

At the moment, I see as a way out:

  1. In the database on the field, which is checked hang unique key -> I understand that the brand should not be repeated. respectively on it and hang the key
  2. after that, perform On duplicate key update or insert ignore (depending on which target)

update

After updating the question: the unique key on the fields part_number, brand, and not just index. then you can safely insert ** on duplicate key update. No additional checks.

 ALTER TABLE `parts ` ADD UNIQUE `unique_index`(`part_number`, `brand`); 
  • A brand may have several number , but each brand can only have a unique number - ka5itoshka
  • @ ka5itoshka Updated answer - xAqweRx