This code is supposed to prevent duplicate rows in the database for the $slon_code and $service_id fields. (I have not created a composite unique key; for now I want to work it out without one and understand the duplication itself.)

The class parses information and caches it.

The main method of the class is get_response(). If the elephant exists (isset_slon()), we check the time and either return it from the database as a cache or parse it again. If there is no elephant, we add it to the database with add_slon(), then parse, write the information to the elephant, and return it to the user.

Problem: for some reason, duplicates appear in the database for the $slon_code and $service_id fields. I see no reason to add the isset_slon() check inside add_slon(), because the insert already passes the check inside get_response() and lands in the else branch, that is, isset_slon() == false.

So here's the thing: when isset_post() works incorrectly, the rest of the code is meaningless (and this applies not only to this method, but to any method that works with SQL queries). What should I do?

Questions:

  1. Why do duplicate records appear in the database? (25 duplicates appeared among 5,000 elephants.)
  2. How can the problem be solved?
  3. How do CMSs work when hundreds of SQL queries are used to generate a single web page?

Updated 12/08/2016

I started a small log; the second line of each entry is microtime(true). It shows that the record was found, and after that it is still being added.

    Date: 08.12.2016 13:51:28
    1481194288.9353
    slon_code: 125125125
    service_id: 1
    slon_code is in the database, checking the timeout

    Date: 08.12.2016 13:51:29
    1481194289.5376
    slon_code: 125125125
    service_id: 1
    slon_code is not in the database, will have to parse from scratch
    slon_code added to the database

    Date: 08.12.2016 13:51:29
    1481194289.5627
    slon_code: 125125125
    service_id: 1
    slon_code is not in the database, will have to parse from scratch
    slon_code added to the database

    Date: 08.12.2016 13:51:29
    1481194289.5887
    slon_code: 125125125
    service_id: 1
    slon_code is not in the database, will have to parse from scratch
    slon_code added to the database

Request class

    class Resp
    {
        private $slon_code;
        private $service_id;

        public function __construct($slon_code = '', $service_id = 0)
        {
            $this->slon_code = $slon_code;
            $this->service_id = $service_id;
        }

        public function get_response()
        {
            if ($this->isset_slon()) {
                // THE ELEPHANT EXISTS
                // Has enough time passed since the last parse?
                if ($this->can_parse()) {
                    // parse
                    $this->parse_slon();
                    return $this->get_slon_info();
                } else {
                    // return from the DB (cache)
                    return $this->get_slon_info();
                }
            } else {
                // NO ELEPHANT!
                // add an elephant/placeholder to the database;
                // the placeholder is used to throttle user requests
                $this->add_slon();
                $this->parse_slon();
                return $this->get_slon_info();
            }
        }

        private function isset_slon()
        {
            $sql = 'SELECT COUNT(*) AS count FROM slon WHERE slon_code = :slon_code AND service_id = :service_id';
            $data = $this->db->select($sql, $this->slon_params());
            if ($data[0]['count'] > 0) {
                return true;
            } else {
                return false;
            }
        }

        private function add_slon()
        {
            // (identical elephants have appeared in the database)
            $sql = 'INSERT INTO slon (slon_code, service_id) VALUES (:slon_code, :service_id)';
            $params = [':slon_code' => $this->slon_code, ':service_id' => $this->service_id];
            $inserted = $this->db->insert($sql, $params);
            if ($inserted) {
                return true;
            } else {
                return false;
            }
        }
    }

Class for database queries:

    class Database
    {
        public function insert()
        {
            // practically a copy of the select() method below
        }

        public function select($sql = '', $params = array())
        {
            $sth = $this->dbh->prepare($sql);
            if (!empty($params)) {
                $s = $sth->execute($params);
            } else {
                $s = $sth->execute();
            }
            if (!$s) {
                // log the error; there are no errors in the logs so far
                Log::create_mysql_log($sql, $params, $sth->errorInfo());
                return false;
            }
            return $sth->fetchAll(PDO::FETCH_ASSOC);
        }
    }
  • It is hard to say what the problem is, but apparently something inserts the elephant between the isset and add calls. And why write so much code with extra database round trips when unique keys are designed for exactly this and would rule out such situations entirely? Without a key, nobody can ever guarantee that there will be no duplicates. As a last resort you can of course use insert ... select ... where not exists(select ...), but again, insert-and-check must be an atomic operation for the database. - Mike
  • @Mike in the whole code, add_slon() appears only once, in the get_response() method. If I add a composite unique key, I will have to handle database errors. I haven't practiced that yet, so I decided to keep it as simple as possible. - Jean-Claude
  • An error code is not hard to check. If you really want to, you can even manage without analyzing errors (although you should always check for them; you never know what happens in the database). But it does feel like the insert goes through in a second thread/application. Check the web server logs to see how many times the entry page was called. - Mike
  • Well, it is all simple. The insert is no different from yours. Immediately after it we check db_connect->errno (mysqli) or php.net/manual/ru/pdo.errorcode.php; if it is not 0, there were errors. If it is 1062 (for MySQL), it is a duplicate-key error. - Mike
  • @Jean-Claude I mean, you have one PHP script, but it is executed simultaneously in multiple threads. That is, for each request a copy is launched that runs in parallel with the others, and while one copy goes from the check to the insert, a second one has time to insert the record. At the same time, the code does run "step by step and linearly" within one specific request. - PashaPash
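
The interleaving described in the last comment can be reproduced in a single process. The following is only a sketch under stated assumptions: an in-memory SQLite database (via the pdo_sqlite extension) stands in for the real MySQL `slon` table, and the free functions `slon_exists()` and `add_slon()` are hypothetical stand-ins for the question's isset_slon() and add_slon() methods. Two "requests" both run the existence check before either one inserts, which is what parallel PHP processes can do.

```php
<?php
// Assumption: in-memory SQLite replaces the real MySQL table for this demo.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE slon (
    slon_id INTEGER PRIMARY KEY AUTOINCREMENT,
    slon_code TEXT,
    service_id INTEGER
)');

// Same check as isset_slon() in the question (hypothetical free function).
function slon_exists(PDO $pdo, string $slon_code, int $service_id): bool
{
    $sth = $pdo->prepare('SELECT COUNT(*) FROM slon WHERE slon_code = ? AND service_id = ?');
    $sth->execute([$slon_code, $service_id]);
    return (int) $sth->fetchColumn() > 0;
}

// Same insert as add_slon() in the question (hypothetical free function).
function add_slon(PDO $pdo, string $slon_code, int $service_id): void
{
    $sth = $pdo->prepare('INSERT INTO slon (slon_code, service_id) VALUES (?, ?)');
    $sth->execute([$slon_code, $service_id]);
}

// Two "requests" arrive together: both run the check before either inserts.
$request_a_sees_row = slon_exists($pdo, '125125125', 1);
$request_b_sees_row = slon_exists($pdo, '125125125', 1);

// Each request then acts on its own, already stale, answer.
if (!$request_a_sees_row) {
    add_slon($pdo, '125125125', 1);
}
if (!$request_b_sees_row) {
    add_slon($pdo, '125125125', 1); // second row with the same pair
}

$rows = (int) $pdo->query('SELECT COUNT(*) FROM slon')->fetchColumn();
echo "rows with the same (slon_code, service_id): $rows\n";
```

Each request is linear on its own; the duplicate comes from the gap between the SELECT and the INSERT, which nothing in the table prevents.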

3 answers

Based on the code provided, it is impossible to say exactly why the records are duplicated. The only obvious thing is that the function that adds a record is, for some reason, called twice. The most likely cause is a repeated request from the browser, which you can check for in the server's access log (access_log).

Suppose the browser really does make two requests, for an unknown reason that you cannot eliminate (it happens sometimes). Suppose you detect the second request and raise an error. This will not so much solve the problem as create extra inconvenience for your users: with high probability they will see the error message instead of the result of the first (successful) request. Then they will write to you, and you will spend time and effort explaining the obvious...

In principle, the problem can be solved by reporting a successful addition even if the record already exists. After all, users care less about whether the record was actually added by this request than about whether the record they need is in the database in the end.

To do this, you need to use a unique key for the columns slon_code and service_id .

 ALTER TABLE `slon` ADD UNIQUE( `slon_code`, `service_id`); 
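
Since the table already contains duplicates (25 were reported in the question), MySQL will refuse to create the unique key until they are removed. A minimal sketch for listing the offending pairs follows; the helper name `find_duplicate_slons()` is hypothetical, and the demo runs against an in-memory SQLite table (pdo_sqlite) rather than the real database. The GROUP BY ... HAVING query itself is plain SQL and should work unchanged on MySQL.

```php
<?php
// Hypothetical helper: lists every (slon_code, service_id) pair stored more than once.
function find_duplicate_slons(PDO $pdo): array
{
    $sql = 'SELECT slon_code, service_id, COUNT(*) AS copies
            FROM slon
            GROUP BY slon_code, service_id
            HAVING COUNT(*) > 1';
    return $pdo->query($sql)->fetchAll(PDO::FETCH_ASSOC);
}

// Demo data in an in-memory SQLite table (assumption: stands in for the MySQL table).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE slon (
    slon_id INTEGER PRIMARY KEY AUTOINCREMENT,
    slon_code TEXT,
    service_id INTEGER
)');
$pdo->exec("INSERT INTO slon (slon_code, service_id) VALUES ('125125125', 1)");
$pdo->exec("INSERT INTO slon (slon_code, service_id) VALUES ('125125125', 1)");
$pdo->exec("INSERT INTO slon (slon_code, service_id) VALUES ('777', 2)");

$duplicates = find_duplicate_slons($pdo);
foreach ($duplicates as $row) {
    printf("%s / %d stored %d times\n", $row['slon_code'], $row['service_id'], $row['copies']);
}
```

Which copy of each pair to keep (for example, the one with the lowest slon_id) depends on what else references slon_id, so that decision is left out of the sketch.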

In the comments you write that you have not yet practiced handling database errors. In your case you can do without it by using the INSERT IGNORE syntax.

If the get_slon_id() function makes a separate query and the calling code expects add_slon() to report whether the record now exists, then the whole modified add procedure looks like this:

    private function add_slon()
    {
        // IGNORE added
        $sql = 'INSERT IGNORE INTO slon (slon_code, service_id) VALUES (:slon_code, :service_id)';
        $params = [':slon_code' => $this->slon_code, ':service_id' => $this->service_id];
        $inserted = $this->db->insert($sql, $params);
        $this->slon_id = $this->get_slon_id();
        // check that the ID was found, which means some insert definitely succeeded
        return (bool) $this->slon_id;
    }

If you still need to check every time that the record was actually added by this call, you will have to start practicing error handling after all.
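
For reference, a minimal sketch of what such error handling could look like with PDO. It assumes the connection uses PDO::ERRMODE_EXCEPTION (the Database class in the question checks the return value of execute() instead) and that a unique key on (slon_code, service_id) exists. The function name `add_slon_once()` is hypothetical, and the demo uses in-memory SQLite (pdo_sqlite) so that it runs without a MySQL server.

```php
<?php
// Hypothetical variant of add_slon(): returns true only if THIS call created the row.
function add_slon_once(PDO $pdo, string $slon_code, int $service_id): bool
{
    $sth = $pdo->prepare('INSERT INTO slon (slon_code, service_id) VALUES (?, ?)');
    try {
        $sth->execute([$slon_code, $service_id]);
        return true;
    } catch (PDOException $e) {
        // SQLSTATE 23000 = integrity constraint violation. On MySQL the
        // driver-specific code in $e->errorInfo[1] is 1062 for a duplicate key.
        if ((string) $e->getCode() === '23000') {
            return false; // the row already exists, someone else inserted it
        }
        throw $e; // any other database error is not ours to swallow
    }
}

// Demo table with the composite unique key (assumption: SQLite instead of MySQL).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE slon (
    slon_id INTEGER PRIMARY KEY AUTOINCREMENT,
    slon_code TEXT NOT NULL,
    service_id INTEGER NOT NULL,
    UNIQUE (slon_code, service_id)
)');

$first = add_slon_once($pdo, '125125125', 1);  // this call inserts the row
$second = add_slon_once($pdo, '125125125', 1); // this call hits the unique key
$total = (int) $pdo->query('SELECT COUNT(*) FROM slon')->fetchColumn();
var_dump($first, $second, $total);
```

SQLSTATE 23000 also covers other constraint violations (NOT NULL, foreign keys), so on MySQL checking the driver code 1062 is the more precise test for a duplicate.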

The table locking approach recommended in another answer ( LOCK TABLES ... ) does not solve the problem of simultaneously adding records.

  • Yes, I will add a composite unique key. So far, no one has explained why duplication can occur, apart from a hint in a comment: ru.stackoverflow.com/questions/599453/… - Jean-Claude
  • Finding where duplication comes from is easy. On every save, write the full trace to the log. Then it will be easy to see where the save call came from. Ordinary debugging. Another matter is that there is not necessarily an easy way to avoid duplication... - sanmai
  • I accepted this answer. In fact, I consider all the answers plus the comments on this question to be the complete answer; I pieced it together bit by bit, but yours lays things out more or less compactly. - Jean-Claude
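
The trace logging suggested in the comments could be sketched as follows. The helper `log_add_slon()` and its log format are hypothetical (loosely modeled on the log excerpt in the question); the process ID and request URI are included because they make it visible when two entries come from different parallel requests.

```php
<?php
// Hypothetical helper: call it at the top of add_slon() to record who triggered the insert.
function log_add_slon(string $log_file, string $slon_code, int $service_id): void
{
    $entry = sprintf(
        "Date: %s\n%.4F\npid: %d\nuri: %s\nslon_code: %s\nservice_id: %d\n%s\n\n",
        date('d.m.Y H:i:s'),
        microtime(true),
        getmypid(),                          // differs between parallel PHP processes
        $_SERVER['REQUEST_URI'] ?? 'cli',    // not set when run from the command line
        $slon_code,
        $service_id,
        (new Exception())->getTraceAsString() // call stack leading to this insert
    );
    // LOCK_EX keeps entries from parallel requests from interleaving in the file.
    file_put_contents($log_file, $entry, FILE_APPEND | LOCK_EX);
}

// Demo: write one entry to a temporary file and print it.
$log_file = tempnam(sys_get_temp_dir(), 'slon');
log_add_slon($log_file, '125125125', 1);
$contents = file_get_contents($log_file);
echo $contents;
```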

Wrong approach. Data integrity should be enforced at the DBMS level, preferably through:

  1. UNIQUE indexes in the database
  2. LOCKing the table from the existence check until the insert completes.
  3. Or here's a hack: https://stackoverflow.com/questions/3164505/mysql-insert-record-if-not-exists-in-table

And it is best to combine 1 and 2.
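
A rough sketch of the "insert if not exists" statement from option 3, adapted to the `slon` table. The function name `add_slon_if_missing()` is hypothetical and the demo runs on in-memory SQLite (pdo_sqlite), not MySQL. It folds the check and the insert into one statement, but as noted in the comments on the question, without a unique key nothing guarantees that two parallel statements cannot still both insert, so treat it as a complement to option 1 rather than a replacement.

```php
<?php
// Hypothetical helper: inserts the pair only if no such row exists yet.
// Returns true if this call inserted a row.
function add_slon_if_missing(PDO $pdo, string $slon_code, int $service_id): bool
{
    $sql = 'INSERT INTO slon (slon_code, service_id)
            SELECT tmp.slon_code, tmp.service_id
            FROM (SELECT ? AS slon_code, ? AS service_id) AS tmp
            WHERE NOT EXISTS (
                SELECT 1 FROM slon s WHERE s.slon_code = ? AND s.service_id = ?
            )';
    $sth = $pdo->prepare($sql);
    $sth->bindValue(1, $slon_code, PDO::PARAM_STR);
    $sth->bindValue(2, $service_id, PDO::PARAM_INT);
    $sth->bindValue(3, $slon_code, PDO::PARAM_STR);
    $sth->bindValue(4, $service_id, PDO::PARAM_INT);
    $sth->execute();
    return $sth->rowCount() === 1; // 0 rows affected means the pair was already there
}

// Demo on an in-memory SQLite table without any unique key.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE slon (
    slon_id INTEGER PRIMARY KEY AUTOINCREMENT,
    slon_code TEXT,
    service_id INTEGER
)');

$first = add_slon_if_missing($pdo, '125125125', 1);
$second = add_slon_if_missing($pdo, '125125125', 1);
$total = (int) $pdo->query('SELECT COUNT(*) FROM slon')->fetchColumn();
var_dump($first, $second, $total);
```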

  • Why lock tables? - Qwertiy
  • So why lock tables, if as I understand it the requests are executed one after another, step by step? - Jean-Claude
  • You probably don't understand it well. If they were executed step by step and linearly, there would be no duplicates with such a check. - hardworm
  • For a parser and a cache, single-threading is rare. - hardworm
  • The fact that it is all in one script does not mean that this script is not executed N times at the same moment. It runs as many times as it was requested at that moment. - hardworm

As the other answers correctly note, enforcing the uniqueness of records is the database's job. Still, it is better to use a primary key rather than a unique index.

    CREATE TABLE slon(
        slon_code VARCHAR(255) NOT NULL,
        service_id INT NOT NULL,
        PRIMARY KEY(slon_code, service_id)
    );

Then, when you try to insert a duplicate, an error is returned, which you can catch with the corresponding function of the extension you use.

  • The primary key is already slon_id. - Jean-Claude
  • @Jean-Claude If the table has such a primary key, then in principle you could not have added two records with the same key. Check again whether it is really there. If it is, then this behavior is due either to some kind of incorrect caching or to unscientific fiction. - rjhdby
  • The primary key slon_id is auto-increment. - Jean-Claude
  • @Jean-Claude Then you are somewhat mistaken about the duplicate slon_id. It could not have arisen. - rjhdby
  • I am not mistaken about anything, but I can't post all the internals here; nobody wants to dig through a huge question, so I added only the essentials. By duplicate I mean duplication of the $slon_code and $service_id fields, which should not happen; the slon_id field has a different purpose. Yes, it is distinct and auto-incremented. - Jean-Claude