I wrote a parser that, every 30 minutes, pulls ads from a real estate forum and writes them to a MySQL database. The database now contains many duplicate records. I then created a table with a uniq field that stores a unique value for each ad, for example showthread.php?s=4ad705f976ce73fb739b76820a3a573f&t=1485914 (the last 7 digits are unique to each ad!). What is the best way to check an ad for uniqueness? Thanks in advance for any help.

  • When saving new ads, I need to check whether they are ALREADY in the database so that I don't create lots of duplicates. - spoilt
  • UPDATE? It will overwrite the duplicate. - Gorets
  • @Gorets I don't quite understand: why use UPDATE? Could you explain, please? - spoilt

4 answers

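For a start, read the tutorials on UNIQUE KEY and INSERT IGNORE. With a unique key on the ad identifier, INSERT IGNORE silently skips any row that would duplicate it.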
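
The UNIQUE KEY plus INSERT IGNORE combination can be sketched roughly as follows. This is only an illustration under assumptions: the table name `ads`, the columns `thread_id` and `title`, and the helper `save_ad` are hypothetical, and an in-memory SQLite database stands in for MySQL so the snippet runs without a server. SQLite spells the statement `INSERT OR IGNORE`; the MySQL form is shown in the comments.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL one (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ads ("
    " id INTEGER PRIMARY KEY,"
    " thread_id INTEGER NOT NULL UNIQUE,"  # MySQL: UNIQUE KEY (thread_id)
    " title TEXT)"
)


def save_ad(conn, thread_id, title):
    """Insert the ad unless its thread_id is already stored.

    Returns True if a new row was added, False if it was skipped.
    """
    # MySQL spelling: INSERT IGNORE INTO ads (thread_id, title) VALUES (%s, %s)
    cur = conn.execute(
        "INSERT OR IGNORE INTO ads (thread_id, title) VALUES (?, ?)",
        (thread_id, title),
    )
    return cur.rowcount == 1


first = save_ad(conn, 1485914, "Flat for rent")
second = save_ad(conn, 1485914, "Flat for rent")  # same ad seen on the next run
total = conn.execute("SELECT COUNT(*) FROM ads").fetchone()[0]
```

The parser can then call `save_ad` for every ad it sees on each run; the database itself discards the repeats, so no separate SELECT is needed.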

    Look into INSERT ... ON DUPLICATE KEY UPDATE.
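
A minimal sketch of the "insert, or update if the key already exists" idea, again with assumptions stated up front: the table `ads`, its columns, and the helper `upsert_ad` are made up for illustration, and SQLite (3.24 or newer) stands in for MySQL. SQLite writes the clause as `ON CONFLICT ... DO UPDATE`; the MySQL spelling is in the comment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ads ("
    " thread_id INTEGER PRIMARY KEY,"
    " title TEXT,"
    " seen INTEGER NOT NULL DEFAULT 1)"  # how many times the parser met this ad
)


def upsert_ad(conn, thread_id, title):
    """Insert a new ad, or refresh the stored row if thread_id already exists."""
    # MySQL spelling:
    #   INSERT INTO ads (thread_id, title) VALUES (%s, %s)
    #   ON DUPLICATE KEY UPDATE title = VALUES(title), seen = seen + 1
    conn.execute(
        "INSERT INTO ads (thread_id, title) VALUES (?, ?) "
        "ON CONFLICT(thread_id) DO UPDATE "
        "SET title = excluded.title, seen = seen + 1",
        (thread_id, title),
    )


upsert_ad(conn, 1485914, "Flat for rent")
upsert_ad(conn, 1485914, "Flat for rent, price reduced")
rows = conn.execute("SELECT thread_id, title, seen FROM ads").fetchall()
```

Unlike INSERT IGNORE, this variant overwrites the stored ad with the newest version, which may or may not be what the parser should do.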

      And what's stopping you from simply storing the ad ID taken from the URL?
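
Pulling the ad ID out of a link could look something like this. The function name `thread_id_from_link` is hypothetical, and the sketch assumes the ID is the numeric `t` query parameter, as in the link from the question.

```python
from html import unescape
from urllib.parse import parse_qs, urlparse


def thread_id_from_link(link):
    """Return the numeric value of the t parameter, or None if it is missing."""
    # Links scraped from HTML may contain &amp; instead of a plain &.
    query = urlparse(unescape(link)).query
    values = parse_qs(query).get("t")
    if not values or not values[0].isdigit():
        return None
    return int(values[0])


link = "showthread.php?s=4ad705f976ce73fb739b76820a3a573f&amp;t=1485914"
ad_id = thread_id_from_link(link)
no_id = thread_id_from_link("showthread.php?s=4ad705f976ce73fb739b76820a3a573f")
```

Storing the number rather than the whole link avoids treating the same ad as new when only the session hash in `s` changes.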


      Look at the link itself: t= is most likely the ad's ID, which is unique. Check against that.

      • And then, instead of one INSERT, you run two queries: SELECT + INSERT. Some optimization is needed here anyway, at least to avoid downloading more than necessary, but that belongs elsewhere :) - user6550
      • Not really. If the field is indexed, a query like "select the record with this numeric ID" runs so fast that its cost can be ignored. - FlashXXX
      • It depends. We don't know the full scale of the tragedy :) A simple example: first collect the list of all t values, exclude those already in the database, and download and add only the rest. The exclusion, of course, is done with one query rather than one per t. - user6550
      • I don't quite follow. "Get a list of all t": get it from where? If you need to check whether a set of values is present in the database, SQL has the IN operator, for example: select * from notes where id IN (23,43,54,55,123,344). In general, I advise you to study SQL syntax and the features of the DBMS you work with; much will then become clearer and easier. - FlashXXX
      • I have no idea where from. But I assume the parser first goes through the forum pages that link to the ads, and that is where strings like "showthread.php?s=4ad705f976ce73fb739b76820a3a573f&amp;t=148591" come from. As for the advice, that's not meant for me :) - user6550
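
The batch check discussed in these comments (collect all `t` values, ask the database once which of them it already has, download only the rest) might be sketched like this. The table `ads`, the column `thread_id`, and the helper `find_new_ids` are assumptions for illustration, and SQLite stands in for MySQL.

```python
import sqlite3


def find_new_ids(conn, candidate_ids):
    """Return the candidate ids not yet in the ads table, using one IN query."""
    ids = list(dict.fromkeys(candidate_ids))  # drop repeats, keep order
    if not ids:
        return []
    # One placeholder per id; MySQL drivers typically use %s instead of ?.
    placeholders = ",".join("?" * len(ids))
    rows = conn.execute(
        f"SELECT thread_id FROM ads WHERE thread_id IN ({placeholders})", ids
    )
    known = {row[0] for row in rows}
    return [i for i in ids if i not in known]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ads (thread_id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO ads (thread_id) VALUES (?)", [(23,), (43,), (54,)])

new_ids = find_new_ids(conn, [23, 43, 54, 55, 123, 344])
```

Only the ads whose IDs come back from `find_new_ids` need to be downloaded and inserted. For very long ID lists the query would have to be split into chunks, since databases limit the number of parameters per statement.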

      It is best to let the DBMS enforce this check. Declare a primary key (a field or set of fields that uniquely identifies a record and cannot be NULL) or a uniqueness constraint (non-NULL values cannot repeat, but NULLs can :)). But first you need to remove the existing duplicates.

      You can add a constraint like this: ALTER TABLE myTable ADD CONSTRAINT constraintName UNIQUE (mycolumn);
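
The two steps, removing existing duplicates and then adding the constraint, could be sketched as below. Everything named here (`ads`, `thread_id`, `uq_thread`) is hypothetical, and SQLite stands in for MySQL: SQLite has no `ALTER TABLE ... ADD CONSTRAINT`, so a unique index plays that role, and the MySQL statements are given in the comments.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ads (id INTEGER PRIMARY KEY, thread_id INTEGER, title TEXT)"
)
conn.executemany(
    "INSERT INTO ads (thread_id, title) VALUES (?, ?)",
    [(1485914, "Flat"), (1485914, "Flat"), (1485915, "House"), (1485914, "Flat")],
)

# Step 1: keep the oldest row (smallest id) per thread_id, delete the rest.
# MySQL rejects a subquery on the table being deleted from; one MySQL form is
#   DELETE a FROM ads a JOIN ads b
#     ON a.thread_id = b.thread_id AND a.id > b.id
conn.execute(
    "DELETE FROM ads WHERE id NOT IN "
    "(SELECT MIN(id) FROM ads GROUP BY thread_id)"
)

# Step 2: enforce uniqueness from now on.
# MySQL: ALTER TABLE ads ADD CONSTRAINT uq_thread UNIQUE (thread_id)
conn.execute("CREATE UNIQUE INDEX uq_thread ON ads (thread_id)")

remaining = conn.execute("SELECT id, thread_id FROM ads ORDER BY id").fetchall()

# A repeated ad is now refused by the database itself.
try:
    conn.execute("INSERT INTO ads (thread_id, title) VALUES (1485914, 'Flat')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

The order matters: adding the constraint while duplicates are still present fails, which is why the cleanup comes first.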