There are 3 tables: posts (post) tags (tag) link tags and posts (post_tag)

Standard link table

post_id | tag_id 1 | 4 1 | 6 2 | 5 3 | 6 4 | 5 4 | 7 6 | 8 7 | 8 

It is necessary to get all posts that do not have tags 4 and 5 (and an additional condition is the exclusion of already viewed posts, let's say looked posts with id 6 and 7)

I make a request:

 SELECT post_id 
 FROM post_tag
 WHERE post_id NOT IN ( 
 SELECT DISTINCT post_id FROM post_tag WHERE tag_id IN (4,5) 
 ) AND post_id NOT IN (6,7)
 mysql> explain SELECT DISTINCT `idVideo` FROM `video_tag` WHERE `idVideo` NOT IN (SELECT DISTINCT idVideo FROM video_tag WHERE idTag IN (263,469,478,482,537,904,967)) AND (`idVideo`<>239529) AND (`quality`=100); +----+-------------+-----------+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+---------+------+---------+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-----------+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+---------+------+---------+--------------------------+ | 1 | PRIMARY | video_tag | index | PRIMARY,video_tag-idVideo-video-id,video_tag-idVideo-idTag,video_tag-idVideo-idTag-quality,video_tag-idTag-idVideo-quality,video_tag-idTag-idVideo,video_tag-idTag,video_tag-idVideo | video_tag-idVideo-video-id | 4 | NULL | 4629680 | Using where | | 2 | SUBQUERY | video_tag | range | PRIMARY,video_tag-idVideo-video-id,video_tag-idVideo-idTag,video_tag-idVideo-idTag-quality,video_tag-idTag-idVideo-quality,video_tag-idTag-idVideo,video_tag-idTag,video_tag-idVideo | PRIMARY | 4 | NULL | 372807 | Using where; Using index | +----+-------------+-----------+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+---------+------+---------+--------------------------+ 2 rows in set (0.01 sec) 

The number of indices for the test.

The request works terribly slowly. How to speed up this thing?

  • First, decide that you have mysql or postgress and correct the tags of the question. Secondly, give the resulting execution plan in the question text (for mySQL, you can get explain ваш-запрос with explain ваш-запрос , for postgress it is the same) - Mike
  • one
    Most likely it will make sense to replace the condition NOT IN (SELECT ...) with NOT EXISTS (SELECT ...) , - Yaant
  • or on the left join with a NULL check in the part where - Mike
  • I use Mysql specifically, but it's not a problem to move to Postgre. In Postgre there is WITH, perhaps it will partially help. - Olala Xd
  • show the current execution plan, show on it at least something. - Mike

1 answer 1

Try this option:

 SELECT post_id FROM post_tag WHERE post_id NOT IN (6,7) GROUP BY post_id HAVING sum( tag_id IN (4,5) ) = 0 

Or this:

 SELECT post.id FROM post LEFT JOIN post_tag ON tag_id IN (4,5) and post.id=post_tag.post_id WHERE post_tag.post_id IS NULL and post.id NOT IN (6,7) 
  • The request was executed for 20 seconds. mysql> explain SELECT DISTINCT idVideo FROM video_tag WHERE ( idVideo <> 239529) AND ( quality = 100) GROUP BY idVideo HAVING sum (idTag IN (263,469,478,482,537,904,967)) = 0; select_type | table | type | key | key_len | ref | rows | Extra | SIMPLE | video_tag | index | video_tag-idVideo-video-id | 4 | NULL | 4629680 | Using where - Olala Xd
  • @OlalaXd And how much is your initial one? - Mike
  • 19-22 seconds. There is a suspicion that this is due to the large amount of information transmitted - Olala Xd
  • @OlalaXd Try the second one, it seems to me that it should be at least a little bit faster, for records with a specific post_id in the first table are only one and the overall list is faster to get - Mike
  • @OlalaXd And in my first version he really went over the idVideo index field ... what he was looking for was interesting, all the lines that have idVideo <> 239529 or something ... well, he is wrong, almost the entire table is in the sample. .. you can write a comparison condition in the form idVideo+0<>239529 , it can make it just a full table scan (or better than the primary index) go ... - Mike