I study MySQL. I have a question about the difference between queries made with a subquery and a query using the operation of joining tables. Here is the listing: 1 option with a subquery:

mysql> select fio -> from persons -> where person_id in (select person from film_creators -> where film in (select film_id from films -> where film_name = 'Сталкер')); 

Option 2 using the operation of joining tables:

 mysql> select fio -> from persons -> where person_id in (select distinct person -> from film_creators, films -> where film_creators.film = films.film_id and films.film_name = 'Сталкер'); 

Please explain which of the queries will be more efficient or any other differences. Thanks in advance for your help.

  • I have a comment on the issue design. In the future, I recommend that you write requests for so quoting only their text, without these -> : most then insert them into the IDE and these arrows only hinder the execution of the code. Distracted by them and clean? Save someone else's time potentially unlimited number of persons. As a model, not to go far - see the answer of Anton or Mark. - AK
  • one
    Optimizers in the DBMS are so mysterious that as soon as the optimization question arises, it is necessary to evaluate all the suitable options using explain. and no one guarantees that in the next version of MySQL will behave the same way as in the previous one. In general, the first query looks like a good option. But this is too simple a case, with very high probability in all cases the optimizer will give the same execution plan. The differences are visible when you expect to get quite a lot of lines, more precisely when the optimizer might think. that there will be a lot of them - Mike

2 answers 2

Your subqueries are used in both examples; meanwhile, you can write a query only with connections.

First, we first build a query that extracts the desired movie:

 SELECT ... FROM films WHERE films.film_name = 'Сталкер' 

We need the people who made this film. Since there is a many-to-many connection between people and movies, we need an intermediate table film_creators :

 SELECT ... FROM films JOIN film_creators ON film_creators.film = films.film_id JOIN persons ON person.person_id = film_creator.person WHERE films.film_name = 'Сталкер' 

In this case, we use two internal connections, that is, we will get a list of only those people for whom there is an entry in film_creators .

Perhaps the film_creators table contains a field that shows a person’s position , such as a director or screenwriter . Then the same person may appear on the list several times. Here we can use DISTINCT . We need only a full name, so the full request will look like:

 SELECT DISTINCT person.fio FROM films JOIN film_creators ON film_creators.film = films.film_id JOIN persons ON person.person_id = film_creator.person WHERE films.film_name = 'Сталкер' 

Or we can show the name several times with the post:

 SELECT person.fio, film_creators.position FROM films JOIN film_creators ON film_creators.film = films.film_id JOIN persons ON person.person_id = film_creator.person WHERE films.film_name = 'Сталкер' 

Now about the effectiveness. The most important quality of the code is its simplicity, not its effectiveness. Experience teaches us that you need to make the code efficient only after its inefficiency has been proven. Therefore, the work plan is always the same: first, you write working code, then you simplify it so that another person can work with it, or you in six months. Finally, you write a load test and see if you are comfortable with the speed. If the speed does not suit you, you rewrite the code so that it works quickly. At this point, you can sacrifice the clarity of the code by writing a detailed commentary on why this is done here.

Here, knowledge of the principles of relational DBMS operation saves, in particular, an understanding of how the search is performed in an ordered array using a composite key. In this case, you can simply arrange the indexes based on the analysis of queries. In this case, the text of the requests is not necessary to change

But if the indexes do not help, then the next step is to use the query scheduler and search through the syntax options. Something may help.

In general, syntax with joins is easier to read and understand than syntax with subqueries, so if logic allows, it is better to limit to joins. If does not allow, do a code with a subquery.

    Add before SELECT EXPLAIN and see what is more efficient. I suspect that such a request would be the most effective.

     select persons.fio from films LEFT JOIN film_creators ON ( film_creators.film = films.film_id ) JOIN persons ON ( persons.person_id = film_creators.person ) where films.film_name = 'Сталкер' AND persons.person_id IS NOT NULL; 
    • We select specific films with a specific name. Those. we eliminate 99% of the records immediately - Anton Shchyrov
    • yes, confused. but the persons LEFT is overkill. - Mike
    • @Mike Why? .. - Anton Shchyrov February
    • because the where clause clearly contradicts it. The optimizer oracle up to version 10 of this would have specifically demolished the roof. MySQL does not blow the roof away, it completely ignores the word LEFT in such cases. - Mike