📜 ⬆️ ⬇️

Open webinar "Order of execution of a SELECT query and query plan in MS SQL Server"

Hello again!

Colleagues, on the last day of January we launch the course “MS SQL Server Developer” , in connection with which we had a thematic open lesson. On it, we talked about how MS SQL Server performs a SELECT query, discussed the order in which and what is being analyzed, and also slightly dived into reading the query plan.

The instructor is Kristina Kucherova , data model architect at Sberbank of Russia.


Goals and route of the webinar

At the beginning of the webinar were set the following goals:

  1. See how the server performs the request, and why it happens that way.
  2. Learn to read the query plan.

To achieve them, the teacher has prepared a simple but effective route:



Why do you need a query plan?

The query plan is a very useful tool that, unfortunately, many developers do not use. At first glance, it may seem that it is not necessary to know the mechanics of the query. However, if you understand what is happening inside SQL Server, you can write a more efficient query. And it will greatly help, for example, with optimization.

How do we see the SELECT query?

Let's see what a SELECT query looks like:

SELECT [field1], [field2] ...
Which fields do we choose?
FROM [table]
From where
WHERE [conditions]
Where conditions are such
GROUP BY [field1]
Group by field
HAVING [conditions]
Having such and such conditions
ORDER BY [field1]
Order (sort)

How to figure out where to go for the data?

The first thing the server tries to understand when a request comes in is where to go for the data. The FROM command answers this question, since it is here that we will have a list of tables (or the name of a single table).

For clarity, let's imagine that our server is a kind of butler, whom we order to collect us on vacation. Accordingly, the butler begins to think, and in what closet are the necessary things (in which table do you need to take data)? And so that our butler can easily complete his task, we use FROM.



How to understand what data to take?

Suppose the butler found the closet he needed and opened it. But what things to take? Maybe we are going to a ski resort? Or maybe on a hot sunny beach? In order for our things to match the weather, we need the WHERE command, which defines the conditions, that is, allows us to filter the data. If it's hot, we take slates, T-shirts and swimsuits, if it's cold - mittens, knitted socks, sweaters)).

The next stage is to put this data into groups, which is done with the help of GROUP BY (T-shirts separately, socks separately). According to the results of grouping, you can impose one more condition using HAVING (for example, weed out unpaired things). In the end, we add everything with the help of ORDER BY, resulting in a ready-made bag of things, or rather, an ordered data block.



By the way, there is a nuance, but it lies in the fact that there is a difference, which conditions should be prescribed in WHERE, and which in HAVING. But this is better to look at the video.

We continue. The path of the request is saved in the form of a query plan in the cache, that is, our butler writes everything down, because he is a good butler - what if you want to repeat your order next year? And such plans, in principle, can be many.

Connection types in terms of query

There are three connections that you may encounter in the query plan:

  1. Nested Loop.
  2. Merge join.
  3. Hash join.

Before dwelling on each of them in more detail, let's summarize, why should we even read the request plan. In fact, it is very useful, as you will learn:


Nested loop

Suppose we need to connect data from different tables. Let's present these tables in the form of ... a small amount of Skittles candy and full M & M's packaging.



When connecting the type of Nested Loop, we take the candy Skittles, and then we blindly get the candy from the M & M's package. If we don’t come across a candy of the same color (this is our condition), we get the next one, that is, a normal search occurs. As a result, we can say that the Nested Loop connection is more suitable for small amounts of data. Obviously, if there is a lot of data, busting is not the best option.



Let's see how it looks in the SQL panel:

--drop table skittles --drop table mms --запрос для окна слева create table mms (id int identity(1,1), color varchar(25), taste varchar(15)) insert into mms (color, taste) values ('yellow', 'chocolate') insert into mms (color, taste) values ('red', 'nuts') create clustered index IX_mms_color ON mms(color); create table skittles (id int identity(1,1), color varchar(25), taste varchar(15)) create index IX_skittles_id ON skittles(id); create clustered index IX_skittles_color ON skittles(color); insert into skittles (color, taste) values ('red', 'cherry') insert into skittles (color, taste) values ('blue', 'strange') insert into skittles (color, taste) values ('yellow', 'lemon') insert into skittles (color, taste) values ('green', 'apple') insert into skittles (color, taste) values ('orange', 'orange') --запрос для правого окна select mms.* from mms join skittles on mms.color = skittles.color select * from mms join skittles on mms.color = skittles.color 



Merge join

Connection is used for large amounts of data. When you have a merge join, both tables have an index by which they can be joined. In the case of candies, it is as if they are arranged in advance in colors.

It looks like this:



 --2 tables 50000 rows, only clustered index by color, color is not unique select COUNT(*) from mms_big join skittles_big on mms_big.color = skittles_big.color 



Merge join is good in the following cases:


Hash join

Hash join is used for unsorted large amounts of data. To join the tables in this case, you need to build something that simulates an index.

Hash join example:

 --drop table skittles --drop table mms --запрос для окна слева create table mms (id int identity(1,1), color varchar(25), taste varchar(15)) insert into mms (color, taste) values ('yellow', 'chocolate') insert into mms (color, taste) values ('red', 'nuts') insert into mms (color, taste) values ('blue', 'strange') insert into mms (color, taste) values ('green', 'chocolate') insert into mms (color, taste) values ('orange', 'chocolate') create table skittles (id int identity(1,1), color varchar(25), taste varchar(15)) insert into skittles (color, taste) values ('red', 'cherry') insert into skittles (color, taste) values ('blue', 'strange') insert into skittles (color, taste) values ('yellow', 'lemon') insert into skittles (color, taste) values ('green', 'apple') insert into skittles (color, taste) values ('orange', 'orange') --запрос для правого окна select * from mms join skittles on mms.color = skittles.color 



For clarity, recall our candy:



Using Hash join involves 2 phases of actions:

  1. Build - builds a hash table using the smallest table. For each value in the table number 1 is a hash. The value is stored in a hash table, and the calculated hash is used as a key.
  2. Probe. For each row from table No. 2, the hash value is calculated by the fields that are specified in the join (operator =). Look for a hash in the hash table, check the field values.







When Hash join is good:


An important point: if there is not enough memory, the recording will go to tempdb - to disk.

Friends, besides the above, in the open lesson included other interesting points that are best seen, watching the video. We suggest visiting the Open Day of the course “MS SQL Server Developer” where you can ask the teacher all your questions.

PS Teacher Kristina Kucherova expresses appreciation to Jes Schultz Borland for her presentation with PASS Summitt Execution Plans, which was used during the preparation of the open lesson.

Source: https://habr.com/ru/post/437828/