博客
关于我
强烈建议你试试无所不能的chatGPT,快点击我
SQL Server 2017:自适应联接内部
阅读量:2512 次
发布时间:2019-05-11

本文共 17341 字,大约阅读时间需要 57 分钟。

SQL Server 2017 brings a new query processing methods that are designed to mitigate cardinality estimation errors in query plans and adapt plan execution based on the execution results. This innovation is called Adaptive Query Processing and consist of the three features:

SQL Server 2017带来了新的查询处理方法,旨在减轻查询计划中的基数估计错误并根据执行结果调整计划执行。 这项创新称为自适应查询处理,它包含以下三个功能:

  • Adaptive Memory Grant Feedback;

    自适应内存授权反馈;
  • Interleaved Execution;

    交错执行;
  • Adaptive Joins.

    自适应联接。

We have discussed two of them in the previous posts:  and 

我们在之前的文章中讨论了其中两个: 以及

In this post, we will discuss the last one – Adaptive Joins.

在本文中,我们将讨论最后一个- 自适应联接

Adaptive Joins were publicly introduced in the CTP 2.0, I advise you to read a post by Joe Sack  to get know about this feature, because I will give only a brief introduction before making a deep dive into the Adaptive Join internals.

自适应联接在CTP 2.0中公开介绍,我建议您阅读Joe Sack的文章, 以了解此功能,因为在深入研究Adaptive Join内部结构之前,我仅作简要介绍。 。

介绍 (Introduction)

We have three types of physical join algorithms in SQL Server: hash, nested loops and merge. Adaptive join allows SQL Server automatically choose an actual physical algorithm on the fly between the first two – hash (HM) and nested loops (NL).

在SQL Server中,我们有三种类型的物理联接算法:哈希,嵌套循环和合并。 自适应联接允许SQL Server在前两个-哈希(HM)和嵌套循环(NL)之间自动选择运行中的实际物理算法。

NL has two join  – naive nested loops join (inner loop scans the whole inner table or index) and index nested loops join (index on the join column of the inner table is used to find necessary rows and then those rows are applied to the outer row, also called Nested Loops Apply). Typically, the second one performs very well if you have rather small input on the outer side and indexed rather big input on the inner side.

NL有两种联接 –天真的嵌套循环联接(内部循环扫描整个内部表或索引)和索引嵌套循环联接(内部表的联接列上的索引用于查找必要的行,然后将这些行应用于外排,也称为嵌套循环套用)。 通常,如果外侧的输入量较小,而内侧的索引输入量较大,则第二个输入的性能会很好。

HM is more universal and uses hash algorithms to match rows, so no indexes are necessary. You may refer to my previous article () for more details.

HM更通用,并且使用哈希算法来匹配行,因此不需要索引。 您可以参考我以前的文章( )以了解更多详细信息。

Adaptive Join starts execution as a Hash Join. It consumes all the input of the build phase and looks at the adaptive join threshold, if the number of rows is more or equal this threshold it will continue as a hash join. However, if the number of rows is less than this threshold, it will switch to a NL.

自适应联接作为哈希联接开始执行。 它消耗了构建阶段的所有输入,并查看自适应联接阈值,如果行数大于或等于该阈值,它将继续作为哈希联接。 但是,如果行数小于此阈值,它将切换到NL。

Threshold is a number of rows, which represents a point, at which NL is cheaper than a HM. This number of rows is calculated during the compilation and is based on the cost estimates.

阈值是许多行,代表NL比HM便宜的一点。 该行数是在编译期间计算的,并且基于成本估算。

In the first implementation, Adaptive Join could be used only if the optimizer considers a Batch execution mode during the optimization (a new execution mode introduced in SQL Server 2012 for Columnstore indexes). However, Itzik Ben-Gan invented a trick with a dummy filtered Columnstore index on a table to give an optimizer the chance to consider a batch mode.

在第一个实现中,仅当优化程序在优化过程中考虑批处理执行模式(SQL Server 2012中针对列存储索引引入的新执行模式)时,才可以使用自适应联接。 但是,Itzik Ben-Gan发明了在表上使用虚拟过滤的Columnstore索引的技巧,使优化程序可以考虑使用批处理模式。

In a query plan, Adaptive Join is implemented as a new operator “Adaptive Join”, which has three inputs. The first input is an outer (build) input, the second one is an input if a HM is picked and the third one if NL is picked.

在查询计划中,自适应联接被实现为具有三个输入的新运算符“自适应联接”。 第一个输入是外部(内部)输入,第二个输入是拾取HM的输入,第三个是拾取NL的输入。

Let’s look at the example. I’ll use sample DB AdventureWorks2016CTP3 and set the compatibility level to 140 as it is necessary for an Adaptive Join to be available. I’ll also create a dummy filtered Columnstore index so the optimizer could consider a batch mode execution.

让我们来看一个例子。 我将使用样本DB AdventureWorks2016CTP3并将兼容性级别设置为140,因为必须具有自适应联接。 我还将创建一个虚拟的过滤后的Columnstore索引,以便优化程序可以考虑执行批处理模式。

use AdventureWorks2016CTP3;goalter database [AdventureWorks2016CTP3] set compatibility_level = 140;gocreate nonclustered columnstore index dummy on Sales.SalesOrderHeader(SalesOrderID) where SalesOrderID = -1 and SalesOrderID = -2;godeclare @TerritoryID int = 1;select	sum(soh.SubTotal)from 	Sales.SalesOrderHeader soh	join Sales.SalesOrderDetail sod on soh.SalesOrderID = sod.SalesOrderIDwhere	soh.TerritoryID = @TerritoryID;go

The plan is:

该计划是:

  1. You may see a new plan operator – Adaptive Join.

    您可能会看到一个新的计划运算符-自适应联接。
  2. Is Adaptive property is set to TRUE.

    是否将Adaptive属性设置为TRUE。
  3. The estimated join type, based on the estimated number of rows is Hash Match. If we add to the query “option (optimize for (@TerritoryID = 0))”, there are no rows for TerritoryID = 0, the estimated number of rows will go to 1 row and we will see Nested Loops in this property.

    基于估计的行数的估计联接类型为哈希匹配。 如果我们将查询添加到“选项(优化(@TerritoryID = 0))”,则TerritoryID = 0的行将不存在,估计的行数将变为1行,并且在此属性中将看到嵌套循环。
  4. Actual Join Type is Adaptive Join. This property is not yet implemented in the current version, it should show the actual join type that was picked.

    实际联接类型为自适应联接。 当前版本尚未实现此属性,它应显示选择的实际联接类型。
  5. We see a Clustered Index Scan of the SalesOrderHeader as the first input – this is a build input if the HM is picked (7)

    我们将SalesOrderHeader的聚集索引扫描作为第一个输入-如果选择了HM,则这是一个构建输入(7)
  6. We see an Index Scan of the SalesOrderDetail as the second input – this is a probe input if the HM is picked (7).

    我们将SalesOrderDetail的Index Scan作为第二个输入-如果选择了HM,则这是一个探针输入(7)。
  7. We see a properties Hash Key Build and Probe that are properties for a HM.

    我们看到属性哈希键构建和探针是HM的属性。
  8. We also see a Bitmap filter defined, which is also used to be in a HM (or Merge) join usually.

    我们还看到了定义的位图过滤器,通常也用于HM(或合并)联接中。
  9. Then we may see an Adaptive Join Threshold – this is the number of rows which is used to switch between the join algorithms and input branches (6 for a HM and 10 for a NL)

    然后我们可能会看到一个自适应联接阈值–这是用于在联接算法和输入分支之间切换的行数(HM为6,NL为10)
  10. This is a branch that will be used by a NL join if the threshold 9 won’t be reached.

    如果不达到阈值9,则这是NL联接将使用的分支。
  11. Note the Outer Reference – the NL join’s property, which is SalesOrderID in our case.

    请注意外部参考– NL连接的属性,在我们的例子中为SalesOrderID。
  12.  about this for more details. 关于这个的更多细节。

Summarizing the plan – we see a new operator which has three inputs and depending on the threshold rows may use one or another input and the one or another join strategies either HM or NL. This operator contains properties for both of them.

总结计划–我们看到一个新的运算符,该运算符具有三个输入,并且根据阈值行,可能会使用一个或另一个输入以及一个或另一个连接策略HM或NL。 该运算符包含两个属性。

For a more detailed description, I’ll refer you to the Joe Sack’s blog post mentioned above and we will move on to some internals and undocumented stuff.

有关更详细的描述,请参考上面提到的Joe Sack的博客文章,我们将继续介绍一些内部文件和未记录的内容。

优化内部 (Optimization Internals)

It is interesting to know how this new operator is optimized. To look at the process we will use the undocumented trace flags 8607 which outputs the physical operator tree and the trace flag 7352 which outputs the tree after a post optimization rewrite phase. Let’s run our example query with those TFs and statistics IO turned on.

知道如何优化此新运算符很有趣。 为了查看该过程,我们将使用未记录的跟踪标志8607和跟踪标志7352,跟踪标志7607输出物理运算符树,跟踪标志7352在优化后的重写阶段之后输出树。 让我们在打开这些TF和统计信息IO的情况下运行示例查询。

alter database scoped configuration clear procedure_cache;goset statistics io on;declare @TerritoryID int = 1;select	sum(soh.SubTotal)from 	Sales.SalesOrderHeader soh	join Sales.SalesOrderDetail sod on soh.SalesOrderID = sod.SalesOrderIDwhere	soh.TerritoryID = @TerritoryIDoption(querytraceon 3604, querytraceon 8607, querytraceon 7352);set statistics io off;

If we switch to the Message Tab in SSMS we will see the following.

如果在SSMS中切换到“消息”选项卡,将看到以下内容。

Output physical tree (shorten for brevity):

输出物理树(为简洁起见,简称):

We see that after the optimization process the output tree is a concatenation (familiar to us Concat operator) of the output from the first table and two joins HM and NL (apply NL).

我们看到在优化过程之后,输出树是第一个表的输出(两个连接HM和NL(应用NL))的连接(对于我们的Concat运算符来说是熟悉的)。

After the post optimization rewrite, we see an Adaptive Join operator.

重新优化后的优化后,我们将看到一个自适应联接运算符。

You may also notice a spool operator, that one is used not to scan the first (build) input twice when we switch to a NL join. That is why we see a Worktable in the statistics IO output.

您可能还会注意到一个假脱机操作员,当我们切换到NL联接时,它不会被用来两次不扫描第一个(构建)输入。 这就是为什么我们在统计数据IO输出中看到一个工作表的原因。

Interestingly, we may see this plan shape with a spool if we enable the undocumented trace flag 9415.

有趣的是,如果启用了未记录的跟踪标志9415,我们可能会看到带有线轴的计划形状。

alter database scoped configuration clear procedure_cache;dbcc traceon (9415);godeclare @TerritoryID int = 1;select	sum(soh.SubTotal)from 	Sales.SalesOrderHeader soh	join Sales.SalesOrderDetail sod on soh.SalesOrderID = sod.SalesOrderIDwhere	soh.TerritoryID = @TerritoryID;godbcc traceoff (9415);go

The plan would be:

该计划将是:

Notice some concatenation properties that we have seen earlier in the Adaptive Join operator. There is also a property Adaptive Threshold Rows (unfortunately, not shown in the tooltip). Both plans produced identical results in my experiments.

请注意一些我们先前在“自适应连接”运算符中看到的串联属性。 还有一个属性“自适应阈值行”(不幸的是,工具提示中未显示)。 在我的实验中,两个计划都产生了相同的结果。

So far, we have found that an Adaptive Join undercover is a Concatenation operator with some extended properties. We’ll explore how it is physically executed later in the post.

到目前为止,我们发现Adaptive Join卧底是具有某些扩展属性的Concatenation运算符。 我们将在后面的文章中探讨如何物理执行它。

Adaptive Threshold Rows

自适应阈值行

This is the number of rows when the switch to a NL join occurs if it is not reached. This threshold is based on the estimates. If we influence the number of rows in the first input and optimize it for different values – we may see how the threshold is changed. Here is an example:

这是切换到NL联接(如果未达到)时的行数。 此阈值基于估计值。 如果我们影响第一个输入中的行数并针对不同的值对其进行优化–我们可能会看到阈值如何更改。 这是一个例子:

alter database scoped configuration clear procedure_cache;godeclare @TerritoryID int = 1;select	sum(soh.SubTotal)from 	Sales.SalesOrderHeader soh	join Sales.SalesOrderDetail sod on soh.SalesOrderID = sod.SalesOrderIDwhere	soh.TerritoryID = @TerritoryIDoption(optimize for (@TerritoryID = 0)); select	sum(soh.SubTotal)from 	Sales.SalesOrderHeader soh	join Sales.SalesOrderDetail sod on soh.SalesOrderID = sod.SalesOrderIDwhere	soh.TerritoryID = @TerritoryIDoption(optimize for (@TerritoryID = 1)); select	sum(soh.SubTotal)from 	Sales.SalesOrderHeader soh	join Sales.SalesOrderDetail sod on soh.SalesOrderID = sod.SalesOrderIDwhere	soh.TerritoryID = @TerritoryIDoption(optimize for (@TerritoryID = 10));go

We may see the following values as an adaptive threshold:

我们可能会将以下值视为自适应阈值:

This means that a threshold depends on the estimates, so accurate estimates are still important. For example, estimates with parameter sniffing with parameters of the first input will also leverage the threshold. This situation is described very well in the Itzik’s post, referred above.

这意味着阈值取决于估计,因此准确的估计仍然很重要。 例如,带有参数嗅探和第一个输入的参数的估计值也将利用阈值。 以上在Itzik的帖子中对此情况进行了很好的描述。

There is also one undocumented trace flag, that lowers the adaptive threshold to the minimum estimate (which is one row) plus one row, totaling 2 rows. That means that a NL join algorithm will be used only when there is one row of the first input.

还有一个未记录的跟踪标志,它将自适应阈值降低到最小估计值(一行)加一行,总计2行。 这意味着仅当第一输入有一行时才使用NL join算法。

Here is an example:

这是一个例子:

alter database scoped configuration clear procedure_cache;godeclare @TerritoryID int = 1;select	sum(soh.SubTotal)from 	Sales.SalesOrderHeader soh	join Sales.SalesOrderDetail sod on soh.SalesOrderID = sod.SalesOrderIDwhere	soh.TerritoryID = @TerritoryIDoption(querytraceon 9399);go

If we look at the threshold we will see:

如果我们看阈值,我们将看到:

Disabling Adaptive Join

禁用自适应联接

The first way is to switch to compatibility level 130. But if you switch to the Compatibility Level 130 you will also disable all the other optimizer improvements.

第一种方法是切换到兼容性级别130。但是,如果切换到兼容性级别130,还将禁用所有其他优化程序改进。

There are few other ways you can disable Adaptive Joins, all of them are undocumented at the moment of writing this post, however, some of them are very promising to be documented in the future.

禁用自适应联接的其他方法很少,在撰写本文时,所有这些都尚未记录,但是,将来很有希望记录其中的一些。

TF 9398

TF 9398

If we apply the undocumented trace flag 9398 to our sample query, we may see that there is no adaptive join in the query plan.

如果将未记录的跟踪标志9398应用于示例查询,则可能会看到查询计划中没有自适应联接。

alter database scoped configuration clear procedure_cache;godeclare @TerritoryID int = 1;select	sum(soh.SubTotal)from 	Sales.SalesOrderHeader soh	join Sales.SalesOrderDetail sod on soh.SalesOrderID = sod.SalesOrderIDwhere	soh.TerritoryID = @TerritoryIDoption(querytraceon 9398);go

The plan has only a HM join:

该计划只有HM连接:

Query hint DISABLE_BATCH_MODE_ADAPTIVE_JOINS

查询提示DISABLE_BATCH_MODE_ADAPTIVE_JOINS

We may also use a query hint to disable and adaptive join.

我们还可以使用查询提示来禁用和自适应连接。

alter database scoped configuration clear procedure_cache;godeclare @TerritoryID int = 1;select	sum(soh.SubTotal)from 	Sales.SalesOrderHeader soh	join Sales.SalesOrderDetail sod on soh.SalesOrderID = sod.SalesOrderIDwhere	soh.TerritoryID = @TerritoryIDoption(use hint('DISABLE_BATCH_MODE_ADAPTIVE_JOINS'));go

The plan shape is the same as in the previous example:

平面形状与上一个示例相同:

Database Scoped Configuration Option DISABLE_BATCH_MODE_ADAPTIVE_JOINS

数据库范围的配置选项DISABLE_BATCH_MODE_ADAPTIVE_JOINS

The exact same keyword is also available as a DB scoped configuration option in case you would like to turn off Adaptive Joins for the whole DB.

如果您要关闭整个数据库的自适应联接,则也可以使用与数据库范围相同的配置选项使用完全相同的关键字。

alter database scoped configuration clear procedure_cache;alter database scoped configuration set DISABLE_BATCH_MODE_ADAPTIVE_JOINS = on;godeclare @TerritoryID int = 1;select	sum(soh.SubTotal)from 	Sales.SalesOrderHeader soh	join Sales.SalesOrderDetail sod on soh.SalesOrderID = sod.SalesOrderIDwhere	soh.TerritoryID = @TerritoryID;goalter database scoped configuration set DISABLE_BATCH_MODE_ADAPTIVE_JOINS = off;go

The plan is:

该计划是:

The last two methods (hint and config option) seem very promising to me to be documented at some point. Probably, it is not tested enough yet and not guaranteed to work properly, however, I hope they will be documented later.

对于我来说,最后两种方法(提示和配置选项)似乎很有希望在某些时候进行记录。 可能它尚未经过足够的测试,不能保证能正常工作,但是,我希望以后再进行记录。

Skipped Adaptive Join

跳过自适应连接

In this section, we will talk about the situation when Adaptive Join is considered, but skipped due to some reasons. There is a new extended event adaptive_join_skipped which has a field reason. If we query the DMV dm_xe_map_values like this, we’ll see all the reasons:

在本节中,我们将讨论考虑“自适应连接”但由于某些原因而被跳过的情况。 有一个新的扩展事件adaptive_join_skipped,它具有字段原因。 如果我们像这样查询DMV dm_xe_map_values,我们将看到所有原因:

select map_value from sys.dm_xe_map_values where name like 'adaptive_join_skipped_reason'

We will get the following results:

我们将得到以下结果:

  • eajsrExchangeTypeNotSupported

    eajsrExchangeTypeNotSupported
  • eajsrHJorNLJNotFound

    eajsrHJorNLJNotFound
  • eajsrInvalidAdaptiveThreshold

    eajsrInvalidAdaptiveThreshold
  • eajsrOuterCardMaxOne

    eajsrOuterCardMaxOne
  • eajsrUnMatchedOuter

    eajsrUnMatchedOuter

That is all about the optimization internals in this post, but this is a very first light touch of this topic. In the next part, we will talk about the execution internals of the Adaptive Join.

这就是本文中有关优化内部的全部内容,但这只是本主题的初学者。 在下一部分中,我们将讨论Adaptive Join的执行内部。

执行内部 (Execution Internals)

To look at how exactly the Adaptive Join is executed I used the WinDbg. I put a breakpoint on the method sqlmin!CQScanRangeNew::GetRow to see the call stack.  Here is what I’ve got:

为了查看自适应联接的执行方式,我使用了WinDbg。 我在方法sqlmin!CQScanRangeNew :: GetRow上设置了一个断点,以查看调用堆栈。 这是我所拥有的:

Reading bottom-top. At some point, you see that it is a batch mode Concatenation operator doing the Adaptive Join job and no separate adaptive join iterator.

阅读自上而下。 在某个时候,您会看到它是一个批处理模式的串联运算符,它执行Adaptive Join作业,而没有单独的自适应联接迭代器。

One more interesting thing is a new iterator AdaptiveBufReader. If we set a breakpoint in the debugger on the method CQScanAdaptiveBufReader:: GetRow – we will hit it.

另一个有趣的事情是新的迭代器AdaptiveBufReader。 如果我们在调试器上使用方法CQScanAdaptiveBufReader :: GetRow设置断点,则将其命中。

For example, for this query, we will hit it 7 times (6 rows + one more time to make sure there are no more rows):

例如,对于该查询,我们将其命中7次(6行+ 1次以确保没有更多行):

alter database scoped configuration clear procedure_cache;godeclare 	@SalesOrderID int = 43664select	sum(soh.SubTotal)from 	Sales.SalesOrderHeader soh	join Sales.SalesOrderDetail sod on soh.SalesOrderID = sod.SalesOrderIDwhere	soh.SalesOrderID <= @SalesOrderIDoption(optimize for(@SalesOrderID = 1000000))go

The plan is:

该计划是:

Adaptive Threshold is 1466 rows, however, the actual number of rows in the input is 6 rows and it is less than a threshold. Which is why Index Scan produced no rows, but the second branch with a Clustered Index Seek produced 60 rows.

自适应阈值是1466行,但是,输入中的实际行数是6行,并且小于阈值。 这就是为什么“索引扫描”不产生任何行,但是具有聚簇“索引搜索”的第二个分支却产生60行的原因。

As it was said earlier in the post, the input is not read twice from the source (SalesOrderHeader table), it is read from the internal structure, this structure was defined as a spool in the operator tree, but undercover it is called differently and has different properties – Adaptive Buffer Reader. But the physical operator is still Concatenation.

正如文章前面所述,输入不会从源(SalesOrderHeader表)中读取两次 ,而是从内部结构中读取,此结构在操作员树中被定义为假脱机,但在底层被不同地称为具有不同的属性–自适应缓冲区读取器。 但是物理运算符仍然是级联的。

That’s all for that post. Thank you for reading!

这就是那个帖子的全部。 感谢您的阅读!

目录 (Table of contents)

SQL Server 2017: Adaptive Join Internals
SQL Server 2017:自适应联接内部

翻译自:

转载地址:http://vpnwd.baihongyu.com/

你可能感兴趣的文章
flume+elasticsearch+kibana遇到的坑
查看>>
【MM系列】在SAP里查看数据的方法
查看>>
C#——winform
查看>>
CSS3 transform制作的漂亮的滚动式导航
查看>>
《小强升职记——时间管理故事书》读书笔记
查看>>
Alpha 冲刺(3/10)
查看>>
Kaldi中的Chain模型
查看>>
spring中的ResourceBundleMessageSource使用和测试示例
查看>>
css规范 - bem
查看>>
电梯调度程序的UI设计
查看>>
转自 zera php中extends和implements的区别
查看>>
Array.of使用实例
查看>>
【Luogu】P2498拯救小云公主(spfa)
查看>>
如何获取网站icon
查看>>
几种排序写法
查看>>
java 多线程的应用场景
查看>>
dell support
查看>>
转:Maven项目编译后classes文件中没有dao的xml文件以及没有resources中的配置文件的问题解决...
查看>>
MTK android 设置里 "关于手机" 信息参数修改
查看>>
单变量微积分笔记6——线性近似和二阶近似
查看>>