版本 | 日期 | 备注 |
---|---|---|
1.0 | 2024.2.18 | 文章首发 |
本文的的源码剖析全副基于TiDB6.5来做剖析。
1.引子
如果让你做一个分布式数据库的优化器,面对以下的SQL,你会想到什么好的办法去执行他们呢?
SELECT id, name FROM person WHERE age >= 18 or height > 180 limit 100;
:从条件上看,咱们看到条件其实是二选一的:age >= 18 or height > 180
。基于这种状况,咱们必定会去抉择有索引的数据,如果都有索引or都没有,那么必定抉择扫描行数起码的数据。如果有一些算子在外面的话,则额定须要思考数据的就近准则——有些算子在局部架构下能够充分利用MPP的能力,而有些则不行。SELECT orders.order_id, customers.customer_name, orders.order_date FROM orders LEFT JOIN customers ON orders.customer_id=customers.customer_id;
分布式数据库中的join,最优的形式就是小表播送到大表数据所在的中央。那么首先咱们得晓得谁是小表,谁是大表。
2.统计信息收集
依据后面的两个例子,咱们能够发现——如果咱们须要找到SQL对应的最佳打算,咱们会须要一些表的元数据,或者说是统计信息。从惯例的角度来说,以下统计信息是必须收集的:
- 表的总行数
- 每列数据的均匀大小
- 每列数据的基数:即NDV(distinct value)
- 列的NULL值个数
如果是事务型的(行式存储),那么还要思考索引均匀长度、值的散布等等。
如果是剖析型的(列式存储),那么还须要思考相干列的最大值、最小值等等。
而统计形式的收集,会有多种形式。次要是思考资源和准确性之间的Trade off。常见的有:
- TopN:相干数据呈现次数前 n 的值。
- 直方图:用于形容数据分布图的工具。依照数据的值大小进行分桶,并用一些简略的数据来形容每个桶,比方落在桶里的值的个数。
- 2D直方图:因为直方图无奈反馈列之间的关联,能够用2D直方图(联结散布)做到,但空间占用也比拟多。
- Count-Min Sketch:相似哈希表加上计算器的实现。能够用很少的数据来形容整体数据的个性。
- HyperLogLog:一种估算海量数据基数的办法。很多状况下,统计并不需要那么准确。工程方面要在应用资源和准确性里找均衡。因而有人提出用HLL,这是一种占用资源少,但能给出较为精确的近似后果的算法。
TiDB收集的统计信息见:https://docs.pingcap.com/zh/tidb/v6.5/statistics#%E7%9B%B4%E6...
3.代价的评估
一个SQL真正的物理执行打算可能是有多个的。在以统计信息为根底之上,咱们还须要退出相应的权重,举个例子:
- 如果可能在内存中计算实现,就不必去重复发动本地IO
- 如果能在本地IO中实现,就不要去发动网络申请
等等...
这在TiDB的代码中也有对应的默认值。
DefOptCPUFactor = 3.0DefOptCopCPUFactor = 3.0DefOptTiFlashConcurrencyFactor = 24.0DefOptNetworkFactor = 1.0DefOptScanFactor = 1.5DefOptDescScanFactor = 3.0DefOptSeekFactor = 20.0DefOptMemoryFactor = 0.001DefOptDiskFactor = 1.5DefOptConcurrencyFactor = 3.0
var defaultVer2Factors = costVer2Factors{ TiDBTemp: costVer2Factor{"tidb_temp_table_factor", 0.00}, TiKVScan: costVer2Factor{"tikv_scan_factor", 40.70}, TiKVDescScan: costVer2Factor{"tikv_desc_scan_factor", 61.05}, TiFlashScan: costVer2Factor{"tiflash_scan_factor", 11.60}, TiDBCPU: costVer2Factor{"tidb_cpu_factor", 49.90}, TiKVCPU: costVer2Factor{"tikv_cpu_factor", 49.90}, TiFlashCPU: costVer2Factor{"tiflash_cpu_factor", 2.40}, TiDB2KVNet: costVer2Factor{"tidb_kv_net_factor", 3.96}, TiDB2FlashNet: costVer2Factor{"tidb_flash_net_factor", 2.20}, TiFlashMPPNet: costVer2Factor{"tiflash_mpp_net_factor", 1.00}, TiDBMem: costVer2Factor{"tidb_mem_factor", 0.20}, TiKVMem: costVer2Factor{"tikv_mem_factor", 0.20}, TiFlashMem: costVer2Factor{"tiflash_mem_factor", 0.05}, TiDBDisk: costVer2Factor{"tidb_disk_factor", 200.00}, TiDBRequest: costVer2Factor{"tidb_request_factor", 6000000.00},}
4.执行打算枚举与择优
当咱们能够评估出物理执行打算的代价时,将会枚举所有能够用上物理执行打算,并在其中抉择代价最小的物理执行打算。个别枚举分为两个流派:
- 自底向上:代表有System R。
- 自顶向下:代表有Cascade。
自底向上没法解决一个问题。当下层对上层的输入后果程序感兴趣时,那就不能只能从底层的视角来寻找部分最优。
举个例子,多表Join。一开始两个表Join可能HashJoin代价很低,然而Join下一个表时,用MergeJoin从整体来看代价更低。从这个case来看,自底向上来做打算取优并不适合。
所以有了Cascade:
- 搜寻计划是自顶向下的。这意味着它能够防止部分最优而导致全局不优。从Operator Tree 自顶向下遍历时,能够做一系列变换:
- Implementation:逻辑算子能够转换成物理算子;例如Join转换成NestLoop或者HashJoin等
- Exploration:逻辑算子能够做等价变换;例如替换Inner Join的两个子节点,即可枚举Join程序
图片来自于:Cascades Optimizer——https://zhuanlan.zhihu.com/p/73545345
- 它是基于Volcano模型演进而来的。
- 用面向对象的形式进行建模,编写规定时不须要关怀搜寻过程。相比传统优化器中面向过程去一条条的编写,确实是很大的改良。
5.TiDB的实现
大抵的代码调用链为:
-- session/session.go\-- ExecuteStmt //SQL执行的入口|-- executor/compiler.go\-- Compile //将SQL变成可执行的打算|--planner/planner/optmize.go\-- Optimize //优化的入口\-- optimize //这里有两个入口。一种是新的优化器入口,一种是老的优化器入口。依据配置来抉择。最终都会返回最优的物理执行打算。 |-- planner/cascades/optmize.go \--FindBestPlan 见5.1 \-- onPhasePreprocessing //见5.3 \-- implGroup |--planner/core/optmizer.go //见5.4 \-- DoOptimize \-- physicalOptimize |--planner/core/find_best_task.go \-- findBestTask \-- enumeratePhysicalPlans4Task \-- compareTaskCost \-- getTaskPlanCost |-- planner/core/plan_cost_ver2.go \-- getPlanCost
5.1 逻辑优化
外围入口为:
// FindBestPlan is the optimization entrance of the cascades planner. The// optimization is composed of 3 phases: preprocessing, exploration and implementation.//// ------------------------------------------------------------------------------// Phase 1: Preprocessing// ------------------------------------------------------------------------------//// The target of this phase is to preprocess the plan tree by some heuristic// rules which should always be beneficial, for example Column Pruning.//// ------------------------------------------------------------------------------// Phase 2: Exploration// ------------------------------------------------------------------------------//// The target of this phase is to explore all the logically equivalent// expressions by exploring all the equivalent group expressions of each group.//// At the very beginning, there is only one group expression in a Group. After// applying some transformation rules on certain expressions of the Group, all// the equivalent expressions are found and stored in the Group. This procedure// can be regarded as searching for a weak connected component in a directed// graph, where nodes are expressions and directed edges are the transformation// rules.//// ------------------------------------------------------------------------------// Phase 3: Implementation// ------------------------------------------------------------------------------//// The target of this phase is to search the best physical plan for a Group// which satisfies a certain required physical property.//// In this phase, we need to enumerate all the applicable implementation rules// for each expression in each group under the required physical property. A// memo structure is used for a group to reduce the repeated search on the same// required physical property.func (opt *Optimizer) FindBestPlan(sctx sessionctx.Context, logical plannercore.LogicalPlan) (p plannercore.PhysicalPlan, cost float64, err error) { logical, err = opt.onPhasePreprocessing(sctx, logical) if err != nil { return nil, 0, err } rootGroup := memo.Convert2Group(logical) err = opt.onPhaseExploration(sctx, rootGroup) if err != nil { return nil, 0, err } p, cost, err = opt.onPhaseImplementation(sctx, rootGroup) if err != nil { return nil, 0, err } err = p.ResolveIndices() return p, cost, err}
正文+代码很洁净,这里一共做三件事
- onPhasePreprocessing:正文很切实,说
for example Column Pruning
,后果外面真的只做了列裁剪。 - onPhaseExploration:摸索所有逻辑等价存在的可能
- onPhaseImplementation:依据代价寻找最优的物理执行打算
这块官网的博客写的十分好,我就不反复了:https://cn.pingcap.com/blog/tidb-cascades-planner/
5.2 统计信息收集
这块代码次要集中在stats.go里,外面的外围数据结构是stats_info.go里的StatsInfo。调用链大抵为:
|-- planner/cascades/optimizer.go\--fillGroupStats|-- planner/core/stats.go\--DeriveStats
type LogicalPlan interface { Plan //......疏忽一些代码 // DeriveStats derives statistic info for current plan node given child stats. // We need selfSchema, childSchema here because it makes this method can be used in // cascades planner, where LogicalPlan might not record its children or schema. DeriveStats(childStats []*property.StatsInfo, selfSchema *expression.Schema, childSchema []*expression.Schema, colGroups [][]*expression.Column) (*property.StatsInfo, error) //......疏忽一些代码}
有很多构造体实现了这个办法:
- 收集统计信息次要是analyze.go#Next办法中调用的#analyzeWorker。
5.3 新版本 物理执行打算的抉择
代码入口为:
// implGroup finds the best Implementation which satisfies the required// physical property for a Group. The best Implementation should have the// lowest cost among all the applicable Implementations.//// g: the Group to be implemented.// reqPhysProp: the required physical property.// costLimit: the maximum cost of all the Implementations.func (opt *Optimizer) implGroup(g *memo.Group, reqPhysProp *property.PhysicalProperty, costLimit float64) (memo.Implementation, error) { groupImpl := g.GetImpl(reqPhysProp) if groupImpl != nil { if groupImpl.GetCost() <= costLimit { return groupImpl, nil } return nil, nil } // Handle implementation rules for each equivalent GroupExpr. var childImpls []memo.Implementation err := opt.fillGroupStats(g) if err != nil { return nil, err } outCount := math.Min(g.Prop.Stats.RowCount, reqPhysProp.ExpectedCnt) for elem := g.Equivalents.Front(); elem != nil; elem = elem.Next() { curExpr := elem.Value.(*memo.GroupExpr) impls, err := opt.implGroupExpr(curExpr, reqPhysProp) if err != nil { return nil, err } for _, impl := range impls { childImpls = childImpls[:0] for i, childGroup := range curExpr.Children { childImpl, err := opt.implGroup(childGroup, impl.GetPlan().GetChildReqProps(i), impl.GetCostLimit(costLimit, childImpls...)) if err != nil { return nil, err } if childImpl == nil { impl.SetCost(math.MaxFloat64) break } childImpls = append(childImpls, childImpl) } if impl.GetCost() == math.MaxFloat64 { continue } implCost := impl.CalcCost(outCount, childImpls...) if implCost > costLimit { continue } if groupImpl == nil || groupImpl.GetCost() > implCost { groupImpl = impl.AttachChildren(childImpls...) costLimit = implCost } } } // Handle enforcer rules for required physical property. for _, rule := range GetEnforcerRules(g, reqPhysProp) { newReqPhysProp := rule.NewProperty(reqPhysProp) enforceCost := rule.GetEnforceCost(g) childImpl, err := opt.implGroup(g, newReqPhysProp, costLimit-enforceCost) if err != nil { return nil, err } if childImpl == nil { continue } impl := rule.OnEnforce(reqPhysProp, childImpl) implCost := enforceCost + childImpl.GetCost() impl.SetCost(implCost) if groupImpl == nil || groupImpl.GetCost() > implCost { groupImpl = impl costLimit = implCost } } if groupImpl == nil || groupImpl.GetCost() == math.MaxFloat64 { return nil, nil } g.InsertImpl(reqPhysProp, groupImpl) return groupImpl, nil}
这里个函数会找出潜在符合条件的物理执行打算,并一直的搜寻。但它有一个剪枝逻辑——会记录以后最小的cost,如果一个执行打算自上向下搜寻时,如果超过了这个cost,则间接返回。这就是在第3节提到的自顶向下的优化。
接下来咱们看一下memo.Implementation
的定义:
package memoimport ( plannercore "github.com/pingcap/tidb/planner/core")// Implementation defines the interface for cost of physical plan.type Implementation interface { CalcCost(outCount float64, children ...Implementation) float64 SetCost(cost float64) GetCost() float64 GetPlan() plannercore.PhysicalPlan // AttachChildren is used to attach children implementations and returns it self. AttachChildren(children ...Implementation) Implementation // GetCostLimit gets the costLimit for implementing the next childGroup. GetCostLimit(costLimit float64, children ...Implementation) float64}
其中CalcCost
办法就是用来计算物理执行打算的代价。一共有这么多构造体实现了它:
5.3.1 代价的评估
咱们以结尾的例子,解说代价的评估。
select代价计算形式
// CalcCost calculates the cost of the table scan Implementation.func (impl *TableScanImpl) CalcCost(outCount float64, _ ...memo.Implementation) float64 { ts := impl.plan.(*plannercore.PhysicalTableScan) width := impl.tblColHists.GetTableAvgRowSize(impl.plan.SCtx(), impl.tblCols, kv.TiKV, true) sessVars := ts.SCtx().GetSessionVars() impl.cost = outCount * sessVars.GetScanFactor(ts.Table) * width if ts.Desc { impl.cost = outCount * sessVars.GetDescScanFactor(ts.Table) * width } return impl.cost}// GetScanFactor returns the session variable scanFactor// returns 0 when tbl is a temporary table.func (s *SessionVars) GetScanFactor(tbl *model.TableInfo) float64 { if tbl != nil { if tbl.TempTableType != model.TempTableNone { return 0 } } return s.scanFactor }// CalcCost implements Implementation interface.func (impl *IndexScanImpl) CalcCost(outCount float64, _ ...memo.Implementation) float64 { is := impl.plan.(*plannercore.PhysicalIndexScan) sessVars := is.SCtx().GetSessionVars() rowSize := impl.tblColHists.GetIndexAvgRowSize(is.SCtx(), is.Schema().Columns, is.Index.Unique) cost := outCount * rowSize * sessVars.GetScanFactor(is.Table) if is.Desc { cost = outCount * rowSize * sessVars.GetDescScanFactor(is.Table) } cost += float64(len(is.Ranges)) * sessVars.GetSeekFactor(is.Table) impl.cost = cost return impl.cost}
这里咱们以全表扫描表和命中索引的代码为例子。其中默认的scanFactor是1.5。这里能够看到indexScan和tableScan少了一个因数:width。这个变量代表了所需列的均匀大小。这么看基本上是indexScan最优了。
这里的代码笔者认为有点不优雅——当Desc时,其实之前的Cost是没必要算一遍的,节约CPU资源。
join代价计算形式
// CalcCost implements Implementation CalcCost interface.func (impl *HashJoinImpl) CalcCost(_ float64, children ...memo.Implementation) float64 { hashJoin := impl.plan.(*plannercore.PhysicalHashJoin) // The children here are only used to calculate the cost. hashJoin.SetChildren(children[0].GetPlan(), children[1].GetPlan()) selfCost := hashJoin.GetCost(children[0].GetPlan().StatsCount(), children[1].GetPlan().StatsCount(), false, 0, nil) impl.cost = selfCost + children[0].GetCost() + children[1].GetCost() return impl.cost}// CalcCost implements Implementation CalcCost interface.func (impl *MergeJoinImpl) CalcCost(_ float64, children ...memo.Implementation) float64 { mergeJoin := impl.plan.(*plannercore.PhysicalMergeJoin) // The children here are only used to calculate the cost. mergeJoin.SetChildren(children[0].GetPlan(), children[1].GetPlan()) selfCost := mergeJoin.GetCost(children[0].GetPlan().StatsCount(), children[1].GetPlan().StatsCount(), 0) impl.cost = selfCost + children[0].GetCost() + children[1].GetCost() return impl.cost}
具体的计算都在plan_cost_v1.go里:
// GetCost computes cost of hash join operator itself.func (p *PhysicalHashJoin) GetCost(lCnt, rCnt float64, isMPP bool, costFlag uint64, op *physicalOptimizeOp) float64 { buildCnt, probeCnt := lCnt, rCnt build := p.children[0] // Taking the right as the inner for right join or using the outer to build a hash table. if (p.InnerChildIdx == 1 && !p.UseOuterToBuild) || (p.InnerChildIdx == 0 && p.UseOuterToBuild) { buildCnt, probeCnt = rCnt, lCnt build = p.children[1] } sessVars := p.ctx.GetSessionVars() oomUseTmpStorage := variable.EnableTmpStorageOnOOM.Load() memQuota := sessVars.MemTracker.GetBytesLimit() // sessVars.MemQuotaQuery && hint rowSize := getAvgRowSize(build.statsInfo(), build.Schema().Columns) spill := oomUseTmpStorage && memQuota > 0 && rowSize*buildCnt > float64(memQuota) && p.storeTp != kv.TiFlash // Cost of building hash table. cpuFactor := sessVars.GetCPUFactor() diskFactor := sessVars.GetDiskFactor() memoryFactor := sessVars.GetMemoryFactor() concurrencyFactor := sessVars.GetConcurrencyFactor() cpuCost := buildCnt * cpuFactor memoryCost := buildCnt * memoryFactor diskCost := buildCnt * diskFactor * rowSize // Number of matched row pairs regarding the equal join conditions. helper := &fullJoinRowCountHelper{ sctx: p.SCtx(), cartesian: false, leftProfile: p.children[0].statsInfo(), rightProfile: p.children[1].statsInfo(), leftJoinKeys: p.LeftJoinKeys, rightJoinKeys: p.RightJoinKeys, leftSchema: p.children[0].Schema(), rightSchema: p.children[1].Schema(), leftNAJoinKeys: p.LeftNAJoinKeys, rightNAJoinKeys: p.RightNAJoinKeys, } numPairs := helper.estimate() // For semi-join class, if `OtherConditions` is empty, we already know // the join results after querying hash table, otherwise, we have to // evaluate those resulted row pairs after querying hash table; if we // find one pair satisfying the `OtherConditions`, we then know the // join result for this given outer row, otherwise we have to iterate // to the end of those pairs; since we have no idea about when we can // terminate the iteration, we assume that we need to iterate half of // those pairs in average. if p.JoinType == SemiJoin || p.JoinType == AntiSemiJoin || p.JoinType == LeftOuterSemiJoin || p.JoinType == AntiLeftOuterSemiJoin { if len(p.OtherConditions) > 0 { numPairs *= 0.5 } else { numPairs = 0 } } if hasCostFlag(costFlag, CostFlagUseTrueCardinality) { numPairs = getOperatorActRows(p) } // Cost of querying hash table is cheap actually, so we just compute the cost of // evaluating `OtherConditions` and joining row pairs. probeCost := numPairs * cpuFactor probeDiskCost := numPairs * diskFactor * rowSize // Cost of evaluating outer filter. if len(p.LeftConditions)+len(p.RightConditions) > 0 { // Input outer count for the above compution should be adjusted by SelectionFactor. probeCost *= SelectionFactor probeDiskCost *= SelectionFactor probeCost += probeCnt * cpuFactor } diskCost += probeDiskCost probeCost /= float64(p.Concurrency) // Cost of additional concurrent goroutines. cpuCost += probeCost + float64(p.Concurrency+1)*concurrencyFactor // Cost of traveling the hash table to resolve missing matched cases when building the hash table from the outer table if p.UseOuterToBuild { if spill { // It runs in sequence when build data is on disk. See handleUnmatchedRowsFromHashTableInDisk cpuCost += buildCnt * cpuFactor } else { cpuCost += buildCnt * cpuFactor / float64(p.Concurrency) } diskCost += buildCnt * diskFactor * rowSize } if spill { memoryCost *= float64(memQuota) / (rowSize * buildCnt) } else { diskCost = 0 } if op != nil { setPhysicalHashJoinCostDetail(p, op, spill, buildCnt, probeCnt, cpuFactor, rowSize, numPairs, cpuCost, probeCost, memoryCost, diskCost, probeDiskCost, diskFactor, memoryFactor, concurrencyFactor, memQuota) } return cpuCost + memoryCost + diskCost}// GetCost computes cost of merge join operator itself.func (p *PhysicalMergeJoin) GetCost(lCnt, rCnt float64, costFlag uint64) float64 { outerCnt := lCnt innerCnt := rCnt innerKeys := p.RightJoinKeys innerSchema := p.children[1].Schema() innerStats := p.children[1].statsInfo() if p.JoinType == RightOuterJoin { outerCnt = rCnt innerCnt = lCnt innerKeys = p.LeftJoinKeys innerSchema = p.children[0].Schema() innerStats = p.children[0].statsInfo() } helper := &fullJoinRowCountHelper{ sctx: p.SCtx(), cartesian: false, leftProfile: p.children[0].statsInfo(), rightProfile: p.children[1].statsInfo(), leftJoinKeys: p.LeftJoinKeys, rightJoinKeys: p.RightJoinKeys, leftSchema: p.children[0].Schema(), rightSchema: p.children[1].Schema(), } numPairs := helper.estimate() if p.JoinType == SemiJoin || p.JoinType == AntiSemiJoin || p.JoinType == LeftOuterSemiJoin || p.JoinType == AntiLeftOuterSemiJoin { if len(p.OtherConditions) > 0 { numPairs *= 0.5 } else { numPairs = 0 } } if hasCostFlag(costFlag, CostFlagUseTrueCardinality) { numPairs = getOperatorActRows(p) } sessVars := p.ctx.GetSessionVars() probeCost := numPairs * sessVars.GetCPUFactor() // Cost of evaluating outer filters. var cpuCost float64 if len(p.LeftConditions)+len(p.RightConditions) > 0 { probeCost *= SelectionFactor cpuCost += outerCnt * sessVars.GetCPUFactor() } cpuCost += probeCost // For merge join, only one group of rows with same join key(not null) are cached, // we compute average memory cost using estimated group size. NDV, _ := getColsNDVWithMatchedLen(innerKeys, innerSchema, innerStats) memoryCost := (innerCnt / NDV) * sessVars.GetMemoryFactor() return cpuCost + memoryCost}
HashJoin要思考到内存不够的状况,因而在计算到数据不够时,会将对应的数据压入硬盘。但它对数据的程序并无要求,因而能够并发的去做。见:
// Cost of traveling the hash table to resolve missing matched cases when building the hash table from the outer table if p.UseOuterToBuild { if spill { // It runs in sequence when build data is on disk. See handleUnmatchedRowsFromHashTableInDisk cpuCost += buildCnt * cpuFactor } else { cpuCost += buildCnt * cpuFactor / float64(p.Concurrency) } diskCost += buildCnt * diskFactor * rowSize }
而MergeJoin的代价显然会更小,但可能选则到这个打算也有较高的要求:当两个关联表要 Join 的字段须要按排好的程序读取时,实用 Merge Join 算法。
5.4 老版本 物理执行打算的抉择
5.4.1 代价的评估
这块代码次要是在plan_cost_ver1.go
和plan_cost_ver2.go
。v2对代价公式进行了更准确的回归校准,调整了局部代价公式,比此前版本的代价公式更加精确。代码上也更为简洁:v2只暴露出了一个公共办法,外部通过不同的类型做转发。
// GetPlanCost returns the cost of this plan.func GetPlanCost(p PhysicalPlan, taskType property.TaskType, option *PlanCostOption) (float64, error) { return getPlanCost(p, taskType, option)}func getPlanCost(p PhysicalPlan, taskType property.TaskType, option *PlanCostOption) (float64, error) { if p.SCtx().GetSessionVars().CostModelVersion == modelVer2 { planCost, err := p.getPlanCostVer2(taskType, option) return planCost.cost, err } return p.getPlanCostVer1(taskType, option)}
依据不同的PhysicalPlan
类型,会找到不同绑定办法:
v1的局部办法展现:
select代价计算形式
// getPlanCostVer2 returns the plan-cost of this sub-plan, which is:// plan-cost = child-cost + filter-costfunc (p *PhysicalSelection) getPlanCostVer2(taskType property.TaskType, option *PlanCostOption) (costVer2, error) { if p.planCostInit && !hasCostFlag(option.CostFlag, CostFlagRecalculate) { return p.planCostVer2, nil } inputRows := getCardinality(p.children[0], option.CostFlag) cpuFactor := getTaskCPUFactorVer2(p, taskType) filterCost := filterCostVer2(option, inputRows, p.Conditions, cpuFactor) childCost, err := p.children[0].getPlanCostVer2(taskType, option) if err != nil { return zeroCostVer2, err } p.planCostVer2 = sumCostVer2(filterCost, childCost) p.planCostInit = true return p.planCostVer2, nil}
这部分代码简略易读。代价就是子查问的代价+筛选的代价。
那么问题来了,中索引的和不中索引的代价应该是不一样的。这里没有体现进去啊。认真看childCost, err := p.children[0].getPlanCostVer2(taskType, option)
,这里是会去获取子物理执行打算的代价。
// getPlanCostVer2 returns the plan-cost of this sub-plan, which is:func (p *PointGetPlan) getPlanCostVer2(taskType property.TaskType, option *PlanCostOption) (costVer2, error) { if p.planCostInit && !hasCostFlag(option.CostFlag, CostFlagRecalculate) { return p.planCostVer2, nil } if p.accessCols == nil { // from fast plan code path p.planCostVer2 = zeroCostVer2 p.planCostInit = true return zeroCostVer2, nil } rowSize := getAvgRowSize(p.stats, p.schema.Columns) netFactor := getTaskNetFactorVer2(p, taskType) p.planCostVer2 = netCostVer2(option, 1, rowSize, netFactor) p.planCostInit = true return p.planCostVer2, nil}func netCostVer2(option *PlanCostOption, rows, rowSize float64, netFactor costVer2Factor) costVer2 { return newCostVer2(option, netFactor, rows*rowSize*netFactor.Value, func() string { return fmt.Sprintf("net(%v*rowsize(%v)*%v)", rows, rowSize, netFactor) })}// getPlanCostVer2 returns the plan-cost of this sub-plan, which is:// plan-cost = rows * log2(row-size) * scan-factor// log2(row-size) is from experiments.func (p *PhysicalTableScan) getPlanCostVer2(taskType property.TaskType, option *PlanCostOption) (costVer2, error) { if p.planCostInit && !hasCostFlag(option.CostFlag, CostFlagRecalculate) { return p.planCostVer2, nil } rows := getCardinality(p, option.CostFlag) var rowSize float64 if p.StoreType == kv.TiKV { rowSize = getAvgRowSize(p.stats, p.tblCols) // consider all columns if TiKV } else { // TiFlash rowSize = getAvgRowSize(p.stats, p.schema.Columns) } rowSize = math.Max(rowSize, 2.0) scanFactor := getTaskScanFactorVer2(p, p.StoreType, taskType) p.planCostVer2 = scanCostVer2(option, rows, rowSize, scanFactor) // give TiFlash a start-up cost to let the optimizer prefers to use TiKV to process small table scans. if p.StoreType == kv.TiFlash { p.planCostVer2 = sumCostVer2(p.planCostVer2, scanCostVer2(option, 10000, rowSize, scanFactor)) } p.planCostInit = true return p.planCostVer2, nil}func scanCostVer2(option *PlanCostOption, rows, rowSize float64, scanFactor costVer2Factor) costVer2 { if rowSize < 1 { rowSize = 1 } return newCostVer2(option, scanFactor, // rows * log(row-size) * scanFactor, log2 from experiments rows*math.Log2(rowSize)*scanFactor.Value, func() string { return fmt.Sprintf("scan(%v*logrowsize(%v)*%v)", rows, rowSize, scanFactor) })}
scanFactor的代价默认是40.7,netFactor的代价默认是3.96。联合代码来看,命中索引的代价更优。
join代价计算形式
// getPlanCostVer2 returns the plan-cost of this sub-plan, which is:// plan-cost = build-child-cost + build-filter-cost +// (probe-cost + probe-filter-cost) / concurrency// probe-cost = probe-child-cost * build-rows / batchRatiofunc (p *PhysicalIndexJoin) getPlanCostVer2(taskType property.TaskType, option *PlanCostOption) (costVer2, error) { return p.getIndexJoinCostVer2(taskType, option, 0)}func (p *PhysicalIndexHashJoin) getPlanCostVer2(taskType property.TaskType, option *PlanCostOption) (costVer2, error) { return p.getIndexJoinCostVer2(taskType, option, 1)}func (p *PhysicalIndexMergeJoin) getPlanCostVer2(taskType property.TaskType, option *PlanCostOption) (costVer2, error) { return p.getIndexJoinCostVer2(taskType, option, 2)}func (p *PhysicalIndexJoin) getIndexJoinCostVer2(taskType property.TaskType, option *PlanCostOption, indexJoinType int) (costVer2, error) { if p.planCostInit && !hasCostFlag(option.CostFlag, CostFlagRecalculate) { return p.planCostVer2, nil } build, probe := p.children[1-p.InnerChildIdx], p.children[p.InnerChildIdx] buildRows := getCardinality(build, option.CostFlag) buildRowSize := getAvgRowSize(build.Stats(), build.Schema().Columns) probeRowsOne := getCardinality(probe, option.CostFlag) probeRowsTot := probeRowsOne * buildRows probeRowSize := getAvgRowSize(probe.Stats(), probe.Schema().Columns) buildFilters, probeFilters := p.LeftConditions, p.RightConditions probeConcurrency := float64(p.ctx.GetSessionVars().IndexLookupJoinConcurrency()) cpuFactor := getTaskCPUFactorVer2(p, taskType) memFactor := getTaskMemFactorVer2(p, taskType) requestFactor := getTaskRequestFactorVer2(p, taskType) buildFilterCost := filterCostVer2(option, buildRows, buildFilters, cpuFactor) buildChildCost, err := build.getPlanCostVer2(taskType, option) if err != nil { return zeroCostVer2, err } buildTaskCost := newCostVer2(option, cpuFactor, buildRows*10*cpuFactor.Value, func() string { return fmt.Sprintf("cpu(%v*10*%v)", buildRows, cpuFactor) }) startCost := newCostVer2(option, cpuFactor, 10*3*cpuFactor.Value, func() string { return fmt.Sprintf("cpu(10*3*%v)", cpuFactor) }) probeFilterCost := filterCostVer2(option, probeRowsTot, probeFilters, cpuFactor) probeChildCost, err := probe.getPlanCostVer2(taskType, option) if err != nil { return zeroCostVer2, err } var hashTableCost costVer2 switch indexJoinType { case 1: // IndexHashJoin hashTableCost = hashBuildCostVer2(option, buildRows, buildRowSize, float64(len(p.RightJoinKeys)), cpuFactor, memFactor) case 2: // IndexMergeJoin hashTableCost = newZeroCostVer2(traceCost(option)) default: // IndexJoin hashTableCost = hashBuildCostVer2(option, probeRowsTot, probeRowSize, float64(len(p.LeftJoinKeys)), cpuFactor, memFactor) } // IndexJoin executes a batch of rows at a time, so the actual cost of this part should be // `innerCostPerBatch * numberOfBatches` instead of `innerCostPerRow * numberOfOuterRow`. // Use an empirical value batchRatio to handle this now. // TODO: remove this empirical value. batchRatio := 6.0 probeCost := divCostVer2(mulCostVer2(probeChildCost, buildRows), batchRatio) // Double Read Cost doubleReadCost := newZeroCostVer2(traceCost(option)) if p.ctx.GetSessionVars().IndexJoinDoubleReadPenaltyCostRate > 0 { batchSize := float64(p.ctx.GetSessionVars().IndexJoinBatchSize) taskPerBatch := 1024.0 // TODO: remove this magic number doubleReadTasks := buildRows / batchSize * taskPerBatch doubleReadCost = doubleReadCostVer2(option, doubleReadTasks, requestFactor) doubleReadCost = mulCostVer2(doubleReadCost, p.ctx.GetSessionVars().IndexJoinDoubleReadPenaltyCostRate) } p.planCostVer2 = sumCostVer2(startCost, buildChildCost, buildFilterCost, buildTaskCost, divCostVer2(sumCostVer2(doubleReadCost, probeCost, probeFilterCost, hashTableCost), probeConcurrency)) p.planCostInit = true return p.planCostVer2, nil}
关键在于:
switch indexJoinType { case 1: // IndexHashJoin hashTableCost = hashBuildCostVer2(option, buildRows, buildRowSize, float64(len(p.RightJoinKeys)), cpuFactor, memFactor) case 2: // IndexMergeJoin hashTableCost = newZeroCostVer2(traceCost(option)) default: // IndexJoin hashTableCost = hashBuildCostVer2(option, probeRowsTot, probeRowSize, float64(len(p.LeftJoinKeys)), cpuFactor, memFactor) }
对应办法:
func hashBuildCostVer2(option *PlanCostOption, buildRows, buildRowSize, nKeys float64, cpuFactor, memFactor costVer2Factor) costVer2 { // TODO: 1) consider types of keys, 2) dedicated factor for build-probe hash table hashKeyCost := newCostVer2(option, cpuFactor, buildRows*nKeys*cpuFactor.Value, func() string { return fmt.Sprintf("hashkey(%v*%v*%v)", buildRows, nKeys, cpuFactor) }) hashMemCost := newCostVer2(option, memFactor, buildRows*buildRowSize*memFactor.Value, func() string { return fmt.Sprintf("hashmem(%v*%v*%v)", buildRows, buildRowSize, memFactor) }) hashBuildCost := newCostVer2(option, cpuFactor, buildRows*cpuFactor.Value, func() string { return fmt.Sprintf("hashbuild(%v*%v)", buildRows, cpuFactor) }) return sumCostVer2(hashKeyCost, hashMemCost, hashBuildCost)}func newZeroCostVer2(trace bool) (ret costVer2) { if trace { ret.trace = &costTrace{make(map[string]float64), ""} } return}
简略的看一下代码,咱们能够发现,从大多数的场景来看,依照代价从小到大来排,这几种Join是MergeJoin<HashJoin<IndexJoin。
5.4.2执行打算枚举与择优
总得来说这块代码较为简单,实质就是枚举所有符合条件的物理执行打算,并挑选出代价最小的执行打算,故不再列举代码。有趣味的同学能够依据以下纲要自行翻阅:
|--planner/core/find_best_task.go\-- findBestTask\-- enumeratePhysicalPlans4Task\-- compareTaskCost\-- getTaskPlanCost|-- planner/core/plan_cost_ver2.go\-- getPlanCost
6.其余
6.1 参考与援用的文章
- Cascades Optimizer:https://zhuanlan.zhihu.com/p/73545345
- 揭秘 TiDB 新优化器:Cascades Planner 原理解析:https://cn.pingcap.com/blog/tidb-cascades-planner/
- TiDB文档-统计信息简介:https://docs.pingcap.com/zh/tidb/v6.5/statistics#%E7%BB%9F%E8...
- TiDB文档-谬误索引的解决方案:https://docs.pingcap.com/zh/tidb/v6.5/wrong-index-solution#%E...
- The Volcano Optimizer Generator: Extensibility and Efficient Search:https://15721.courses.cs.cmu.edu/spring2019/papers/22-optimiz...
- The Cascades Framework for Query Optimization:https://15721.courses.cs.cmu.edu/spring2018/papers/15-optimiz...
6.2 常识补充:code generation && vectorized execution
数据库引擎执行器中十分闻名的两种优化形式,code generation和 vectorized execution。
code generation次要是依据上下文来生成一整段优化过的代码,这与那种嵌套大量if...else、for循环、虚办法的代码齐全相同,齐全面向性能思考。
vectorized execution基于拉模型。相比于一次拉一个tuple来说,它的批量拉取缩小了屡次拉取的开销,同时还能够应用到SIMD。基于这种场景,vectorized execution的优化更加实用于列式数据库。