关于tidb:读TiDB源码聊设计浅析HTAP的SQL优化器

版本	日期	备注
1.0	2024.2.18	文章首发

本文的的源码剖析全副基于TiDB6.5来做剖析。

1.引子

如果让你做一个分布式数据库的优化器，面对以下的SQL，你会想到什么好的办法去执行他们呢？

SELECT id, name FROM person WHERE age >= 18 or height > 180 limit 100;：从条件上看，咱们看到条件其实是二选一的： age >= 18 or height > 180。基于这种状况，咱们必定会去抉择有索引的数据，如果都有索引or都没有，那么必定抉择扫描行数起码的数据。如果有一些算子在外面的话，则额定须要思考数据的就近准则——有些算子在局部架构下能够充分利用MPP的能力，而有些则不行。
SELECT orders.order_id, customers.customer_name, orders.order_date FROM orders LEFT JOIN customers ON orders.customer_id=customers.customer_id;分布式数据库中的join，最优的形式就是小表播送到大表数据所在的中央。那么首先咱们得晓得谁是小表，谁是大表。

2.统计信息收集

依据后面的两个例子，咱们能够发现——如果咱们须要找到SQL对应的最佳打算，咱们会须要一些表的元数据，或者说是统计信息。从惯例的角度来说，以下统计信息是必须收集的：

表的总行数
每列数据的均匀大小
每列数据的基数：即NDV（distinct value）
列的NULL值个数

如果是事务型的（行式存储），那么还要思考索引均匀长度、值的散布等等。

如果是剖析型的（列式存储），那么还须要思考相干列的最大值、最小值等等。

而统计形式的收集，会有多种形式。次要是思考资源和准确性之间的Trade off。常见的有：

TopN：相干数据呈现次数前 n 的值。
直方图：用于形容数据分布图的工具。依照数据的值大小进行分桶，并用一些简略的数据来形容每个桶，比方落在桶里的值的个数。
2D直方图：因为直方图无奈反馈列之间的关联，能够用2D直方图（联结散布）做到，但空间占用也比拟多。
Count-Min Sketch：相似哈希表加上计算器的实现。能够用很少的数据来形容整体数据的个性。
HyperLogLog：一种估算海量数据基数的办法。很多状况下，统计并不需要那么准确。工程方面要在应用资源和准确性里找均衡。因而有人提出用HLL，这是一种占用资源少，但能给出较为精确的近似后果的算法。

TiDB收集的统计信息见：https://docs.pingcap.com/zh/tidb/v6.5/statistics#%E7%9B%B4%E6...

3.代价的评估

一个SQL真正的物理执行打算可能是有多个的。在以统计信息为根底之上，咱们还须要退出相应的权重，举个例子：

如果可能在内存中计算实现，就不必去重复发动本地IO
如果能在本地IO中实现，就不要去发动网络申请

等等...

这在TiDB的代码中也有对应的默认值。

DefOptCPUFactor                                = 3.0DefOptCopCPUFactor                             = 3.0DefOptTiFlashConcurrencyFactor                 = 24.0DefOptNetworkFactor                            = 1.0DefOptScanFactor                               = 1.5DefOptDescScanFactor                           = 3.0DefOptSeekFactor                               = 20.0DefOptMemoryFactor                             = 0.001DefOptDiskFactor                               = 1.5DefOptConcurrencyFactor                        = 3.0

var defaultVer2Factors = costVer2Factors{    TiDBTemp:      costVer2Factor{"tidb_temp_table_factor", 0.00},    TiKVScan:      costVer2Factor{"tikv_scan_factor", 40.70},    TiKVDescScan:  costVer2Factor{"tikv_desc_scan_factor", 61.05},    TiFlashScan:   costVer2Factor{"tiflash_scan_factor", 11.60},    TiDBCPU:       costVer2Factor{"tidb_cpu_factor", 49.90},    TiKVCPU:       costVer2Factor{"tikv_cpu_factor", 49.90},    TiFlashCPU:    costVer2Factor{"tiflash_cpu_factor", 2.40},    TiDB2KVNet:    costVer2Factor{"tidb_kv_net_factor", 3.96},    TiDB2FlashNet: costVer2Factor{"tidb_flash_net_factor", 2.20},    TiFlashMPPNet: costVer2Factor{"tiflash_mpp_net_factor", 1.00},    TiDBMem:       costVer2Factor{"tidb_mem_factor", 0.20},    TiKVMem:       costVer2Factor{"tikv_mem_factor", 0.20},    TiFlashMem:    costVer2Factor{"tiflash_mem_factor", 0.05},    TiDBDisk:      costVer2Factor{"tidb_disk_factor", 200.00},    TiDBRequest:   costVer2Factor{"tidb_request_factor", 6000000.00},}

4.执行打算枚举与择优

当咱们能够评估出物理执行打算的代价时，将会枚举所有能够用上物理执行打算，并在其中抉择代价最小的物理执行打算。个别枚举分为两个流派：

自底向上：代表有System R。
自顶向下：代表有Cascade。

自底向上没法解决一个问题。当下层对上层的输入后果程序感兴趣时，那就不能只能从底层的视角来寻找部分最优。

举个例子，多表Join。一开始两个表Join可能HashJoin代价很低，然而Join下一个表时，用MergeJoin从整体来看代价更低。从这个case来看，自底向上来做打算取优并不适合。

所以有了Cascade：

搜寻计划是自顶向下的。这意味着它能够防止部分最优而导致全局不优。从Operator Tree 自顶向下遍历时，能够做一系列变换：

- Implementation：逻辑算子能够转换成物理算子；例如Join转换成NestLoop或者HashJoin等
- Exploration：逻辑算子能够做等价变换；例如替换Inner Join的两个子节点，即可枚举Join程序

图片来自于：Cascades Optimizer——https://zhuanlan.zhihu.com/p/73545345

它是基于Volcano模型演进而来的。
用面向对象的形式进行建模，编写规定时不须要关怀搜寻过程。相比传统优化器中面向过程去一条条的编写，确实是很大的改良。

5.TiDB的实现

大抵的代码调用链为：

-- session/session.go\-- ExecuteStmt //SQL执行的入口|-- executor/compiler.go\-- Compile //将SQL变成可执行的打算|--planner/planner/optmize.go\-- Optimize //优化的入口\-- optimize //这里有两个入口。一种是新的优化器入口，一种是老的优化器入口。依据配置来抉择。最终都会返回最优的物理执行打算。    |-- planner/cascades/optmize.go        \--FindBestPlan 见5.1            \-- onPhasePreprocessing //见5.3        \-- implGroup        |--planner/core/optmizer.go //见5.4            \-- DoOptimize            \-- physicalOptimize            |--planner/core/find_best_task.go            \-- findBestTask            \-- enumeratePhysicalPlans4Task            \-- compareTaskCost                \-- getTaskPlanCost                |-- planner/core/plan_cost_ver2.go            \-- getPlanCost

5.1 逻辑优化

外围入口为：

// FindBestPlan is the optimization entrance of the cascades planner. The// optimization is composed of 3 phases: preprocessing, exploration and implementation.//// ------------------------------------------------------------------------------// Phase 1: Preprocessing// ------------------------------------------------------------------------------//// The target of this phase is to preprocess the plan tree by some heuristic// rules which should always be beneficial, for example Column Pruning.//// ------------------------------------------------------------------------------// Phase 2: Exploration// ------------------------------------------------------------------------------//// The target of this phase is to explore all the logically equivalent// expressions by exploring all the equivalent group expressions of each group.//// At the very beginning, there is only one group expression in a Group. After// applying some transformation rules on certain expressions of the Group, all// the equivalent expressions are found and stored in the Group. This procedure// can be regarded as searching for a weak connected component in a directed// graph, where nodes are expressions and directed edges are the transformation// rules.//// ------------------------------------------------------------------------------// Phase 3: Implementation// ------------------------------------------------------------------------------//// The target of this phase is to search the best physical plan for a Group// which satisfies a certain required physical property.//// In this phase, we need to enumerate all the applicable implementation rules// for each expression in each group under the required physical property. A// memo structure is used for a group to reduce the repeated search on the same// required physical property.func (opt *Optimizer) FindBestPlan(sctx sessionctx.Context, logical plannercore.LogicalPlan) (p plannercore.PhysicalPlan, cost float64, err error) {    logical, err = opt.onPhasePreprocessing(sctx, logical)    if err != nil {        return nil, 0, err    }    rootGroup := memo.Convert2Group(logical)    err = opt.onPhaseExploration(sctx, rootGroup)    if err != nil {        return nil, 0, err    }    p, cost, err = opt.onPhaseImplementation(sctx, rootGroup)    if err != nil {        return nil, 0, err    }    err = p.ResolveIndices()    return p, cost, err}

正文+代码很洁净，这里一共做三件事

onPhasePreprocessing：正文很切实，说for example Column Pruning，后果外面真的只做了列裁剪。
onPhaseExploration：摸索所有逻辑等价存在的可能
onPhaseImplementation：依据代价寻找最优的物理执行打算

这块官网的博客写的十分好，我就不反复了：https://cn.pingcap.com/blog/tidb-cascades-planner/

5.2 统计信息收集

这块代码次要集中在stats.go里，外面的外围数据结构是stats_info.go里的StatsInfo。调用链大抵为：

|-- planner/cascades/optimizer.go\--fillGroupStats|-- planner/core/stats.go\--DeriveStats

type LogicalPlan interface {    Plan    //......疏忽一些代码    // DeriveStats derives statistic info for current plan node given child stats.    // We need selfSchema, childSchema here because it makes this method can be used in    // cascades planner, where LogicalPlan might not record its children or schema.    DeriveStats(childStats []*property.StatsInfo, selfSchema *expression.Schema, childSchema []*expression.Schema, colGroups [][]*expression.Column) (*property.StatsInfo, error)    //......疏忽一些代码}

有很多构造体实现了这个办法：

收集统计信息次要是analyze.go#Next办法中调用的#analyzeWorker。

5.3 新版本物理执行打算的抉择

代码入口为：

// implGroup finds the best Implementation which satisfies the required// physical property for a Group. The best Implementation should have the// lowest cost among all the applicable Implementations.//// g:            the Group to be implemented.// reqPhysProp: the required physical property.// costLimit:   the maximum cost of all the Implementations.func (opt *Optimizer) implGroup(g *memo.Group, reqPhysProp *property.PhysicalProperty, costLimit float64) (memo.Implementation, error) {    groupImpl := g.GetImpl(reqPhysProp)    if groupImpl != nil {        if groupImpl.GetCost() <= costLimit {            return groupImpl, nil        }        return nil, nil    }    // Handle implementation rules for each equivalent GroupExpr.    var childImpls []memo.Implementation    err := opt.fillGroupStats(g)    if err != nil {        return nil, err    }    outCount := math.Min(g.Prop.Stats.RowCount, reqPhysProp.ExpectedCnt)    for elem := g.Equivalents.Front(); elem != nil; elem = elem.Next() {        curExpr := elem.Value.(*memo.GroupExpr)        impls, err := opt.implGroupExpr(curExpr, reqPhysProp)        if err != nil {            return nil, err        }        for _, impl := range impls {            childImpls = childImpls[:0]            for i, childGroup := range curExpr.Children {                childImpl, err := opt.implGroup(childGroup, impl.GetPlan().GetChildReqProps(i), impl.GetCostLimit(costLimit, childImpls...))                if err != nil {                    return nil, err                }                if childImpl == nil {                    impl.SetCost(math.MaxFloat64)                    break                }                childImpls = append(childImpls, childImpl)            }            if impl.GetCost() == math.MaxFloat64 {                continue            }            implCost := impl.CalcCost(outCount, childImpls...)            if implCost > costLimit {                continue            }            if groupImpl == nil || groupImpl.GetCost() > implCost {                groupImpl = impl.AttachChildren(childImpls...)                costLimit = implCost            }        }    }    // Handle enforcer rules for required physical property.    for _, rule := range GetEnforcerRules(g, reqPhysProp) {        newReqPhysProp := rule.NewProperty(reqPhysProp)        enforceCost := rule.GetEnforceCost(g)        childImpl, err := opt.implGroup(g, newReqPhysProp, costLimit-enforceCost)        if err != nil {            return nil, err        }        if childImpl == nil {            continue        }        impl := rule.OnEnforce(reqPhysProp, childImpl)        implCost := enforceCost + childImpl.GetCost()        impl.SetCost(implCost)        if groupImpl == nil || groupImpl.GetCost() > implCost {            groupImpl = impl            costLimit = implCost        }    }    if groupImpl == nil || groupImpl.GetCost() == math.MaxFloat64 {        return nil, nil    }    g.InsertImpl(reqPhysProp, groupImpl)    return groupImpl, nil}

这里个函数会找出潜在符合条件的物理执行打算，并一直的搜寻。但它有一个剪枝逻辑——会记录以后最小的cost，如果一个执行打算自上向下搜寻时，如果超过了这个cost，则间接返回。这就是在第3节提到的自顶向下的优化。

接下来咱们看一下memo.Implementation的定义：

package memoimport (    plannercore "github.com/pingcap/tidb/planner/core")// Implementation defines the interface for cost of physical plan.type Implementation interface {    CalcCost(outCount float64, children ...Implementation) float64    SetCost(cost float64)    GetCost() float64    GetPlan() plannercore.PhysicalPlan    // AttachChildren is used to attach children implementations and returns it self.    AttachChildren(children ...Implementation) Implementation    // GetCostLimit gets the costLimit for implementing the next childGroup.    GetCostLimit(costLimit float64, children ...Implementation) float64}

其中CalcCost办法就是用来计算物理执行打算的代价。一共有这么多构造体实现了它：

5.3.1 代价的评估

咱们以结尾的例子，解说代价的评估。

select代价计算形式

// CalcCost calculates the cost of the table scan Implementation.func (impl *TableScanImpl) CalcCost(outCount float64, _ ...memo.Implementation) float64 {    ts := impl.plan.(*plannercore.PhysicalTableScan)    width := impl.tblColHists.GetTableAvgRowSize(impl.plan.SCtx(), impl.tblCols, kv.TiKV, true)    sessVars := ts.SCtx().GetSessionVars()    impl.cost = outCount * sessVars.GetScanFactor(ts.Table) * width    if ts.Desc {        impl.cost = outCount * sessVars.GetDescScanFactor(ts.Table) * width    }    return impl.cost}// GetScanFactor returns the session variable scanFactor// returns 0 when tbl is a temporary table.func (s *SessionVars) GetScanFactor(tbl *model.TableInfo) float64 {    if tbl != nil {        if tbl.TempTableType != model.TempTableNone {            return 0        }    }    return s.scanFactor }// CalcCost implements Implementation interface.func (impl *IndexScanImpl) CalcCost(outCount float64, _ ...memo.Implementation) float64 {    is := impl.plan.(*plannercore.PhysicalIndexScan)    sessVars := is.SCtx().GetSessionVars()    rowSize := impl.tblColHists.GetIndexAvgRowSize(is.SCtx(), is.Schema().Columns, is.Index.Unique)    cost := outCount * rowSize * sessVars.GetScanFactor(is.Table)    if is.Desc {        cost = outCount * rowSize * sessVars.GetDescScanFactor(is.Table)    }    cost += float64(len(is.Ranges)) * sessVars.GetSeekFactor(is.Table)    impl.cost = cost    return impl.cost}

这里咱们以全表扫描表和命中索引的代码为例子。其中默认的scanFactor是1.5。这里能够看到indexScan和tableScan少了一个因数：width。这个变量代表了所需列的均匀大小。这么看基本上是indexScan最优了。

这里的代码笔者认为有点不优雅——当Desc时，其实之前的Cost是没必要算一遍的，节约CPU资源。

join代价计算形式

// CalcCost implements Implementation CalcCost interface.func (impl *HashJoinImpl) CalcCost(_ float64, children ...memo.Implementation) float64 {    hashJoin := impl.plan.(*plannercore.PhysicalHashJoin)    // The children here are only used to calculate the cost.    hashJoin.SetChildren(children[0].GetPlan(), children[1].GetPlan())    selfCost := hashJoin.GetCost(children[0].GetPlan().StatsCount(), children[1].GetPlan().StatsCount(), false, 0, nil)    impl.cost = selfCost + children[0].GetCost() + children[1].GetCost()    return impl.cost}// CalcCost implements Implementation CalcCost interface.func (impl *MergeJoinImpl) CalcCost(_ float64, children ...memo.Implementation) float64 {    mergeJoin := impl.plan.(*plannercore.PhysicalMergeJoin)    // The children here are only used to calculate the cost.    mergeJoin.SetChildren(children[0].GetPlan(), children[1].GetPlan())    selfCost := mergeJoin.GetCost(children[0].GetPlan().StatsCount(), children[1].GetPlan().StatsCount(), 0)    impl.cost = selfCost + children[0].GetCost() + children[1].GetCost()    return impl.cost}

具体的计算都在plan_cost_v1.go里：

// GetCost computes cost of hash join operator itself.func (p *PhysicalHashJoin) GetCost(lCnt, rCnt float64, isMPP bool, costFlag uint64, op *physicalOptimizeOp) float64 {    buildCnt, probeCnt := lCnt, rCnt    build := p.children[0]    // Taking the right as the inner for right join or using the outer to build a hash table.    if (p.InnerChildIdx == 1 && !p.UseOuterToBuild) || (p.InnerChildIdx == 0 && p.UseOuterToBuild) {        buildCnt, probeCnt = rCnt, lCnt        build = p.children[1]    }    sessVars := p.ctx.GetSessionVars()    oomUseTmpStorage := variable.EnableTmpStorageOnOOM.Load()    memQuota := sessVars.MemTracker.GetBytesLimit() // sessVars.MemQuotaQuery && hint    rowSize := getAvgRowSize(build.statsInfo(), build.Schema().Columns)    spill := oomUseTmpStorage && memQuota > 0 && rowSize*buildCnt > float64(memQuota) && p.storeTp != kv.TiFlash    // Cost of building hash table.    cpuFactor := sessVars.GetCPUFactor()    diskFactor := sessVars.GetDiskFactor()    memoryFactor := sessVars.GetMemoryFactor()    concurrencyFactor := sessVars.GetConcurrencyFactor()    cpuCost := buildCnt * cpuFactor    memoryCost := buildCnt * memoryFactor    diskCost := buildCnt * diskFactor * rowSize    // Number of matched row pairs regarding the equal join conditions.    helper := &fullJoinRowCountHelper{        sctx:            p.SCtx(),        cartesian:       false,        leftProfile:     p.children[0].statsInfo(),        rightProfile:    p.children[1].statsInfo(),        leftJoinKeys:    p.LeftJoinKeys,        rightJoinKeys:   p.RightJoinKeys,        leftSchema:      p.children[0].Schema(),        rightSchema:     p.children[1].Schema(),        leftNAJoinKeys:  p.LeftNAJoinKeys,        rightNAJoinKeys: p.RightNAJoinKeys,    }    numPairs := helper.estimate()    // For semi-join class, if `OtherConditions` is empty, we already know    // the join results after querying hash table, otherwise, we have to    // evaluate those resulted row pairs after querying hash table; if we    // find one pair satisfying the `OtherConditions`, we then know the    // join result for this given outer row, otherwise we have to iterate    // to the end of those pairs; since we have no idea about when we can    // terminate the iteration, we assume that we need to iterate half of    // those pairs in average.    if p.JoinType == SemiJoin || p.JoinType == AntiSemiJoin ||        p.JoinType == LeftOuterSemiJoin || p.JoinType == AntiLeftOuterSemiJoin {        if len(p.OtherConditions) > 0 {            numPairs *= 0.5        } else {            numPairs = 0        }    }    if hasCostFlag(costFlag, CostFlagUseTrueCardinality) {        numPairs = getOperatorActRows(p)    }    // Cost of querying hash table is cheap actually, so we just compute the cost of    // evaluating `OtherConditions` and joining row pairs.    probeCost := numPairs * cpuFactor    probeDiskCost := numPairs * diskFactor * rowSize    // Cost of evaluating outer filter.    if len(p.LeftConditions)+len(p.RightConditions) > 0 {        // Input outer count for the above compution should be adjusted by SelectionFactor.        probeCost *= SelectionFactor        probeDiskCost *= SelectionFactor        probeCost += probeCnt * cpuFactor    }    diskCost += probeDiskCost    probeCost /= float64(p.Concurrency)    // Cost of additional concurrent goroutines.    cpuCost += probeCost + float64(p.Concurrency+1)*concurrencyFactor    // Cost of traveling the hash table to resolve missing matched cases when building the hash table from the outer table    if p.UseOuterToBuild {        if spill {            // It runs in sequence when build data is on disk. See handleUnmatchedRowsFromHashTableInDisk            cpuCost += buildCnt * cpuFactor        } else {            cpuCost += buildCnt * cpuFactor / float64(p.Concurrency)        }        diskCost += buildCnt * diskFactor * rowSize    }    if spill {        memoryCost *= float64(memQuota) / (rowSize * buildCnt)    } else {        diskCost = 0    }    if op != nil {        setPhysicalHashJoinCostDetail(p, op, spill, buildCnt, probeCnt, cpuFactor, rowSize, numPairs,            cpuCost, probeCost, memoryCost, diskCost, probeDiskCost,            diskFactor, memoryFactor, concurrencyFactor,            memQuota)    }    return cpuCost + memoryCost + diskCost}// GetCost computes cost of merge join operator itself.func (p *PhysicalMergeJoin) GetCost(lCnt, rCnt float64, costFlag uint64) float64 {    outerCnt := lCnt    innerCnt := rCnt    innerKeys := p.RightJoinKeys    innerSchema := p.children[1].Schema()    innerStats := p.children[1].statsInfo()    if p.JoinType == RightOuterJoin {        outerCnt = rCnt        innerCnt = lCnt        innerKeys = p.LeftJoinKeys        innerSchema = p.children[0].Schema()        innerStats = p.children[0].statsInfo()    }    helper := &fullJoinRowCountHelper{        sctx:          p.SCtx(),        cartesian:     false,        leftProfile:   p.children[0].statsInfo(),        rightProfile:  p.children[1].statsInfo(),        leftJoinKeys:  p.LeftJoinKeys,        rightJoinKeys: p.RightJoinKeys,        leftSchema:    p.children[0].Schema(),        rightSchema:   p.children[1].Schema(),    }    numPairs := helper.estimate()    if p.JoinType == SemiJoin || p.JoinType == AntiSemiJoin ||        p.JoinType == LeftOuterSemiJoin || p.JoinType == AntiLeftOuterSemiJoin {        if len(p.OtherConditions) > 0 {            numPairs *= 0.5        } else {            numPairs = 0        }    }    if hasCostFlag(costFlag, CostFlagUseTrueCardinality) {        numPairs = getOperatorActRows(p)    }    sessVars := p.ctx.GetSessionVars()    probeCost := numPairs * sessVars.GetCPUFactor()    // Cost of evaluating outer filters.    var cpuCost float64    if len(p.LeftConditions)+len(p.RightConditions) > 0 {        probeCost *= SelectionFactor        cpuCost += outerCnt * sessVars.GetCPUFactor()    }    cpuCost += probeCost    // For merge join, only one group of rows with same join key(not null) are cached,    // we compute average memory cost using estimated group size.    NDV, _ := getColsNDVWithMatchedLen(innerKeys, innerSchema, innerStats)    memoryCost := (innerCnt / NDV) * sessVars.GetMemoryFactor()    return cpuCost + memoryCost}

HashJoin要思考到内存不够的状况，因而在计算到数据不够时，会将对应的数据压入硬盘。但它对数据的程序并无要求，因而能够并发的去做。见：

    // Cost of traveling the hash table to resolve missing matched cases when building the hash table from the outer table    if p.UseOuterToBuild {        if spill {            // It runs in sequence when build data is on disk. See handleUnmatchedRowsFromHashTableInDisk            cpuCost += buildCnt * cpuFactor        } else {            cpuCost += buildCnt * cpuFactor / float64(p.Concurrency)        }        diskCost += buildCnt * diskFactor * rowSize    }

而MergeJoin的代价显然会更小，但可能选则到这个打算也有较高的要求：当两个关联表要 Join 的字段须要按排好的程序读取时，实用 Merge Join 算法。

5.4 老版本物理执行打算的抉择

5.4.1 代价的评估

这块代码次要是在plan_cost_ver1.go和plan_cost_ver2.go。v2对代价公式进行了更准确的回归校准，调整了局部代价公式，比此前版本的代价公式更加精确。代码上也更为简洁：v2只暴露出了一个公共办法，外部通过不同的类型做转发。

// GetPlanCost returns the cost of this plan.func GetPlanCost(p PhysicalPlan, taskType property.TaskType, option *PlanCostOption) (float64, error) {    return getPlanCost(p, taskType, option)}func getPlanCost(p PhysicalPlan, taskType property.TaskType, option *PlanCostOption) (float64, error) {    if p.SCtx().GetSessionVars().CostModelVersion == modelVer2 {        planCost, err := p.getPlanCostVer2(taskType, option)        return planCost.cost, err    }    return p.getPlanCostVer1(taskType, option)}

依据不同的PhysicalPlan类型，会找到不同绑定办法：

v1的局部办法展现：

select代价计算形式

// getPlanCostVer2 returns the plan-cost of this sub-plan, which is:// plan-cost = child-cost + filter-costfunc (p *PhysicalSelection) getPlanCostVer2(taskType property.TaskType, option *PlanCostOption) (costVer2, error) {    if p.planCostInit && !hasCostFlag(option.CostFlag, CostFlagRecalculate) {        return p.planCostVer2, nil    }    inputRows := getCardinality(p.children[0], option.CostFlag)    cpuFactor := getTaskCPUFactorVer2(p, taskType)    filterCost := filterCostVer2(option, inputRows, p.Conditions, cpuFactor)    childCost, err := p.children[0].getPlanCostVer2(taskType, option)    if err != nil {        return zeroCostVer2, err    }    p.planCostVer2 = sumCostVer2(filterCost, childCost)    p.planCostInit = true    return p.planCostVer2, nil}

这部分代码简略易读。代价就是子查问的代价+筛选的代价。

那么问题来了，中索引的和不中索引的代价应该是不一样的。这里没有体现进去啊。认真看childCost, err := p.children[0].getPlanCostVer2(taskType, option)，这里是会去获取子物理执行打算的代价。

// getPlanCostVer2 returns the plan-cost of this sub-plan, which is:func (p *PointGetPlan) getPlanCostVer2(taskType property.TaskType, option *PlanCostOption) (costVer2, error) {    if p.planCostInit && !hasCostFlag(option.CostFlag, CostFlagRecalculate) {        return p.planCostVer2, nil    }    if p.accessCols == nil { // from fast plan code path        p.planCostVer2 = zeroCostVer2        p.planCostInit = true        return zeroCostVer2, nil    }    rowSize := getAvgRowSize(p.stats, p.schema.Columns)    netFactor := getTaskNetFactorVer2(p, taskType)    p.planCostVer2 = netCostVer2(option, 1, rowSize, netFactor)    p.planCostInit = true    return p.planCostVer2, nil}func netCostVer2(option *PlanCostOption, rows, rowSize float64, netFactor costVer2Factor) costVer2 {    return newCostVer2(option, netFactor,        rows*rowSize*netFactor.Value,        func() string { return fmt.Sprintf("net(%v*rowsize(%v)*%v)", rows, rowSize, netFactor) })}// getPlanCostVer2 returns the plan-cost of this sub-plan, which is:// plan-cost = rows * log2(row-size) * scan-factor// log2(row-size) is from experiments.func (p *PhysicalTableScan) getPlanCostVer2(taskType property.TaskType, option *PlanCostOption) (costVer2, error) {    if p.planCostInit && !hasCostFlag(option.CostFlag, CostFlagRecalculate) {        return p.planCostVer2, nil    }    rows := getCardinality(p, option.CostFlag)    var rowSize float64    if p.StoreType == kv.TiKV {        rowSize = getAvgRowSize(p.stats, p.tblCols) // consider all columns if TiKV    } else { // TiFlash        rowSize = getAvgRowSize(p.stats, p.schema.Columns)    }    rowSize = math.Max(rowSize, 2.0)    scanFactor := getTaskScanFactorVer2(p, p.StoreType, taskType)    p.planCostVer2 = scanCostVer2(option, rows, rowSize, scanFactor)    // give TiFlash a start-up cost to let the optimizer prefers to use TiKV to process small table scans.    if p.StoreType == kv.TiFlash {        p.planCostVer2 = sumCostVer2(p.planCostVer2, scanCostVer2(option, 10000, rowSize, scanFactor))    }    p.planCostInit = true    return p.planCostVer2, nil}func scanCostVer2(option *PlanCostOption, rows, rowSize float64, scanFactor costVer2Factor) costVer2 {    if rowSize < 1 {        rowSize = 1    }    return newCostVer2(option, scanFactor,        // rows * log(row-size) * scanFactor, log2 from experiments        rows*math.Log2(rowSize)*scanFactor.Value,        func() string { return fmt.Sprintf("scan(%v*logrowsize(%v)*%v)", rows, rowSize, scanFactor) })}

scanFactor的代价默认是40.7，netFactor的代价默认是3.96。联合代码来看，命中索引的代价更优。

join代价计算形式

// getPlanCostVer2 returns the plan-cost of this sub-plan, which is:// plan-cost = build-child-cost + build-filter-cost +// (probe-cost + probe-filter-cost) / concurrency// probe-cost = probe-child-cost * build-rows / batchRatiofunc (p *PhysicalIndexJoin) getPlanCostVer2(taskType property.TaskType, option *PlanCostOption) (costVer2, error) {    return p.getIndexJoinCostVer2(taskType, option, 0)}func (p *PhysicalIndexHashJoin) getPlanCostVer2(taskType property.TaskType, option *PlanCostOption) (costVer2, error) {    return p.getIndexJoinCostVer2(taskType, option, 1)}func (p *PhysicalIndexMergeJoin) getPlanCostVer2(taskType property.TaskType, option *PlanCostOption) (costVer2, error) {    return p.getIndexJoinCostVer2(taskType, option, 2)}func (p *PhysicalIndexJoin) getIndexJoinCostVer2(taskType property.TaskType, option *PlanCostOption, indexJoinType int) (costVer2, error) {    if p.planCostInit && !hasCostFlag(option.CostFlag, CostFlagRecalculate) {        return p.planCostVer2, nil    }    build, probe := p.children[1-p.InnerChildIdx], p.children[p.InnerChildIdx]    buildRows := getCardinality(build, option.CostFlag)    buildRowSize := getAvgRowSize(build.Stats(), build.Schema().Columns)    probeRowsOne := getCardinality(probe, option.CostFlag)    probeRowsTot := probeRowsOne * buildRows    probeRowSize := getAvgRowSize(probe.Stats(), probe.Schema().Columns)    buildFilters, probeFilters := p.LeftConditions, p.RightConditions    probeConcurrency := float64(p.ctx.GetSessionVars().IndexLookupJoinConcurrency())    cpuFactor := getTaskCPUFactorVer2(p, taskType)    memFactor := getTaskMemFactorVer2(p, taskType)    requestFactor := getTaskRequestFactorVer2(p, taskType)    buildFilterCost := filterCostVer2(option, buildRows, buildFilters, cpuFactor)    buildChildCost, err := build.getPlanCostVer2(taskType, option)    if err != nil {        return zeroCostVer2, err    }    buildTaskCost := newCostVer2(option, cpuFactor,        buildRows*10*cpuFactor.Value,        func() string { return fmt.Sprintf("cpu(%v*10*%v)", buildRows, cpuFactor) })    startCost := newCostVer2(option, cpuFactor,        10*3*cpuFactor.Value,        func() string { return fmt.Sprintf("cpu(10*3*%v)", cpuFactor) })    probeFilterCost := filterCostVer2(option, probeRowsTot, probeFilters, cpuFactor)    probeChildCost, err := probe.getPlanCostVer2(taskType, option)    if err != nil {        return zeroCostVer2, err    }    var hashTableCost costVer2    switch indexJoinType {    case 1: // IndexHashJoin        hashTableCost = hashBuildCostVer2(option, buildRows, buildRowSize, float64(len(p.RightJoinKeys)), cpuFactor, memFactor)    case 2: // IndexMergeJoin        hashTableCost = newZeroCostVer2(traceCost(option))    default: // IndexJoin        hashTableCost = hashBuildCostVer2(option, probeRowsTot, probeRowSize, float64(len(p.LeftJoinKeys)), cpuFactor, memFactor)    }    // IndexJoin executes a batch of rows at a time, so the actual cost of this part should be    //  `innerCostPerBatch * numberOfBatches` instead of `innerCostPerRow * numberOfOuterRow`.    // Use an empirical value batchRatio to handle this now.    // TODO: remove this empirical value.    batchRatio := 6.0    probeCost := divCostVer2(mulCostVer2(probeChildCost, buildRows), batchRatio)    // Double Read Cost    doubleReadCost := newZeroCostVer2(traceCost(option))    if p.ctx.GetSessionVars().IndexJoinDoubleReadPenaltyCostRate > 0 {        batchSize := float64(p.ctx.GetSessionVars().IndexJoinBatchSize)        taskPerBatch := 1024.0 // TODO: remove this magic number        doubleReadTasks := buildRows / batchSize * taskPerBatch        doubleReadCost = doubleReadCostVer2(option, doubleReadTasks, requestFactor)        doubleReadCost = mulCostVer2(doubleReadCost, p.ctx.GetSessionVars().IndexJoinDoubleReadPenaltyCostRate)    }    p.planCostVer2 = sumCostVer2(startCost, buildChildCost, buildFilterCost, buildTaskCost, divCostVer2(sumCostVer2(doubleReadCost, probeCost, probeFilterCost, hashTableCost), probeConcurrency))    p.planCostInit = true    return p.planCostVer2, nil}

关键在于：

    switch indexJoinType {    case 1: // IndexHashJoin        hashTableCost = hashBuildCostVer2(option, buildRows, buildRowSize, float64(len(p.RightJoinKeys)), cpuFactor, memFactor)    case 2: // IndexMergeJoin        hashTableCost = newZeroCostVer2(traceCost(option))    default: // IndexJoin        hashTableCost = hashBuildCostVer2(option, probeRowsTot, probeRowSize, float64(len(p.LeftJoinKeys)), cpuFactor, memFactor)    }

对应办法：

func hashBuildCostVer2(option *PlanCostOption, buildRows, buildRowSize, nKeys float64, cpuFactor, memFactor costVer2Factor) costVer2 {    // TODO: 1) consider types of keys, 2) dedicated factor for build-probe hash table    hashKeyCost := newCostVer2(option, cpuFactor,        buildRows*nKeys*cpuFactor.Value,        func() string { return fmt.Sprintf("hashkey(%v*%v*%v)", buildRows, nKeys, cpuFactor) })    hashMemCost := newCostVer2(option, memFactor,        buildRows*buildRowSize*memFactor.Value,        func() string { return fmt.Sprintf("hashmem(%v*%v*%v)", buildRows, buildRowSize, memFactor) })    hashBuildCost := newCostVer2(option, cpuFactor,        buildRows*cpuFactor.Value,        func() string { return fmt.Sprintf("hashbuild(%v*%v)", buildRows, cpuFactor) })    return sumCostVer2(hashKeyCost, hashMemCost, hashBuildCost)}func newZeroCostVer2(trace bool) (ret costVer2) {    if trace {        ret.trace = &costTrace{make(map[string]float64), ""}    }    return}

简略的看一下代码，咱们能够发现，从大多数的场景来看，依照代价从小到大来排，这几种Join是MergeJoin<HashJoin<IndexJoin。

5.4.2执行打算枚举与择优

总得来说这块代码较为简单，实质就是枚举所有符合条件的物理执行打算，并挑选出代价最小的执行打算，故不再列举代码。有趣味的同学能够依据以下纲要自行翻阅：

|--planner/core/find_best_task.go\-- findBestTask\-- enumeratePhysicalPlans4Task\-- compareTaskCost\-- getTaskPlanCost|-- planner/core/plan_cost_ver2.go\-- getPlanCost

6.其余

6.1 参考与援用的文章

Cascades Optimizer：https://zhuanlan.zhihu.com/p/73545345
揭秘 TiDB 新优化器：Cascades Planner 原理解析：https://cn.pingcap.com/blog/tidb-cascades-planner/
TiDB文档-统计信息简介：https://docs.pingcap.com/zh/tidb/v6.5/statistics#%E7%BB%9F%E8...
TiDB文档-谬误索引的解决方案：https://docs.pingcap.com/zh/tidb/v6.5/wrong-index-solution#%E...
The Volcano Optimizer Generator: Extensibility and Efficient Search：https://15721.courses.cs.cmu.edu/spring2019/papers/22-optimiz...
The Cascades Framework for Query Optimization：https://15721.courses.cs.cmu.edu/spring2018/papers/15-optimiz...

6.2 常识补充：code generation && vectorized execution

数据库引擎执行器中十分闻名的两种优化形式，code generation和 vectorized execution。

code generation次要是依据上下文来生成一整段优化过的代码，这与那种嵌套大量if...else、for循环、虚办法的代码齐全相同，齐全面向性能思考。

vectorized execution基于拉模型。相比于一次拉一个tuple来说，它的批量拉取缩小了屡次拉取的开销，同时还能够应用到SIMD。基于这种场景，vectorized execution的优化更加实用于列式数据库。

1.引子

2.统计信息收集

3.代价的评估

4.执行打算枚举与择优

5.TiDB的实现

5.1 逻辑优化

5.2 统计信息收集

5.3 新版本 物理执行打算的抉择

5.3.1 代价的评估

select代价计算形式

join代价计算形式

5.4 老版本 物理执行打算的抉择

5.4.1 代价的评估

select代价计算形式

join代价计算形式

5.4.2执行打算枚举与择优

6.其余

6.1 参考与援用的文章

6.2 常识补充：code generation && vectorized execution

5.3 新版本物理执行打算的抉择

5.4 老版本物理执行打算的抉择