关于tidb:读TiDB源码聊设计浅析HTAP的SQL优化器

版本	日期	备注
1.0	2024.2.18	文章首发

本文的的源码剖析全副基于 TiDB6.5 来做剖析。

如果让你做一个分布式数据库的优化器，面对以下的 SQL，你会想到什么好的办法去执行他们呢？

SELECT id, name FROM person WHERE age >= 18 or height > 180 limit 100;：从条件上看，咱们看到条件其实是二选一的：age >= 18 or height > 180。基于这种状况，咱们必定会去抉择有索引的数据，如果都有索引 or 都没有，那么必定抉择扫描行数起码的数据。如果有一些算子在外面的话，则额定须要思考数据的就近准则——有些算子在局部架构下能够充分利用 MPP 的能力，而有些则不行。
SELECT orders.order_id, customers.customer_name, orders.order_date FROM orders LEFT JOIN customers ON orders.customer_id=customers.customer_id;分布式数据库中的 join，最优的形式就是小表播送到大表数据所在的中央。那么首先咱们得晓得谁是小表，谁是大表。

依据后面的两个例子，咱们能够发现——如果咱们须要找到 SQL 对应的最佳打算，咱们会须要一些表的元数据，或者说是统计信息。从惯例的角度来说，以下统计信息是必须收集的：

表的总行数
每列数据的均匀大小
每列数据的基数：即 NDV（distinct value）
列的 NULL 值个数

如果是事务型的（行式存储），那么还要思考索引均匀长度、值的散布等等。

如果是剖析型的（列式存储），那么还须要思考相干列的最大值、最小值等等。

而统计形式的收集，会有多种形式。次要是思考资源和准确性之间的 Trade off。常见的有：

TopN：相干数据呈现次数前 n 的值。
直方图：用于形容数据分布图的工具。依照数据的值大小进行分桶，并用一些简略的数据来形容每个桶，比方落在桶里的值的个数。
2D 直方图：因为直方图无奈反馈列之间的关联，能够用 2D 直方图（联结散布）做到，但空间占用也比拟多。
Count-Min Sketch：相似哈希表加上计算器的实现。能够用很少的数据来形容整体数据的个性。
HyperLogLog：一种估算海量数据基数的办法。很多状况下，统计并不需要那么准确。工程方面要在应用资源和准确性里找均衡。因而有人提出用 HLL，这是一种占用资源少，但能给出较为精确的近似后果的算法。

TiDB 收集的统计信息见：https://docs.pingcap.com/zh/tidb/v6.5/statistics#%E7%9B%B4%E6…

一个 SQL 真正的物理执行打算可能是有多个的。在以统计信息为根底之上，咱们还须要退出相应的权重，举个例子：

如果可能在内存中计算实现，就不必去重复发动本地 IO
如果能在本地 IO 中实现，就不要去发动网络申请

等等 …

这在 TiDB 的代码中也有对应的默认值。

DefOptCPUFactor                                = 3.0
DefOptCopCPUFactor                             = 3.0
DefOptTiFlashConcurrencyFactor                 = 24.0
DefOptNetworkFactor                            = 1.0
DefOptScanFactor                               = 1.5
DefOptDescScanFactor                           = 3.0
DefOptSeekFactor                               = 20.0
DefOptMemoryFactor                             = 0.001
DefOptDiskFactor                               = 1.5
DefOptConcurrencyFactor                        = 3.0

var defaultVer2Factors = costVer2Factors{TiDBTemp:      costVer2Factor{"tidb_temp_table_factor", 0.00},
    TiKVScan:      costVer2Factor{"tikv_scan_factor", 40.70},
    TiKVDescScan:  costVer2Factor{"tikv_desc_scan_factor", 61.05},
    TiFlashScan:   costVer2Factor{"tiflash_scan_factor", 11.60},
    TiDBCPU:       costVer2Factor{"tidb_cpu_factor", 49.90},
    TiKVCPU:       costVer2Factor{"tikv_cpu_factor", 49.90},
    TiFlashCPU:    costVer2Factor{"tiflash_cpu_factor", 2.40},
    TiDB2KVNet:    costVer2Factor{"tidb_kv_net_factor", 3.96},
    TiDB2FlashNet: costVer2Factor{"tidb_flash_net_factor", 2.20},
    TiFlashMPPNet: costVer2Factor{"tiflash_mpp_net_factor", 1.00},
    TiDBMem:       costVer2Factor{"tidb_mem_factor", 0.20},
    TiKVMem:       costVer2Factor{"tikv_mem_factor", 0.20},
    TiFlashMem:    costVer2Factor{"tiflash_mem_factor", 0.05},
    TiDBDisk:      costVer2Factor{"tidb_disk_factor", 200.00},
    TiDBRequest:   costVer2Factor{"tidb_request_factor", 6000000.00},
}

当咱们能够评估出物理执行打算的代价时，将会枚举所有能够用上物理执行打算，并在其中抉择代价最小的物理执行打算。个别枚举分为两个流派：

自底向上：代表有 System R。
自顶向下：代表有 Cascade。

自底向上没法解决一个问题。当下层对上层的输入后果程序感兴趣时，那就不能只能从底层的视角来寻找部分最优。

举个例子，多表 Join。一开始两个表 Join 可能 HashJoin 代价很低，然而 Join 下一个表时，用 MergeJoin 从整体来看代价更低。从这个 case 来看，自底向上来做打算取优并不适合。

所以有了 Cascade：

搜寻计划是自顶向下的。这意味着它能够防止部分最优而导致全局不优。从 Operator Tree 自顶向下遍历时，能够做一系列变换：

- Implementation：逻辑算子能够转换成物理算子；例如 Join 转换成 NestLoop 或者 HashJoin 等
- Exploration：逻辑算子能够做等价变换；例如替换 Inner Join 的两个子节点，即可枚举 Join 程序

图片来自于：Cascades Optimizer——https://zhuanlan.zhihu.com/p/73545345

它是基于 Volcano 模型演进而来的。
用面向对象的形式进行建模，编写规定时不须要关怀搜寻过程。相比传统优化器中面向过程去一条条的编写，确实是很大的改良。

大抵的代码调用链为：

-- session/session.go

\-- ExecuteStmt //SQL 执行的入口

|-- executor/compiler.go

\-- Compile // 将 SQL 变成可执行的打算

|--planner/planner/optmize.go

\-- Optimize // 优化的入口

\-- optimize // 这里有两个入口。一种是新的优化器入口，一种是老的优化器入口。依据配置来抉择。最终都会返回最优的物理执行打算。|-- planner/cascades/optmize.go

        \--FindBestPlan 见 5.1
    
        \-- onPhasePreprocessing // 见 5.3

        \-- implGroup

        |--planner/core/optmizer.go // 见 5.4

            \-- DoOptimize

            \-- physicalOptimize

            |--planner/core/find_best_task.go

            \-- findBestTask

            \-- enumeratePhysicalPlans4Task

            \-- compareTaskCost
    
            \-- getTaskPlanCost
    
            |-- planner/core/plan_cost_ver2.go

            \-- getPlanCost

外围入口为：


// FindBestPlan is the optimization entrance of the cascades planner. The
// optimization is composed of 3 phases: preprocessing, exploration and implementation.
//
// ------------------------------------------------------------------------------
// Phase 1: Preprocessing
// ------------------------------------------------------------------------------
//
// The target of this phase is to preprocess the plan tree by some heuristic
// rules which should always be beneficial, for example Column Pruning.
//
// ------------------------------------------------------------------------------
// Phase 2: Exploration
// ------------------------------------------------------------------------------
//
// The target of this phase is to explore all the logically equivalent
// expressions by exploring all the equivalent group expressions of each group.
//
// At the very beginning, there is only one group expression in a Group. After
// applying some transformation rules on certain expressions of the Group, all
// the equivalent expressions are found and stored in the Group. This procedure
// can be regarded as searching for a weak connected component in a directed
// graph, where nodes are expressions and directed edges are the transformation
// rules.
//
// ------------------------------------------------------------------------------
// Phase 3: Implementation
// ------------------------------------------------------------------------------
//
// The target of this phase is to search the best physical plan for a Group
// which satisfies a certain required physical property.
//
// In this phase, we need to enumerate all the applicable implementation rules
// for each expression in each group under the required physical property. A
// memo structure is used for a group to reduce the repeated search on the same
// required physical property.
func (opt *Optimizer) FindBestPlan(sctx sessionctx.Context, logical plannercore.LogicalPlan) (p plannercore.PhysicalPlan, cost float64, err error) {logical, err = opt.onPhasePreprocessing(sctx, logical)
    if err != nil {return nil, 0, err}
    rootGroup := memo.Convert2Group(logical)
    err = opt.onPhaseExploration(sctx, rootGroup)
    if err != nil {return nil, 0, err}
    p, cost, err = opt.onPhaseImplementation(sctx, rootGroup)
    if err != nil {return nil, 0, err}
    err = p.ResolveIndices()
    return p, cost, err
}

正文 + 代码很洁净，这里一共做三件事

onPhasePreprocessing：正文很切实，说for example Column Pruning，后果外面真的只做了列裁剪。
onPhaseExploration：摸索所有逻辑等价存在的可能
onPhaseImplementation：依据代价寻找最优的物理执行打算

这块官网的博客写的十分好，我就不反复了：https://cn.pingcap.com/blog/tidb-cascades-planner/

这块代码次要集中在 stats.go 里，外面的外围数据结构是 stats_info.go 里的 StatsInfo。调用链大抵为：

|-- planner/cascades/optimizer.go

\--fillGroupStats

|-- planner/core/stats.go

\--DeriveStats

type LogicalPlan interface {
    Plan
    //...... 疏忽一些代码
    // DeriveStats derives statistic info for current plan node given child stats.
    // We need selfSchema, childSchema here because it makes this method can be used in
    // cascades planner, where LogicalPlan might not record its children or schema.
    DeriveStats(childStats []*property.StatsInfo, selfSchema *expression.Schema, childSchema []*expression.Schema, colGroups [][]*expression.Column) (*property.StatsInfo, error)
    //...... 疏忽一些代码
}

有很多构造体实现了这个办法：

收集统计信息次要是 analyze.go#Next 办法中调用的 #analyzeWorker。

代码入口为：

// implGroup finds the best Implementation which satisfies the required
// physical property for a Group. The best Implementation should have the
// lowest cost among all the applicable Implementations.
//
// g:            the Group to be implemented.
// reqPhysProp: the required physical property.
// costLimit:   the maximum cost of all the Implementations.
func (opt *Optimizer) implGroup(g *memo.Group, reqPhysProp *property.PhysicalProperty, costLimit float64) (memo.Implementation, error) {groupImpl := g.GetImpl(reqPhysProp)
    if groupImpl != nil {if groupImpl.GetCost() <= costLimit {return groupImpl, nil}
        return nil, nil
    }
    // Handle implementation rules for each equivalent GroupExpr.
    var childImpls []memo.Implementation
    err := opt.fillGroupStats(g)
    if err != nil {return nil, err}
    outCount := math.Min(g.Prop.Stats.RowCount, reqPhysProp.ExpectedCnt)
    for elem := g.Equivalents.Front(); elem != nil; elem = elem.Next() {curExpr := elem.Value.(*memo.GroupExpr)
        impls, err := opt.implGroupExpr(curExpr, reqPhysProp)
        if err != nil {return nil, err}
        for _, impl := range impls {childImpls = childImpls[:0]
            for i, childGroup := range curExpr.Children {childImpl, err := opt.implGroup(childGroup, impl.GetPlan().GetChildReqProps(i), impl.GetCostLimit(costLimit, childImpls...))
                if err != nil {return nil, err}
                if childImpl == nil {impl.SetCost(math.MaxFloat64)
                    break
                }
                childImpls = append(childImpls, childImpl)
            }
            if impl.GetCost() == math.MaxFloat64 {continue}
            implCost := impl.CalcCost(outCount, childImpls...)
            if implCost > costLimit {continue}
            if groupImpl == nil || groupImpl.GetCost() > implCost {groupImpl = impl.AttachChildren(childImpls...)
                costLimit = implCost
            }
        }
    }
    // Handle enforcer rules for required physical property.
    for _, rule := range GetEnforcerRules(g, reqPhysProp) {newReqPhysProp := rule.NewProperty(reqPhysProp)
        enforceCost := rule.GetEnforceCost(g)
        childImpl, err := opt.implGroup(g, newReqPhysProp, costLimit-enforceCost)
        if err != nil {return nil, err}
        if childImpl == nil {continue}
        impl := rule.OnEnforce(reqPhysProp, childImpl)
        implCost := enforceCost + childImpl.GetCost()
        impl.SetCost(implCost)
        if groupImpl == nil || groupImpl.GetCost() > implCost {
            groupImpl = impl
            costLimit = implCost
        }
    }
    if groupImpl == nil || groupImpl.GetCost() == math.MaxFloat64 {return nil, nil}
    g.InsertImpl(reqPhysProp, groupImpl)
    return groupImpl, nil
}

这里个函数会找出潜在符合条件的物理执行打算，并一直的搜寻。但它有一个剪枝逻辑——会记录以后最小的 cost，如果一个执行打算自上向下搜寻时，如果超过了这个 cost，则间接返回。这就是在第 3 节提到的自顶向下的优化。

接下来咱们看一下 memo.Implementation 的定义：


package memo

import (plannercore "github.com/pingcap/tidb/planner/core")

// Implementation defines the interface for cost of physical plan.
type Implementation interface {CalcCost(outCount float64, children ...Implementation) float64
    SetCost(cost float64)
    GetCost() float64
    GetPlan() plannercore.PhysicalPlan

    // AttachChildren is used to attach children implementations and returns it self.
    AttachChildren(children ...Implementation) Implementation

    // GetCostLimit gets the costLimit for implementing the next childGroup.
    GetCostLimit(costLimit float64, children ...Implementation) float64
}

其中 CalcCost 办法就是用来计算物理执行打算的代价。一共有这么多构造体实现了它：

咱们以结尾的例子，解说代价的评估。

select 代价计算形式

// CalcCost calculates the cost of the table scan Implementation.
func (impl *TableScanImpl) CalcCost(outCount float64, _ ...memo.Implementation) float64 {ts := impl.plan.(*plannercore.PhysicalTableScan)
    width := impl.tblColHists.GetTableAvgRowSize(impl.plan.SCtx(), impl.tblCols, kv.TiKV, true)
    sessVars := ts.SCtx().GetSessionVars()
    impl.cost = outCount * sessVars.GetScanFactor(ts.Table) * width
    if ts.Desc {impl.cost = outCount * sessVars.GetDescScanFactor(ts.Table) * width
    }
    return impl.cost
}

// GetScanFactor returns the session variable scanFactor
// returns 0 when tbl is a temporary table.
func (s *SessionVars) GetScanFactor(tbl *model.TableInfo) float64 {
    if tbl != nil {
        if tbl.TempTableType != model.TempTableNone {return 0}
    }
    return s.scanFactor 
}


// CalcCost implements Implementation interface.
func (impl *IndexScanImpl) CalcCost(outCount float64, _ ...memo.Implementation) float64 {is := impl.plan.(*plannercore.PhysicalIndexScan)
    sessVars := is.SCtx().GetSessionVars()
    rowSize := impl.tblColHists.GetIndexAvgRowSize(is.SCtx(), is.Schema().Columns, is.Index.Unique)
    cost := outCount * rowSize * sessVars.GetScanFactor(is.Table)
    if is.Desc {cost = outCount * rowSize * sessVars.GetDescScanFactor(is.Table)
    }
    cost += float64(len(is.Ranges)) * sessVars.GetSeekFactor(is.Table)
    impl.cost = cost
    return impl.cost
}

这里咱们以全表扫描表和命中索引的代码为例子。其中默认的 scanFactor 是 1.5。这里能够看到 indexScan 和 tableScan 少了一个因数：width。这个变量代表了所需列的均匀大小。这么看基本上是 indexScan 最优了。

这里的代码笔者认为有点不优雅——当 Desc 时，其实之前的 Cost 是没必要算一遍的，节约 CPU 资源。

join 代价计算形式

// CalcCost implements Implementation CalcCost interface.
func (impl *HashJoinImpl) CalcCost(_ float64, children ...memo.Implementation) float64 {hashJoin := impl.plan.(*plannercore.PhysicalHashJoin)
    // The children here are only used to calculate the cost.
    hashJoin.SetChildren(children[0].GetPlan(), children[1].GetPlan())
    selfCost := hashJoin.GetCost(children[0].GetPlan().StatsCount(), children[1].GetPlan().StatsCount(), false, 0, nil)
    impl.cost = selfCost + children[0].GetCost() + children[1].GetCost()
    return impl.cost
}

// CalcCost implements Implementation CalcCost interface.
func (impl *MergeJoinImpl) CalcCost(_ float64, children ...memo.Implementation) float64 {mergeJoin := impl.plan.(*plannercore.PhysicalMergeJoin)
    // The children here are only used to calculate the cost.
    mergeJoin.SetChildren(children[0].GetPlan(), children[1].GetPlan())
    selfCost := mergeJoin.GetCost(children[0].GetPlan().StatsCount(), children[1].GetPlan().StatsCount(), 0)
    impl.cost = selfCost + children[0].GetCost() + children[1].GetCost()
    return impl.cost
}

具体的计算都在 plan_cost_v1.go 里：


// GetCost computes cost of hash join operator itself.
func (p *PhysicalHashJoin) GetCost(lCnt, rCnt float64, isMPP bool, costFlag uint64, op *physicalOptimizeOp) float64 {
    buildCnt, probeCnt := lCnt, rCnt
    build := p.children[0]
    // Taking the right as the inner for right join or using the outer to build a hash table.
    if (p.InnerChildIdx == 1 && !p.UseOuterToBuild) || (p.InnerChildIdx == 0 && p.UseOuterToBuild) {
        buildCnt, probeCnt = rCnt, lCnt
        build = p.children[1]
    }
    sessVars := p.ctx.GetSessionVars()
    oomUseTmpStorage := variable.EnableTmpStorageOnOOM.Load()
    memQuota := sessVars.MemTracker.GetBytesLimit() // sessVars.MemQuotaQuery && hint
    rowSize := getAvgRowSize(build.statsInfo(), build.Schema().Columns)
    spill := oomUseTmpStorage && memQuota > 0 && rowSize*buildCnt > float64(memQuota) && p.storeTp != kv.TiFlash
    // Cost of building hash table.
    cpuFactor := sessVars.GetCPUFactor()
    diskFactor := sessVars.GetDiskFactor()
    memoryFactor := sessVars.GetMemoryFactor()
    concurrencyFactor := sessVars.GetConcurrencyFactor()

    cpuCost := buildCnt * cpuFactor
    memoryCost := buildCnt * memoryFactor
    diskCost := buildCnt * diskFactor * rowSize
    // Number of matched row pairs regarding the equal join conditions.
    helper := &fullJoinRowCountHelper{sctx:            p.SCtx(),
        cartesian:       false,
        leftProfile:     p.children[0].statsInfo(),
        rightProfile:    p.children[1].statsInfo(),
        leftJoinKeys:    p.LeftJoinKeys,
        rightJoinKeys:   p.RightJoinKeys,
        leftSchema:      p.children[0].Schema(),
        rightSchema:     p.children[1].Schema(),
        leftNAJoinKeys:  p.LeftNAJoinKeys,
        rightNAJoinKeys: p.RightNAJoinKeys,
    }
    numPairs := helper.estimate()
    // For semi-join class, if `OtherConditions` is empty, we already know
    // the join results after querying hash table, otherwise, we have to
    // evaluate those resulted row pairs after querying hash table; if we
    // find one pair satisfying the `OtherConditions`, we then know the
    // join result for this given outer row, otherwise we have to iterate
    // to the end of those pairs; since we have no idea about when we can
    // terminate the iteration, we assume that we need to iterate half of
    // those pairs in average.
    if p.JoinType == SemiJoin || p.JoinType == AntiSemiJoin ||
        p.JoinType == LeftOuterSemiJoin || p.JoinType == AntiLeftOuterSemiJoin {if len(p.OtherConditions) > 0 {numPairs *= 0.5} else {numPairs = 0}
    }
    if hasCostFlag(costFlag, CostFlagUseTrueCardinality) {numPairs = getOperatorActRows(p)
    }
    // Cost of querying hash table is cheap actually, so we just compute the cost of
    // evaluating `OtherConditions` and joining row pairs.
    probeCost := numPairs * cpuFactor
    probeDiskCost := numPairs * diskFactor * rowSize
    // Cost of evaluating outer filter.
    if len(p.LeftConditions)+len(p.RightConditions) > 0 {
        // Input outer count for the above compution should be adjusted by SelectionFactor.
        probeCost *= SelectionFactor
        probeDiskCost *= SelectionFactor
        probeCost += probeCnt * cpuFactor
    }
    diskCost += probeDiskCost
    probeCost /= float64(p.Concurrency)
    // Cost of additional concurrent goroutines.
    cpuCost += probeCost + float64(p.Concurrency+1)*concurrencyFactor
    // Cost of traveling the hash table to resolve missing matched cases when building the hash table from the outer table
    if p.UseOuterToBuild {
        if spill {
            // It runs in sequence when build data is on disk. See handleUnmatchedRowsFromHashTableInDisk
            cpuCost += buildCnt * cpuFactor
        } else {cpuCost += buildCnt * cpuFactor / float64(p.Concurrency)
        }
        diskCost += buildCnt * diskFactor * rowSize
    }

    if spill {memoryCost *= float64(memQuota) / (rowSize * buildCnt)
    } else {diskCost = 0}
    if op != nil {
        setPhysicalHashJoinCostDetail(p, op, spill, buildCnt, probeCnt, cpuFactor, rowSize, numPairs,
            cpuCost, probeCost, memoryCost, diskCost, probeDiskCost,
            diskFactor, memoryFactor, concurrencyFactor,
            memQuota)
    }
    return cpuCost + memoryCost + diskCost
}


// GetCost computes cost of merge join operator itself.
func (p *PhysicalMergeJoin) GetCost(lCnt, rCnt float64, costFlag uint64) float64 {
    outerCnt := lCnt
    innerCnt := rCnt
    innerKeys := p.RightJoinKeys
    innerSchema := p.children[1].Schema()
    innerStats := p.children[1].statsInfo()
    if p.JoinType == RightOuterJoin {
        outerCnt = rCnt
        innerCnt = lCnt
        innerKeys = p.LeftJoinKeys
        innerSchema = p.children[0].Schema()
        innerStats = p.children[0].statsInfo()}
    helper := &fullJoinRowCountHelper{sctx:          p.SCtx(),
        cartesian:     false,
        leftProfile:   p.children[0].statsInfo(),
        rightProfile:  p.children[1].statsInfo(),
        leftJoinKeys:  p.LeftJoinKeys,
        rightJoinKeys: p.RightJoinKeys,
        leftSchema:    p.children[0].Schema(),
        rightSchema:   p.children[1].Schema(),}
    numPairs := helper.estimate()
    if p.JoinType == SemiJoin || p.JoinType == AntiSemiJoin ||
        p.JoinType == LeftOuterSemiJoin || p.JoinType == AntiLeftOuterSemiJoin {if len(p.OtherConditions) > 0 {numPairs *= 0.5} else {numPairs = 0}
    }
    if hasCostFlag(costFlag, CostFlagUseTrueCardinality) {numPairs = getOperatorActRows(p)
    }
    sessVars := p.ctx.GetSessionVars()
    probeCost := numPairs * sessVars.GetCPUFactor()
    // Cost of evaluating outer filters.
    var cpuCost float64
    if len(p.LeftConditions)+len(p.RightConditions) > 0 {
        probeCost *= SelectionFactor
        cpuCost += outerCnt * sessVars.GetCPUFactor()}
    cpuCost += probeCost
    // For merge join, only one group of rows with same join key(not null) are cached,
    // we compute average memory cost using estimated group size.
    NDV, _ := getColsNDVWithMatchedLen(innerKeys, innerSchema, innerStats)
    memoryCost := (innerCnt / NDV) * sessVars.GetMemoryFactor()
    return cpuCost + memoryCost
}

HashJoin 要思考到内存不够的状况，因而在计算到数据不够时，会将对应的数据压入硬盘。但它对数据的程序并无要求，因而能够并发的去做。见：

    // Cost of traveling the hash table to resolve missing matched cases when building the hash table from the outer table
    if p.UseOuterToBuild {
        if spill {
            // It runs in sequence when build data is on disk. See handleUnmatchedRowsFromHashTableInDisk
            cpuCost += buildCnt * cpuFactor
        } else {cpuCost += buildCnt * cpuFactor / float64(p.Concurrency)
        }
        diskCost += buildCnt * diskFactor * rowSize
    }

而 MergeJoin 的代价显然会更小，但可能选则到这个打算也有较高的要求：当两个关联表要 Join 的字段须要按排好的程序读取时，实用 Merge Join 算法。

这块代码次要是在 plan_cost_ver1.go 和plan_cost_ver2.go。v2 对代价公式进行了更准确的回归校准，调整了局部代价公式，比此前版本的代价公式更加精确。代码上也更为简洁：v2 只暴露出了一个公共办法，外部通过不同的类型做转发。

// GetPlanCost returns the cost of this plan.
func GetPlanCost(p PhysicalPlan, taskType property.TaskType, option *PlanCostOption) (float64, error) {return getPlanCost(p, taskType, option)
}

func getPlanCost(p PhysicalPlan, taskType property.TaskType, option *PlanCostOption) (float64, error) {if p.SCtx().GetSessionVars().CostModelVersion == modelVer2 {planCost, err := p.getPlanCostVer2(taskType, option)
        return planCost.cost, err
    }
    return p.getPlanCostVer1(taskType, option)
}

依据不同的 PhysicalPlan 类型，会找到不同绑定办法：

v1 的局部办法展现：

select 代价计算形式

// getPlanCostVer2 returns the plan-cost of this sub-plan, which is:
// plan-cost = child-cost + filter-cost
func (p *PhysicalSelection) getPlanCostVer2(taskType property.TaskType, option *PlanCostOption) (costVer2, error) {if p.planCostInit && !hasCostFlag(option.CostFlag, CostFlagRecalculate) {return p.planCostVer2, nil}

    inputRows := getCardinality(p.children[0], option.CostFlag)
    cpuFactor := getTaskCPUFactorVer2(p, taskType)

    filterCost := filterCostVer2(option, inputRows, p.Conditions, cpuFactor)

    childCost, err := p.children[0].getPlanCostVer2(taskType, option)
    if err != nil {return zeroCostVer2, err}

    p.planCostVer2 = sumCostVer2(filterCost, childCost)
    p.planCostInit = true
    return p.planCostVer2, nil
}

这部分代码简略易读。代价就是子查问的代价 + 筛选的代价。

那么问题来了，中索引的和不中索引的代价应该是不一样的。这里没有体现进去啊。认真看childCost, err := p.children[0].getPlanCostVer2(taskType, option)，这里是会去获取子物理执行打算的代价。

// getPlanCostVer2 returns the plan-cost of this sub-plan, which is:
func (p *PointGetPlan) getPlanCostVer2(taskType property.TaskType, option *PlanCostOption) (costVer2, error) {if p.planCostInit && !hasCostFlag(option.CostFlag, CostFlagRecalculate) {return p.planCostVer2, nil}

    if p.accessCols == nil { // from fast plan code path
        p.planCostVer2 = zeroCostVer2
        p.planCostInit = true
        return zeroCostVer2, nil
    }
    rowSize := getAvgRowSize(p.stats, p.schema.Columns)
    netFactor := getTaskNetFactorVer2(p, taskType)

    p.planCostVer2 = netCostVer2(option, 1, rowSize, netFactor)
    p.planCostInit = true
    return p.planCostVer2, nil
}

func netCostVer2(option *PlanCostOption, rows, rowSize float64, netFactor costVer2Factor) costVer2 {
    return newCostVer2(option, netFactor,
        rows*rowSize*netFactor.Value,
        func() string { return fmt.Sprintf("net(%v*rowsize(%v)*%v)", rows, rowSize, netFactor) })
}



// getPlanCostVer2 returns the plan-cost of this sub-plan, which is:
// plan-cost = rows * log2(row-size) * scan-factor
// log2(row-size) is from experiments.
func (p *PhysicalTableScan) getPlanCostVer2(taskType property.TaskType, option *PlanCostOption) (costVer2, error) {if p.planCostInit && !hasCostFlag(option.CostFlag, CostFlagRecalculate) {return p.planCostVer2, nil}

    rows := getCardinality(p, option.CostFlag)
    var rowSize float64
    if p.StoreType == kv.TiKV {rowSize = getAvgRowSize(p.stats, p.tblCols) // consider all columns if TiKV
    } else { // TiFlash
        rowSize = getAvgRowSize(p.stats, p.schema.Columns)
    }
    rowSize = math.Max(rowSize, 2.0)
    scanFactor := getTaskScanFactorVer2(p, p.StoreType, taskType)

    p.planCostVer2 = scanCostVer2(option, rows, rowSize, scanFactor)

    // give TiFlash a start-up cost to let the optimizer prefers to use TiKV to process small table scans.
    if p.StoreType == kv.TiFlash {p.planCostVer2 = sumCostVer2(p.planCostVer2, scanCostVer2(option, 10000, rowSize, scanFactor))
    }

    p.planCostInit = true
    return p.planCostVer2, nil
}

func scanCostVer2(option *PlanCostOption, rows, rowSize float64, scanFactor costVer2Factor) costVer2 {
    if rowSize < 1 {rowSize = 1}
    return newCostVer2(option, scanFactor,
        // rows * log(row-size) * scanFactor, log2 from experiments
        rows*math.Log2(rowSize)*scanFactor.Value,
        func() string { return fmt.Sprintf("scan(%v*logrowsize(%v)*%v)", rows, rowSize, scanFactor) })
}

scanFactor 的代价默认是 40.7，netFactor 的代价默认是 3.96。联合代码来看，命中索引的代价更优。

join 代价计算形式

// getPlanCostVer2 returns the plan-cost of this sub-plan, which is:
// plan-cost = build-child-cost + build-filter-cost +
// (probe-cost + probe-filter-cost) / concurrency
// probe-cost = probe-child-cost * build-rows / batchRatio
func (p *PhysicalIndexJoin) getPlanCostVer2(taskType property.TaskType, option *PlanCostOption) (costVer2, error) {return p.getIndexJoinCostVer2(taskType, option, 0)
}

func (p *PhysicalIndexHashJoin) getPlanCostVer2(taskType property.TaskType, option *PlanCostOption) (costVer2, error) {return p.getIndexJoinCostVer2(taskType, option, 1)
}

func (p *PhysicalIndexMergeJoin) getPlanCostVer2(taskType property.TaskType, option *PlanCostOption) (costVer2, error) {return p.getIndexJoinCostVer2(taskType, option, 2)
}

func (p *PhysicalIndexJoin) getIndexJoinCostVer2(taskType property.TaskType, option *PlanCostOption, indexJoinType int) (costVer2, error) {if p.planCostInit && !hasCostFlag(option.CostFlag, CostFlagRecalculate) {return p.planCostVer2, nil}

    build, probe := p.children[1-p.InnerChildIdx], p.children[p.InnerChildIdx]
    buildRows := getCardinality(build, option.CostFlag)
    buildRowSize := getAvgRowSize(build.Stats(), build.Schema().Columns)
    probeRowsOne := getCardinality(probe, option.CostFlag)
    probeRowsTot := probeRowsOne * buildRows
    probeRowSize := getAvgRowSize(probe.Stats(), probe.Schema().Columns)
    buildFilters, probeFilters := p.LeftConditions, p.RightConditions
    probeConcurrency := float64(p.ctx.GetSessionVars().IndexLookupJoinConcurrency())
    cpuFactor := getTaskCPUFactorVer2(p, taskType)
    memFactor := getTaskMemFactorVer2(p, taskType)
    requestFactor := getTaskRequestFactorVer2(p, taskType)

    buildFilterCost := filterCostVer2(option, buildRows, buildFilters, cpuFactor)
    buildChildCost, err := build.getPlanCostVer2(taskType, option)
    if err != nil {return zeroCostVer2, err}
    buildTaskCost := newCostVer2(option, cpuFactor,
        buildRows*10*cpuFactor.Value,
        func() string { return fmt.Sprintf("cpu(%v*10*%v)", buildRows, cpuFactor) })
    startCost := newCostVer2(option, cpuFactor,
        10*3*cpuFactor.Value,
        func() string { return fmt.Sprintf("cpu(10*3*%v)", cpuFactor) })

    probeFilterCost := filterCostVer2(option, probeRowsTot, probeFilters, cpuFactor)
    probeChildCost, err := probe.getPlanCostVer2(taskType, option)
    if err != nil {return zeroCostVer2, err}

    var hashTableCost costVer2
    switch indexJoinType {
    case 1: // IndexHashJoin
        hashTableCost = hashBuildCostVer2(option, buildRows, buildRowSize, float64(len(p.RightJoinKeys)), cpuFactor, memFactor)
    case 2: // IndexMergeJoin
        hashTableCost = newZeroCostVer2(traceCost(option))
    default: // IndexJoin
        hashTableCost = hashBuildCostVer2(option, probeRowsTot, probeRowSize, float64(len(p.LeftJoinKeys)), cpuFactor, memFactor)
    }

    // IndexJoin executes a batch of rows at a time, so the actual cost of this part should be
    //  `innerCostPerBatch * numberOfBatches` instead of `innerCostPerRow * numberOfOuterRow`.
    // Use an empirical value batchRatio to handle this now.
    // TODO: remove this empirical value.
    batchRatio := 6.0
    probeCost := divCostVer2(mulCostVer2(probeChildCost, buildRows), batchRatio)

    // Double Read Cost
    doubleReadCost := newZeroCostVer2(traceCost(option))
    if p.ctx.GetSessionVars().IndexJoinDoubleReadPenaltyCostRate > 0 {batchSize := float64(p.ctx.GetSessionVars().IndexJoinBatchSize)
        taskPerBatch := 1024.0 // TODO: remove this magic number
        doubleReadTasks := buildRows / batchSize * taskPerBatch
        doubleReadCost = doubleReadCostVer2(option, doubleReadTasks, requestFactor)
        doubleReadCost = mulCostVer2(doubleReadCost, p.ctx.GetSessionVars().IndexJoinDoubleReadPenaltyCostRate)
    }

    p.planCostVer2 = sumCostVer2(startCost, buildChildCost, buildFilterCost, buildTaskCost, divCostVer2(sumCostVer2(doubleReadCost, probeCost, probeFilterCost, hashTableCost), probeConcurrency))
    p.planCostInit = true
    return p.planCostVer2, nil
}

关键在于：

    switch indexJoinType {
    case 1: // IndexHashJoin
        hashTableCost = hashBuildCostVer2(option, buildRows, buildRowSize, float64(len(p.RightJoinKeys)), cpuFactor, memFactor)
    case 2: // IndexMergeJoin
        hashTableCost = newZeroCostVer2(traceCost(option))
    default: // IndexJoin
        hashTableCost = hashBuildCostVer2(option, probeRowsTot, probeRowSize, float64(len(p.LeftJoinKeys)), cpuFactor, memFactor)
    }

对应办法：

func hashBuildCostVer2(option *PlanCostOption, buildRows, buildRowSize, nKeys float64, cpuFactor, memFactor costVer2Factor) costVer2 {// TODO: 1) consider types of keys, 2) dedicated factor for build-probe hash table
    hashKeyCost := newCostVer2(option, cpuFactor,
        buildRows*nKeys*cpuFactor.Value,
        func() string { return fmt.Sprintf("hashkey(%v*%v*%v)", buildRows, nKeys, cpuFactor) })
    hashMemCost := newCostVer2(option, memFactor,
        buildRows*buildRowSize*memFactor.Value,
        func() string { return fmt.Sprintf("hashmem(%v*%v*%v)", buildRows, buildRowSize, memFactor) })
    hashBuildCost := newCostVer2(option, cpuFactor,
        buildRows*cpuFactor.Value,
        func() string { return fmt.Sprintf("hashbuild(%v*%v)", buildRows, cpuFactor) })
    return sumCostVer2(hashKeyCost, hashMemCost, hashBuildCost)
}

func newZeroCostVer2(trace bool) (ret costVer2) {
    if trace {ret.trace = &costTrace{make(map[string]float64), ""}
    }
    return
}

简略的看一下代码，咱们能够发现，从大多数的场景来看，依照代价从小到大来排，这几种 Join 是 MergeJoin<HashJoin<IndexJoin。

总得来说这块代码较为简单，实质就是枚举所有符合条件的物理执行打算，并挑选出代价最小的执行打算，故不再列举代码。有趣味的同学能够依据以下纲要自行翻阅：

|--planner/core/find_best_task.go

\-- findBestTask

\-- enumeratePhysicalPlans4Task

\-- compareTaskCost

\-- getTaskPlanCost

|-- planner/core/plan_cost_ver2.go

\-- getPlanCost

Cascades Optimizer：https://zhuanlan.zhihu.com/p/73545345
揭秘 TiDB 新优化器：Cascades Planner 原理解析：https://cn.pingcap.com/blog/tidb-cascades-planner/
TiDB 文档 - 统计信息简介：https://docs.pingcap.com/zh/tidb/v6.5/statistics#%E7%BB%9F%E8…
TiDB 文档 - 谬误索引的解决方案：https://docs.pingcap.com/zh/tidb/v6.5/wrong-index-solution#%E…
The Volcano Optimizer Generator: Extensibility and Efficient Search：https://15721.courses.cs.cmu.edu/spring2019/papers/22-optimiz…
The Cascades Framework for Query Optimization：https://15721.courses.cs.cmu.edu/spring2018/papers/15-optimiz…

数据库引擎执行器中十分闻名的两种优化形式，code generation 和 vectorized execution。

code generation 次要是依据上下文来生成一整段优化过的代码，这与那种嵌套大量 if…else、for 循环、虚办法的代码齐全相同，齐全面向性能思考。

vectorized execution 基于拉模型。相比于一次拉一个 tuple 来说，它的批量拉取缩小了屡次拉取的开销，同时还能够应用到 SIMD。基于这种场景，vectorized execution 的优化更加实用于列式数据库。

关于tidb:读TiDB源码聊设计浅析HTAP的SQL优化器

1. 引子

2. 统计信息收集

3. 代价的评估

4. 执行打算枚举与择优

5.TiDB 的实现

5.1 逻辑优化

5.2 统计信息收集

5.3 新版本物理执行打算的抉择

5.3.1 代价的评估

select 代价计算形式

join 代价计算形式

5.4 老版本物理执行打算的抉择

5.4.1 代价的评估

select 代价计算形式

join 代价计算形式

5.4.2 执行打算枚举与择优

6. 其余

6.1 参考与援用的文章

6.2 常识补充：code generation && vectorized execution

Just My Socks（注册教程内含优惠码）

关于tidb:读TiDB源码聊设计浅析HTAP的SQL优化器

1. 引子

2. 统计信息收集

3. 代价的评估

4. 执行打算枚举与择优

5.TiDB 的实现

5.1 逻辑优化

5.2 统计信息收集

5.3 新版本 物理执行打算的抉择

5.3.1 代价的评估

select 代价计算形式

join 代价计算形式

5.4 老版本 物理执行打算的抉择

5.4.1 代价的评估

select 代价计算形式

join 代价计算形式

5.4.2 执行打算枚举与择优

6. 其余

6.1 参考与援用的文章

6.2 常识补充：code generation && vectorized execution

Just My Socks（注册教程 内含优惠码）

5.3 新版本物理执行打算的抉择

5.4 老版本物理执行打算的抉择

Just My Socks（注册教程内含优惠码）