关于druid:leveldb-sstable-min-max区间搜索源码分析1

作者：王东阳

leveldb 中 min_max 搜寻剖析

前言

leveldb 是一个写性能非常优良的存储引擎，是典型的 LSM 树 (Log Structured-Merge Tree) 实现。LSM 树的核心思想是将随机写转化为间断写，从而晋升写操作的吞吐能力，整体架构如下：

尽管在内存中，所有的数据都是按序排列的，然而当多个 memetable 数据长久化到磁盘后，对应的不同的 sstable 之间是存在交加的，在读操作时，须要对所有的 sstable 文件进行遍历，重大影响了读取效率。因而 leveldb 后盾会“定期“整合这些 sstable 文件，该过程也称为 compaction。随着 compaction 的进行，sstable 文件在逻辑上被分成若干层，由内存数据间接 dump 进去的文件称为 level0 层文件，前期整合而成的文件为 level i 层文件，这也是 leveldb 这个名字的由来。https://www.bookstack.cn/read…

在 leveldb 的压缩过程中, 须要向上层搜寻和相互重叠的 sstable 进行合并，如下图所示：

下层数据如何疾速的查问下一层中和以后 sstable 文件重叠的文件列表呢？由上图能够看出每个 sstable 中都有一个最小值 min 和最大值 max，示意以后文件所蕴含 key 的最大值和最小值，能够当成一个数据区间，Level0 层的 key 区间相互有重叠，残余其它层中，每一层中的 sstable 文件，所蕴含的 key 区间，相互不重叠，所以对它们排序后，无论是最大值还是最小值都是严格递增的。

除了文件压缩，leveldb 在执行查问的过程中，也会利用 sstable 的 min,max 的属性信息进行疾速查找申请 key 所在的文件。

本文次要基于 leveldb:table.go 和 leveldb_rs:table.rs 介绍 sstable 区间搜寻相干接口代码的实现算法。

tFile

文件的构造如下，次要蕴含文件的形容信息 fd，文件中数据的大小 size，以及文件中蕴含 key 的最小，最小值 (imin,imax). 在本文中，咱们只须要关注 imin,imax 就能够了。

// tFile holds basic information about a table.
type tFile struct {
    fd         storage.FileDesc
    seekLeft   int32
    size       int64
    imin, imax internalKey
}

接下来是 tFile 中的办法，须要留神的是上面的办法中都有一个参数 icmp *iComparer , 咱们能够了解为一个比拟器，用于判断 key 大小时候应用的。

首先是 tFile 的 after 办法，判断给定的 key 是否在这个文件的前面 , 这个办法咱们能够这么了解，在一个程度数轴上，以后 tFile 的 imin,imax 对应数轴上的一个区间，判断给定的 key 是否在这个区间的前面，也就是判断给定 ukey 是否比以后文件中的最大值 imax 要大。

after 办法示意图：│          │
          │          │         ukey
  ────────▼──────────▼─────────▲───────►
         imin      imax        │
                               │

对应代码如下

// Returns true if given key is after largest key of this table.
func (t *tFile) after(icmp *iComparer, ukey []byte) bool {return ukey != nil && icmp.uCompare(ukey, t.imax.ukey()) > 0
}

同理，tFile 的 before 办法用来判断给定的 ukey 是否在在以后文件的后面，也就是判断 ukey 是否比以后文件中的最小值 imin 要小。

// Returns true if given key is before smallest key of this table.
func (t *tFile) before(icmp *iComparer, ukey []byte) bool {return ukey != nil && icmp.uCompare(ukey, t.imin.ukey()) < 0
}

overlaps 用来判断指定区间 [umin,umax] 是否和以后文件的区间 [imin,imax] 有重叠。咱们能够反向来思考什么状况下两个区间不重叠。

tFile 区间的最小值 imin 大于另外一个区间的最大值 umax
tFile 区间的最大值 imax 小于另外一个区间的最大值 umin

overlaps 办法示意图

      │       │
      │       │        umin   umax
  ────▼───────▼──────────▲──────▲───────►
     imin    imax        │ tFile│
                         │      │


                      │      │
       umin    umax   │      │
   ──────▲──────▲─────▼──────▼──────►
         │tFile │    imin   imax
         │      │
// Returns true if given key range overlaps with this table key range.
func (t *tFile) overlaps(icmp *iComparer, umin, umax []byte) bool {return !t.after(icmp, umin) && !t.before(icmp, umax)
}

Rust

rust 代码中 tFile 的定义如下，跟 go 中定义十分类似，咱们这里也只关注外面的 imin，imax 就能够。

#[derive(Clone)]
pub struct tFile {
    fd: storage::FileDesc,
    seek_left: Arc<atomic::AtomicI32>,
    size: i64,
    imin: internalKey,
    imax: internalKey,
}

接下来是 tFile 的办法实现，同样所有的办法都有一个参数 icmp: &IComparer<T>，用于比拟 key。首先是 after 办法，判断指定 ukey 是否在以后 tFile 的前面，跟 Go 版本一样，只须要判断 ukey 是否比 imax 大就能够了。

impl tFile {
    // Returns true if given key is after largest key of this table.
    pub fn after<T>(&self, icmp: &IComparer<T>, ukey: &[u8]) -> bool
    where
        T: Comparer,
    {return ukey.len() > 0 && icmp.u_compare(ukey, self.imax.ukey()) == Greater;
    }

before 办法, 判断指定 ukey 是否在以后 tFile 的后面，跟 Go 版本一样，只须要判断 ukey 是否比 imin 小就能够了。

// Returns true if given key is before smallest key of this table.
    pub fn before<T>(&self, icmp: &IComparer<T>, ukey: &[u8]) -> bool
    where
        T: Comparer,
    {return ukey.len() > 0 && icmp.u_compare(ukey, self.imin.ukey()) > Greater;
    }

overlaps 用来判断指定区间 [umin,umax] 是否和以后文件的区间 [imin,imax] 有重叠，跟 GO 本版一样，不多说。

// Returns true if given key range overlaps with this table key 
    pub fn overlaps<T>(&self, icmp: &IComparer<T>, umin: &[u8], umax: &[u8]) -> bool
    where
        T: Comparer,
    {return !self.after(icmp, umin) && !self.before(icmp, umax);
    }

tFiles

tFiles 用来示意 SST 中每一层的文件的汇合。

在 goleveldb 中，tFiles 的定义如下

// tFiles hold multiple tFile.
type tFiles []*tFile

为了反对排序，tFiles 须要实现上面的办法

func (tf tFiles) Len() int      { return len(tf) }
func (tf tFiles) Swap(i, j int) {tf[i], tf[j] = tf[j], tf[i] }

func (tf tFiles) nums() string {
    x := "["
    for i, f := range tf {
        if i != 0 {x += ","}
        x += fmt.Sprint(f.fd.Num)
    }
    x += "]"
    return x
}

接下来会讲 tFiles 外面比拟外围的一些办法。searchMin，searchMax，searchMinUkey，searchMaxUkey 这四个办法调用的前提是 tFiles 中的所有文件都曾经依照升序排好序，且 tFiles 中的所有 tFile 的 key 区间相互不重叠，所以此时 tFiles 中的所有文件无论是依照 imin, 还是依照 imax 都是枯燥递增的. 举例来说，如果三个相互不重叠的区间 [5,7], [1,4]，[9,11], 无论基于 imin 还是 imax，排序的后果都是

[1,4] [5,7] [9,11]

searchMin

searchMin 用于在 tFiles 中查问 imin 大于等于 ikey 的，且 imin 最小的 tFile 的索引，也就是在排序后的 tFiles 中搜寻满足条件的最右边的的 tFile。

// Searches smallest index of tables whose its smallest
// key is after or equal with given key.
func (tf tFiles) searchMin(icmp *iComparer, ikey internalKey) int {return sort.Search(len(tf), func(i int) bool {return icmp.Compare(tf[i].imin, ikey) >= 0
    })
}

举例来说，对于三个区间

[1,4] [5,7] [9,11]

如果 ikey 等于 3，那么搜寻 imin 大于等于 ikey 的最右边的区间就是 [5,7].

searchMax

searchMax 用于在 tFiles 中查问 imax 大于等于 ikey 的，且 imax 最小的 tFile 的索引，也就是在排序后的 tFiles 中搜寻满足条件的最右边的的 tFile。

// Searches smallest index of tables whose its largest
// key is after or equal with given key.
func (tf tFiles) searchMax(icmp *iComparer, ikey internalKey) int {return sort.Search(len(tf), func(i int) bool {return icmp.Compare(tf[i].imax, ikey) >= 0
    })
}

举例来说，对于三个区间

[1,4] [5,7] [9,11]

如果 ikey 等于 3，那么搜寻 imin 大于等于 ikey 的最右边的区间就是 [5,7].

searchMinUkey

searchMinUkey 跟 searchMin 相似，用于在 tFiles 中查问 imin.ukey() 大于等于 umin 的，且 imin 最小的 tFile 的索引，也就是在排序后的 tFiles 中搜寻满足条件的最右边的的 tFile。

// Searches smallest index of tables whose its smallest
// key is after the given key.
func (tf tFiles) searchMinUkey(icmp *iComparer, umin []byte) int {return sort.Search(len(tf), func(i int) bool {return icmp.ucmp.Compare(tf[i].imin.ukey(), umin) > 0
    })
}

searchMaxUkey

searchMaxUkey 跟 searchMax 相似，用于在 tFiles 中查问 imax.ukey() 大于等于 umax 的，且 imax 最小的 tFile 的索引，也就是在排序后的 tFiles 中搜寻满足条件的最右边的的 tFile。

// Searches smallest index of tables whose its largest
// key is after the given key.
func (tf tFiles) searchMaxUkey(icmp *iComparer, umax []byte) int {return sort.Search(len(tf), func(i int) bool {return icmp.ucmp.Compare(tf[i].imax.ukey(), umax) > 0
    })
}

overlaps

overlaps 判断区间 [umin,umax] 是否和 tFiles 中某个文件的区间有重叠

// Returns true if given key range overlaps with one or more
// tables key range. If unsorted is true then binary search will not be used.
func (tf tFiles) overlaps(icmp *iComparer, umin, umax []byte, unsorted bool) bool {
    if unsorted {
        // Check against all files.
        for _, t := range tf {if t.overlaps(icmp, umin, umax) {return true}
        }
        return false
    }

    i := 0
    if len(umin) > 0 {
        // Find the earliest possible internal key for min.
        i = tf.searchMax(icmp, makeInternalKey(nil, umin, keyMaxSeq, keyTypeSeek))
    }
    if i >= len(tf) {
        // Beginning of range is after all files, so no overlap.
        return false
    }
    return !tf[i].before(icmp, umax)
}

这个办法中的第三个参数 unsorted 示意以后 tFiles 是否是排序的，为什么要进行辨别呢？因为在 SST 中，Level 0 层的不同文件之间 key 是有重叠的，无奈进行排序。对于这种没有排序的 tFiles，就只能应用遍历的办法一个一个的 tFile 进行比拟，工夫复杂度是 O(n)。对于曾经排序的 tFiles，咱们能够应用二分法进行搜寻，

首先在 tFiles 中搜寻 imax 大于等于 umin 的最右边的区间。
如果搜寻不到，就示意 [umin,umax] 大于所有的 tFile，必定不重叠
如果搜寻到了，还须要再判断搜寻到的 tFile.imin.ukey()是否小于等于 umax
联合下面步骤 1 和 3，如果都满足，也就是 tFile.imax.ukey() >= umin 且 tFile.imin.ukey() <= umax，满足两个区间重叠的充沛必要条件

getOverlaps

getOverlaps 用于在 tFiles 中求和指定区间 [umin,umax] 有重叠的区间列表，在本文中咱们次要关注如何在排序的 tFiles 疾速求取的算法

if !overlapped {
        var begin, end int
        // Determine the begin index of the overlapped file
        if umin != nil {index := tf.searchMinUkey(icmp, umin)
            if index == 0 {begin = 0} else if bytes.Compare(tf[index-1].imax.ukey(), umin) >= 0 {
                // The min ukey overlaps with the index-1 file, expand it.
                begin = index - 1
            } else {begin = index}
        }
        // Determine the end index of the overlapped file
        if umax != nil {index := tf.searchMaxUkey(icmp, umax)
            if index == len(tf) {end = len(tf)
            } else if bytes.Compare(tf[index].imin.ukey(), umax) <= 0 {
                // The max ukey overlaps with the index file, expand it.
                end = index + 1
            } else {end = index}
        } else {end = len(tf)
        }
        // Ensure the overlapped file indexes are valid.
        if begin >= end {return nil}
        dst = make([]*tFile, end-begin)
        copy(dst, tf[begin:end])
        return dst
    }

这里算法也特地奇妙，通过两次二分法疾速确定与指定区间 [umin,umax] 重叠的最右边区间以及与指定区间 [umin,umax] 重叠的最左边区间的下一个区间。

1，调用 tf.searchMinUkey(icmp, umin) 在 tFiles 中搜寻 imin.ukey()大于等于 umin 的最右边的 tFile，

searchMinUkey 办法示意图：┌─────────┐    ┌───────────┐    ┌───────────┐
          │         │    │           │    │           │
          │ tFile1  │    │  tFile2   │    │   tFile3  │
          │         │    │           │    │           │
  ────────┴────▲────┴──▲─┴───────────┴────┴───────────┴────────────────────────►
               │       │
               │       │
             umin    umax

如图，tf.searchMinUkey(icmp, umin) 搜寻失去 tFile2，然而 tFile1 也有可能和 [umin,umax] 重叠，所以须要额定判断一下

} else if bytes.Compare(tf[index].imin.ukey(), umax) <= 0 {
                // The max ukey overlaps with the index file, expand it.
                end = index + 1
            } else {

2，调用 tf.searchMaxUkey(icmp, umax) 在 tFiles 中搜寻 imax.ukey()大于等于 umax 的最右边的 tFile。这里也有两种状况须要思考，一种是搜寻到的 tFile3 不和 umax 重叠，也就是 tFile3 齐全在 [umin,umax] 的左边

searchMaxUkey 状况一示意图：┌────────┐    ┌──────────┐   ┌──────────┐
   │        │    │          │   │          │
   │ tFil1  │    │    tFile2│   │  tFile3  │
   └────────┴────┴──────▲───┴─▲─┴──────────┴────►
                        │     │
                        │     │
                       umin   umax

第二种就是搜寻到的 fFile3 刚好和 [umin,umax] 重叠

searchMaxUkey 状况二示意图：┌────────┐    ┌────────────┐ ┌──────────┐
   │        │    │            │ │          │
   │ tFil1  │    │    tFile2  │ │  tFile3  │
   └────────┴────┴─────────▲──┴─┴─────▲────┴─────────►
                           │          │
                           │          │
                          umin       umax

这种状况也要独自解决下

} else if bytes.Compare(tf[index].imin.ukey(), umax) <= 0 {
                // The max ukey overlaps with the index file, expand it.
                end = index + 1

3，最终基于 1，2 步骤的两个索引 begin 和 end，取两头的局部 [begin,end) 就是要求的后果。这个算法另外一个奇妙的中央就在于[begin,end) 是一个左闭右开区间，所以如果 begin==end 的话，就能够示意找不到重叠的区间了。

总结

leveldb 中通过在 sstable 中记录 min、max 的信息，能够在合并 sstable 时防止读取整个 sstable 来判断是否产生重叠。通过 min_max 索引，算法能够更高效地实现二分查找。这种 min_max 索引形式在数据库我的项目中被广泛应用，晋升查问效率。

同时，本文介绍了 min_max 索引在 go 和 rust 两种语言中的不同实现，来帮忙读者从代码层面理解 min_max 索引的不同实现计划。

关于druid:leveldb-sstable-min-max区间搜索源码分析1

Just My Socks（注册教程 内含优惠码）

Just My Socks（注册教程内含优惠码）