参考:
- https://github.com/facebook/r...
- https://github.com/facebook/r...
- https://blog.csdn.net/Z_Stand...
- https://www.jianshu.com/p/f57...
- https://book.tidb.io/session3...
- https://www.iggiewang.cn/2021...
Introduction
RocksDB的Iterator在通过高度封装后,能够像C++ stl库为每一个容器结构的迭代器的iterator一样被应用,它能够定位到某个key,并能够从这个key开始进行scan,它也能够被用来进行反向scan。
如果在创立迭代器时,ReadOptions.snapshot被给定,迭代器将返回该snapshot的数据,如果传入的为nullptr,隐性视图将被创立,隐性视图不能转为显性。
用户对Iterator的应用
- NewIterator 创立一个迭代器,须要传入ReadOptions
- Seek 查找一个key
- SeekToFirst 迭代器挪动到db的第一个key地位,个别用于程序遍历整个db的所有key
- SeekToLast 迭代器挪动到db的最初一个key地位, 个别用于反向遍历整个db的所有key
- SeekForPrev挪动到以后key的上一个地位,个别用于遍历(limit, start]之间的key
- Next 迭代器挪动到下一个key
- Prev迭代器挪动到上一个key
Iterator::status()能够返回以后迭代器的error。error蕴含IO error,checksum不匹配,外部谬误等...
如果没有谬误,状态为Status::OK()。如果Iterator::Valid()是true,status必定为OK。
而如果Iterator::Valid()为false,可能是因为此时迭代器走到了数据的开端,这种状况status是OK;也有可能是呈现了谬误,这种状况status不是OK。
5.13.x或更早的版本中,这两个API会有不同,须要再查阅文档。
各Iterator
DBIter
Implementation: db/db_iter.cc
Interface: Iterator
DBIter是对InternalIterator的封装,它的工作次要是将Internal key解析成user key,并展现进去
Example
底层的InternalIterator中
InternalKey(user_key="Key1", seqno=10, Type=Put) | Value = "KEY1_VAL2"InternalKey(user_key="Key1", seqno=9, Type=Put) | Value = "KEY1_VAL1"InternalKey(user_key="Key2", seqno=16, Type=Put) | Value = "KEY2_VAL2"InternalKey(user_key="Key2", seqno=15, Type=Delete) | Value = "KEY2_VAL1"InternalKey(user_key="Key3", seqno=7, Type=Delete) | Value = "KEY3_VAL1"InternalKey(user_key="Key4", seqno=5, Type=Put) | Value = "KEY4_VAL1"
DBIter将Internal Key解析为User Key,并裸露接口给用户
Key="Key1" | Value = "KEY1_VAL2"Key="Key2" | Value = "KEY2_VAL2"Key="Key4" | Value = "KEY4_VAL1"
MergingIterator
Implementation: table/merging_iterator.cc
Interface: InternalIterator
MergingIterator由很多Children Iterator组成
autovector<IteratorWrapper, kNumIterReserve> children_;
MergingIterator会将这些Children Iterator放入堆中,并向下层有序地展现。
Example
children iterator为:
= Child Iterator 1 =InternalKey(user_key="Key1", seqno=10, Type=Put) | Value = "KEY1_VAL2"= Child Iterator 2 =InternalKey(user_key="Key1", seqno=9, Type=Put) | Value = "KEY1_VAL1"InternalKey(user_key="Key2", seqno=15, Type=Delete) | Value = "KEY2_VAL1"InternalKey(user_key="Key4", seqno=5, Type=Put) | Value = "KEY4_VAL1"= Child Iterator 3 =InternalKey(user_key="Key2", seqno=16, Type=Put) | Value = "KEY2_VAL2"InternalKey(user_key="Key3", seqno=7, Type=Delete) | Value = "KEY3_VAL1"
MergingIterator将所有children iterator放入堆中,使其有序
InternalKey(user_key="Key1", seqno=10, Type=Put) | Value = "KEY1_VAL2"InternalKey(user_key="Key1", seqno=9, Type=Put) | Value = "KEY1_VAL1"InternalKey(user_key="Key2", seqno=16, Type=Put) | Value = "KEY2_VAL2"InternalKey(user_key="Key2", seqno=15, Type=Delete) | Value = "KEY2_VAL1"InternalKey(user_key="Key3", seqno=7, Type=Delete) | Value = "KEY3_VAL1"InternalKey(user_key="Key4", seqno=5, Type=Put) | Value = "KEY4_VAL1"
MemtableIterator
Implementation: db/memtable.cc
Interface: InternalIterator
用作对Memtable的迭代器,每个memtable实现了他们本人的迭代器,来对外界暴露出有序可迭代的kv对
BlockIter
Implementation: table/block_based/block.h
Interface: InternalIterator
该迭代器用于从SST File中读取block,无论这些block是index block还是data block或是meta block(DataBlockIter、IndexBlockIter、MetaBlockIter继承了BlockIter)。因为SST文件块是有序且不可变的,因而咱们将该块加载到内存中,并为此排序后的数据创立一个BlockIter。
TwoLevelIterator
Implementation: table/two_level_iterator.cc
Interface: InternalIterator
一个TwoLevelIterator由两个迭代器组成
- first_level_iter_
- second_level_iter_
first_level_iter_ 用来找到应用哪个second_level_iter_ , 而后second_level_iter_ 指向理论须要读的数据
Example
RocksDB应用TwoLevelIterator来读取SST files, first_level_iter_ 是对于Index block的BlockIter,second_level_iter_ 是对于Data block的BlockIter.
[Data block, offset: 0x0000]KEY1 | VALUE1KEY2 | VALUE2KEY3 | VALUE3[Data Block, offset: 0x0100]KEY4 | VALUE4KEY7 | VALUE7[Data Block, offset: 0x0250]KEY8 | VALUE8KEY9 | VALUE9[Data Block, offset: 0x0350]KEY11 | VALUE11KEY15 | VALUE15[Index Block, offset: 0x0500]KEY3 | 0x0000KEY7 | 0x0100KEY9 | 0x0250KEY15 | 0x0500
first_level_iter_ => BlockIter over Index block
second_level_iter_ => BlockIter over Data block,将被first_level_iter_决定指向哪个
当应用TwoLevelIterator来seek KEY8时,首先应用first_level_iter_ seek到对应的data block(0x0250),而后应用second_level_iter_来seek具体的key。
读取Data Block时,若 Block 在 Block Cache 当中,间接返回对象地址,否则,产生磁盘IO,读取 SST 的 Block,结构 Block 对象并缓存其地址在 Block Cache 中。