On Operating Systems: Basic Data Structures of the FreeBSD ext2 File System

Winter Break Begins

The exams were finally over, and everyone's frayed nerves slowly began to unwind. With the pandemic dragging on, Nanami and Nanase's travel plans had been cancelled. Remembering the topic they had left unfinished, Nanami turned to Nanase: "Shall we pick up where we left off last time?" "We can't go anywhere anyway, and sitting around the dorm is boring, so sure, haha," Nanase replied with a laugh. "Great, I'll message Douyiya right now — let's meet at the library."
Meanwhile, Douyiya and his four roommates were sitting shoulder to shoulder in front of their computers, raring to go: "A full five-man squad for once — today we climb the ranked ladder!" Suddenly a WeChat message arrived. Douyiya glanced at it, immediately packed up his laptop, and announced: "Brothers, a matter of great importance — I'm off." Before the words had even faded he had dashed out of the dorm, leaving his bewildered roommates behind.

Douyiya's Talk

"Alright, today I'll share the classic ext2 file system with you. Its implementation is fairly simple, which makes it a good starting point for beginners like us." Nanami glanced at the screenful of code and asked quietly: "That still looks like a lot of code — where do we start?" Sensing the hint of hesitation in Nanami's voice, Douyiya smiled: "Let me start with some basic data structures, so you get a rough picture of the whole file system; the actual code will be much easier to follow afterwards." Nanami and Nanase exchanged a smile and nodded.

Douyiya: Remember last time Nanami suggested that we could bundle a file's attributes into a single data structure and manage them in one place? The ext2 file system has exactly such a structure dedicated to this job: struct inode.

All code below is taken from FreeBSD 12.0.

#define    EXT2_NDADDR    12        /* Direct addresses in inode. */
#define    EXT2_NIADDR    3        /* Indirect addresses in inode. */

/*
 * The inode is used to describe each active (or recently active) file in the
 * EXT2FS filesystem. It is composed of two types of information. The first
 * part is the information that is needed only while the file is active (such
 * as the identity of the file and linkage to speed its lookup). The second
 * part is the permanent meta-data associated with the file which is read in
 * from the permanent dinode from long term storage when the file becomes
 * active, and is put back when the file is no longer being used.
 */
struct inode {
    // Virtual file system (VFS) layer data structures; ignore these for now
    struct    vnode  *i_vnode;/* Vnode associated with this inode. */
    struct    ext2mount *i_ump;
    uint32_t i_flag;    /* flags, see below */
    ino_t      i_number;    /* The identity of the inode. */

    struct    m_ext2fs *i_e2fs;    /* EXT2FS */
    u_quad_t i_modrev;    /* Revision level for NFS lease. */
    /*
     * Side effects; used during directory lookup.
     * The next four members are used when looking up a file within a
     * directory; they will be explained in detail later.
     */
    int32_t     i_count;    /* Size of free slot in directory. */
    doff_t     i_endoff;    /* End of useful stuff in directory. */
    doff_t     i_diroff;    /* Offset in dir, where we found last entry. */
    doff_t     i_offset;    /* Offset of free space in directory. */

    uint32_t i_block_group;    // block group number
    uint32_t i_next_alloc_block;
    uint32_t i_next_alloc_goal;

    /* Fields from struct dinode in UFS. */
    uint16_t    i_mode;        /* IFMT, permissions; see below. */
    int32_t        i_nlink;    /* File link count. */
    uint32_t    i_uid;        /* File owner. */
    uint32_t    i_gid;        /* File group. */
    uint64_t    i_size;        /* File byte count. */
    uint64_t    i_blocks;    /* Blocks actually held. */
    int32_t        i_atime;    /* Last access time. */
    int32_t        i_mtime;    /* Last modified time. */
    int32_t        i_ctime;    /* Last inode change time. */
    int32_t        i_birthtime;    /* Inode creation time. */
    int32_t        i_mtimensec;    /* Last modified time. */
    int32_t        i_atimensec;    /* Last access time. */
    int32_t        i_ctimensec;    /* Last inode change time. */
    int32_t        i_birthnsec;    /* Inode creation time. */
    uint32_t    i_gen;        /* Generation number. */
    uint64_t    i_facl;        /* EA block number: disk block holding the file's extended attributes. */
    uint32_t    i_flags;    /* Status flags (chflags). */
    union {
        struct {
            uint32_t i_db[EXT2_NDADDR]; /* Direct disk blocks. */
            uint32_t i_ib[EXT2_NIADDR]; /* Indirect disk blocks. */
        };
        uint32_t i_data[EXT2_NDADDR + EXT2_NIADDR];
    };

    struct ext4_extent_cache i_ext_cache; /* cache for ext4 extent */
};

Nanami: Hmm... don't i_size and i_blocks both describe the file's size? Isn't that redundant?

Douyiya: Haha, not really. Remember what I told you last time: on disk, a file is stored as a series of data blocks, so the last block will usually not be completely full. That leftover space cannot be given to any other file, which is why the developers used two separate members to describe the situation. i_size is the actual size of the file in bytes, while i_blocks is the number of disk blocks the file actually occupies, so:

i_blocks * disk_block_size >= i_size

Nanase: Then what is i_number for?

Douyiya: i_number can be thought of as the file's number. The file system assigns each file a unique number, and that is what lets us locate the file.

Nanase: So every file has exactly one inode associated with it?

Douyiya: Haha, exactly. Every file's attributes are different; if several files shared one inode, everything would be a mess.

Nanami: Then the i_data array in the inode must hold the numbers of the disk blocks that belong to the file — no other attribute would seem to need that much space.

Douyiya: Yes, indeed. It is actually split into two parts: direct indexing and indirect indexing. Direct indexing is easy to understand — it handles the smaller files: the occupied disk block numbers are written straight into the array, and when a user accesses the file they are simply read in order. When a user needs to work with large files, the indirect index blocks come into play.
With that, Douyiya opened the diagram he had prepared in advance:

Indirect indexing allocates disk blocks to store disk block numbers rather than actual file data; such blocks are called index blocks. Suppose the disk block size is 512 bytes; then each index block can hold 128 block numbers, so a single-indirect index can map another 128 disk blocks. By the same token, a double-indirect block stores single-indirect block numbers, and a triple-indirect block stores double-indirect block numbers, so the number of mappable disk blocks grows exponentially with each level.

Nanami: Ah, so that's how it works. Then file size has an upper limit — once all the mappable data blocks are full, no more data can be written to the file.

Douyiya: Yes, exactly, which is why the ext4 file system redesigned this area to address the problem.

Exercise: do index blocks count toward inode->i_blocks?

Nanase: Last time you mentioned that the state of the file system as a whole also needs to be managed — presumably there is a corresponding structure for that too?
Douyiya: Haha, I was just about to bring that up. It is a data structure called the superblock (struct ext2fs):

/*
 * Super block for an ext2fs file system.
 */
struct ext2fs {
    uint32_t  e2fs_icount;        /* Inode count */
    uint32_t  e2fs_bcount;        /* blocks count */
    uint32_t  e2fs_rbcount;        /* reserved blocks count */
    uint32_t  e2fs_fbcount;        /* free blocks count */
    uint32_t  e2fs_ficount;        /* free inodes count */
    uint32_t  e2fs_first_dblock;    /* first data block */
    uint32_t  e2fs_log_bsize;    /* block size = 1024*(2^e2fs_log_bsize) */
    uint32_t  e2fs_log_fsize;    /* fragment size */
    uint32_t  e2fs_bpg;        /* blocks per group */
    uint32_t  e2fs_fpg;        /* frags per group */
    uint32_t  e2fs_ipg;        /* inodes per group */
    uint32_t  e2fs_mtime;        /* mount time */
    uint32_t  e2fs_wtime;        /* write time */
    uint16_t  e2fs_mnt_count;    /* mount count */
    uint16_t  e2fs_max_mnt_count;    /* max mount count */
    uint16_t  e2fs_magic;        /* magic number */
    uint16_t  e2fs_state;        /* file system state */
    uint16_t  e2fs_beh;        /* behavior on errors */
    uint16_t  e2fs_minrev;        /* minor revision level */
    uint32_t  e2fs_lastfsck;    /* time of last fsck */
    uint32_t  e2fs_fsckintv;    /* max time between fscks */
    uint32_t  e2fs_creator;        /* creator OS */
    uint32_t  e2fs_rev;        /* revision level */
    uint16_t  e2fs_ruid;        /* default uid for reserved blocks */
    uint16_t  e2fs_rgid;        /* default gid for reserved blocks */
    /* EXT2_DYNAMIC_REV superblocks */
    uint32_t  e2fs_first_ino;    /* first non-reserved inode */
    uint16_t  e2fs_inode_size;    /* size of inode structure */
    uint16_t  e2fs_block_group_nr;    /* block grp number of this sblk*/
    uint32_t  e2fs_features_compat;    /* compatible feature set */
    uint32_t  e2fs_features_incompat; /* incompatible feature set */
    uint32_t  e2fs_features_rocompat; /* RO-compatible feature set */
    uint8_t      e2fs_uuid[16];    /* 128-bit uuid for volume */
    char      e2fs_vname[16];    /* volume name */
    char      e2fs_fsmnt[64];    /* name mounted on */
    uint32_t  e2fs_algo;        /* For compression */
    uint8_t   e2fs_prealloc;    /* # of blocks for old prealloc */
    uint8_t   e2fs_dir_prealloc;    /* # of blocks for old prealloc dirs */
    uint16_t  e2fs_reserved_ngdb;    /* # of reserved gd blocks for resize */
    char      e3fs_journal_uuid[16]; /* uuid of journal superblock */
    uint32_t  e3fs_journal_inum;    /* inode number of journal file */
    uint32_t  e3fs_journal_dev;    /* device number of journal file */
    uint32_t  e3fs_last_orphan;    /* start of list of inodes to delete */
    uint32_t  e3fs_hash_seed[4];    /* HTREE hash seed */
    char      e3fs_def_hash_version;/* Default hash version to use */
    char      e3fs_jnl_backup_type;
    uint16_t  e3fs_desc_size;    /* size of group descriptor */
    uint32_t  e3fs_default_mount_opts;
    uint32_t  e3fs_first_meta_bg;    /* First metablock block group */
    uint32_t  e3fs_mkfs_time;    /* when the fs was created */
    uint32_t  e3fs_jnl_blks[17];    /* backup of the journal inode */
    uint32_t  e4fs_bcount_hi;    /* high bits of blocks count */
    uint32_t  e4fs_rbcount_hi;    /* high bits of reserved blocks count */
    uint32_t  e4fs_fbcount_hi;    /* high bits of free blocks count */
    uint16_t  e4fs_min_extra_isize; /* all inodes have some bytes */
    uint16_t  e4fs_want_extra_isize;/* inodes must reserve some bytes */
    uint32_t  e4fs_flags;        /* miscellaneous flags */
    uint16_t  e4fs_raid_stride;    /* RAID stride */
    uint16_t  e4fs_mmpintv;        /* seconds to wait in MMP checking */
    uint64_t  e4fs_mmpblk;        /* block for multi-mount protection */
    uint32_t  e4fs_raid_stripe_wid; /* blocks on data disks (N * stride) */
    uint8_t   e4fs_log_gpf;        /* FLEX_BG group size */
    uint8_t   e4fs_chksum_type;    /* metadata checksum algorithm used */
    uint8_t   e4fs_encrypt;        /* versioning level for encryption */
    uint8_t   e4fs_reserved_pad;
    uint64_t  e4fs_kbytes_written;    /* number of lifetime kilobytes */
    uint32_t  e4fs_snapinum;    /* inode number of active snapshot */
    uint32_t  e4fs_snapid;        /* sequential ID of active snapshot */
    uint64_t  e4fs_snaprbcount;    /* reserved blocks for active snapshot */
    uint32_t  e4fs_snaplist;    /* inode number for on-disk snapshot */
    uint32_t  e4fs_errcount;    /* number of file system errors */
    uint32_t  e4fs_first_errtime;    /* first time an error happened */
    uint32_t  e4fs_first_errino;    /* inode involved in first error */
    uint64_t  e4fs_first_errblk;    /* block involved of first error */
    uint8_t   e4fs_first_errfunc[32];/* function where error happened */
    uint32_t  e4fs_first_errline;    /* line number where error happened */
    uint32_t  e4fs_last_errtime;    /* most recent time of an error */
    uint32_t  e4fs_last_errino;    /* inode involved in last error */
    uint32_t  e4fs_last_errline;    /* line number where error happened */
    uint64_t  e4fs_last_errblk;    /* block involved of last error */
    uint8_t   e4fs_last_errfunc[32]; /* function where error happened */
    uint8_t   e4fs_mount_opts[64];
    uint32_t  e4fs_usrquota_inum;    /* inode for tracking user quota */
    uint32_t  e4fs_grpquota_inum;    /* inode for tracking group quota */
    uint32_t  e4fs_overhead_clusters;/* overhead blocks/clusters */
    uint32_t  e4fs_backup_bgs[2];    /* groups with sparse_super2 SBs */
    uint8_t   e4fs_encrypt_algos[4];/* encryption algorithms in use */
    uint8_t   e4fs_encrypt_pw_salt[16];/* salt used for string2key */
    uint32_t  e4fs_lpf_ino;        /* location of the lost+found inode */
    uint32_t  e4fs_proj_quota_inum;    /* inode for tracking project quota */
    uint32_t  e4fs_chksum_seed;    /* checksum seed */
    uint32_t  e4fs_reserved[98];    /* padding to the end of the block */
    uint32_t  e4fs_sbchksum;    /* superblock checksum */
};

This structure is fairly large and covers all kinds of file system state. Let's start with a few of the most commonly used fields:

  • e2fs_icount: total number of inodes
  • e2fs_bcount: total number of disk blocks
  • e2fs_rbcount: number of reserved blocks; ext2 sets some blocks aside to cope with situations such as bad disk blocks
  • e2fs_fbcount: number of free disk blocks remaining
  • e2fs_ficount: number of free inodes
  • e2fs_bpg: number of disk blocks per block group (the disk is divided into block groups so that blocks can be managed and used more efficiently)
  • e2fs_fpg: number of fragments per block group (a block group can be seen as a collection of a fixed number of disk blocks, used to improve read/write efficiency)
  • e2fs_first_ino: the first inode number available for ordinary files (ext2 reserves some inode numbers for special purposes)
  • e2fs_inode_size: the size of the inode structure as stored on disk

Together, these members give a fairly complete picture of the disk's overall usage.

To Be Continued

Suddenly Douyiya's phone rang: his roommates were urging him to hurry back to the ranked ladder. "Let's stop here for today; go back and get familiar with the source code and you'll have a fuller picture," Douyiya said. Nanami and Nanase agreed, and the three left the library together. On the way back, to help the two of them read the code with more focus, Douyiya posed two more small questions for them to answer next time:

How is e2fs_icount obtained?
How does the file system determine whether an inode or a disk block is in use or free?


This article was compiled and published by 乐趣区; please credit the source when reprinting.
