关于mysql:特性介绍-MySQL-select-count-count1count列-详解1概念及区别

52次阅读

共计 4209 个字符,预计需要花费 11 分钟才能阅读完成。

本文首发于 2020-05-05 21:55:15

一、前言

从接触 MySQL 开始断断续续的看过一些文章,对 count() 操作七嘴八舌,其中分歧点次要在于 count(1)count(*)哪个效率高,有说 count(1)count(*)快的(这种说法更广泛),有说二者一样快的。集体了解这两种行为可能实用于的是不同的版本,我只关怀较新的 MySQL 版本是什么行为,详见下文。

二、含意

首先,先阐明一下常见 count() 操作及含意:

count(*):计算包含 NULL 值在内的行数,SQL92 定义的规范统计行数的语法。

count(1):计算包含 NULL 值在内的行数,其中的 1 是恒真表达式。

count(列名):计算指定列的行数,但不蕴含 NULL 值。

三、具体区别

MySQL 手册中相干形容如下:

For transactional storage engines such as InnoDB, storing an exact row count is problematic. Multiple transactions may be occurring at the same time, each of which may affect the count.

InnoDB does not keep an internal count of rows in a table because concurrent transactions might“see”different numbers of rows at the same time. Consequently, SELECT COUNT(*) statements only count rows visible to the current transaction.

Prior to MySQL 5.7.18, InnoDB processes SELECT COUNT(*) statements by scanning the clustered index. As of MySQL 5.7.18, InnoDB processes SELECT COUNT(*) statements by traversing the smallest available secondary index unless an index or optimizer hint directs the optimizer to use a different index. If a secondary index is not present, the clustered index is scanned.

Processing SELECT COUNT(*) statements takes some time if index records are not entirely in the buffer pool. For a faster count, create a counter table and let your application update it according to the inserts and deletes it does. However, this method may not scale well in situations where thousands of concurrent transactions are initiating updates to the same counter table. If an approximate row count is sufficient, use SHOW TABLE STATUS.

InnoDB handles SELECT COUNT(*) and SELECT COUNT(1) operations in the same way. There is no performance difference.

For MyISAM tables, COUNT(*) is optimized to return very quickly if the SELECT retrieves from one table, no other columns are retrieved, and there is no WHERE clause. For example:

mysql> SELECT COUNT(*) FROM student;

This optimization only applies to MyISAM tables, because an exact row count is stored for this storage engine and can be accessed very quickly.COUNT(1) is only subject to the same optimization if the first column is defined as NOT NULL.

官网这段形容要点如下:

  1. InnoDB 是事务引擎,反对 MVCC,并发事务可能同时“看到”不同的行数,所以,InnoDB 不保留表中的行数 SELECT COUNT(*) 语句只计算以后事务可见的行数。
  2. 在 MySQL 5.7.18 之前,InnoDB 通过 扫描汇集索引 解决 SELECT COUNT(*) 语句。从 MySQL 5.7.18 开始,InnoDB通过 遍历最小的可用二级索引 来解决 SELECT COUNT(*) 语句,除非索引或优化器明确批示应用不同的索引。如果不存在二级索引,则扫描汇集索引。这样的设计单从 IO 的角度就节俭了很多开销。
  3. InnoDB 以同样的形式解决 SELECT COUNT(*)SELECT COUNT(1)操作,没有性能差别。 因而,倡议应用合乎 SQL 规范的count(*)
  4. 对于 MyISAM 表,因为 MyISAM 引擎存储了准确的行数,因而,如果 SELECT COUNT(*) 语句不蕴含 WHERE 子句,则会很快返回。这个很好了解,如果带了 where 条件,就须要扫表了。
  5. 如果索引记录不齐全在缓冲池中,则解决 SELECT(*) 语句须要一些工夫。为了更快的计数,您能够创立一个计数器表,并让您的应用程序按插入和删除操作更新它。然而,这种办法在同一计数器表中启动成千上万个并发事务的状况下,可能无奈很好地扩大。如果一个近似的行数足够,能够应用 SHOW TABLE STATUS 查问行数。

到这里咱们明确了 count(*)count(1) 实质下面其实是一样的,那么 count(column) 又是怎么回事呢?

count(column) 也是会遍历整张表,然而不同的是它会 拿到 column 的值当前判断是否为空,而后再进行累加 ,那么如果 针对主键须要解析内容 ,如果是 二级索引须要再次依据主键获取内容,则要多一次 IO 操作,所以 count(column) 的性能必定不如前两者,如果依照效率比拟的话:count(*)=count(1)>count(primary key)>count(非主键 column)

四、倡议

基于以上形容,如果要查问 innodb 存储引擎的表的总行数,有如下倡议:

  1. 若仅仅是想获取大略的行数,倡议应用 show table status 或查问information_schema.tables

    mysql> use db6;
    Reading table information for completion of table and column names
    You can turn off this feature to get a quicker startup with -A
    
    Database changed
    mysql> show tables;
    +---------------+
    | Tables_in_db6 |
    +---------------+
    | t1            |
  2. row in set (0.01 sec)

    mysql> select count(*) from t1;
    count(*)
    2
  3. row in set (0.00 sec)

    mysql> show table status\G
    1. row **

        Name: t1
      Engine: InnoDB
     Version: 10

    Row_format: Dynamic

        Rows: 2

    Avg_row_length: 8192
    Data_length: 16384
    Max_data_length: 0
    Index_length: 0
    Data_free: 0
    Auto_increment: NULL
    Create_time: 2020-04-21 12:00:44
    Update_time: NULL
    Check_time: NULL
    Collation: utf8mb4_general_ci

    Checksum: NULL

    Create_options:

     Comment:
  4. row in set (0.00 sec)

    mysql> select * from information_schema.tables where table_name = ‘t1’\G
    1. row **
    TABLE_CATALOG: def
    TABLE_SCHEMA: db6
    TABLE_NAME: t1
    TABLE_TYPE: BASE TABLE

      ENGINE: InnoDB
     VERSION: 10

    ROW_FORMAT: Dynamic
    TABLE_ROWS: 2
    AVG_ROW_LENGTH: 8192
    DATA_LENGTH: 16384
    MAX_DATA_LENGTH: 0
    INDEX_LENGTH: 0
    DATA_FREE: 0
    AUTO_INCREMENT: NULL
    CREATE_TIME: 2020-04-21 12:00:44
    UPDATE_TIME: NULL
    CHECK_TIME: NULL
    TABLE_COLLATION: utf8mb4_general_ci

    CHECKSUM: NULL

    CREATE_OPTIONS:
    TABLE_COMMENT:

  5. row in set (0.00 sec)

  6. 反之,如果必须要获取精确的总行数,倡议:

    1) 创立一个计数器表,并让您的应用程序按插入和删除操作更新它。
    2) 若业务插入和删除绝对较少,也能够思考缓存到 redis。

篇幅无限,深刻验证、源码剖析将在下一篇文章中介绍。


欢送关注我的微信公众号【数据库内核】:分享支流开源数据库和存储引擎相干技术。

题目 网址
GitHub https://dbkernel.github.io
知乎 https://www.zhihu.com/people/…
思否(SegmentFault) https://segmentfault.com/u/db…
掘金 https://juejin.im/user/5e9d3e…
开源中国(oschina) https://my.oschina.net/dbkernel
博客园(cnblogs) https://www.cnblogs.com/dbkernel

正文完
 0