本文首发于 2020-05-05 21:55:15

一、前言

从接触MySQL开始断断续续的看过一些文章,对count()操作七嘴八舌,其中分歧点次要在于count(1)count(*)哪个效率高,有说count(1)count(*)快的(这种说法更广泛),有说二者一样快的。集体了解这两种行为可能实用于的是不同的版本,我只关怀较新的MySQL版本是什么行为,详见下文。

二、含意

首先,先阐明一下常见count()操作及含意:

count(*):计算包含NULL值在内的行数,SQL92定义的规范统计行数的语法。

count(1):计算包含NULL值在内的行数,其中的1是恒真表达式。

count(列名):计算指定列的行数,但不蕴含NULL值。

三、具体区别

MySQL手册中相干形容如下:

For transactional storage engines such as InnoDB, storing an exact row count is problematic. Multiple transactions may be occurring at the same time, each of which may affect the count.

InnoDB does not keep an internal count of rows in a table because concurrent transactions might “see” different numbers of rows at the same time. Consequently, SELECT COUNT(*) statements only count rows visible to the current transaction.

Prior to MySQL 5.7.18, InnoDB processes SELECT COUNT(*) statements by scanning the clustered index. As of MySQL 5.7.18, InnoDB processes SELECT COUNT(*) statements by traversing the smallest available secondary index unless an index or optimizer hint directs the optimizer to use a different index. If a secondary index is not present, the clustered index is scanned.

Processing SELECT COUNT(*) statements takes some time if index records are not entirely in the buffer pool. For a faster count, create a counter table and let your application update it according to the inserts and deletes it does. However, this method may not scale well in situations where thousands of concurrent transactions are initiating updates to the same counter table. If an approximate row count is sufficient, use SHOW TABLE STATUS.

InnoDB handles SELECT COUNT(*) and SELECT COUNT(1) operations in the same way. There is no performance difference.

For MyISAM tables, COUNT(*) is optimized to return very quickly if the SELECT retrieves from one table, no other columns are retrieved, and there is no WHERE clause. For example:

mysql> SELECT COUNT(*) FROM student;

This optimization only applies to MyISAM tables, because an exact row count is stored for this storage engine and can be accessed very quickly.COUNT(1) is only subject to the same optimization if the first column is defined as NOT NULL.

官网这段形容要点如下:

  1. InnoDB是事务引擎,反对MVCC,并发事务可能同时“看到”不同的行数,所以,InnoDB不保留表中的行数SELECT COUNT(*)语句只计算以后事务可见的行数。
  2. 在MySQL 5.7.18之前,InnoDB通过扫描汇集索引解决SELECT COUNT(*)语句。从MySQL 5.7.18开始,InnoDB通过遍历最小的可用二级索引来解决SELECT COUNT(*)语句,除非索引或优化器明确批示应用不同的索引。如果不存在二级索引,则扫描汇集索引。这样的设计单从 IO 的角度就节俭了很多开销。
  3. InnoDB以同样的形式解决SELECT COUNT(*)SELECT COUNT(1)操作,没有性能差别。 因而,倡议应用合乎SQL规范的count(*)
  4. 对于MyISAM表,因为MyISAM引擎存储了准确的行数,因而,如果SELECT COUNT(*)语句不蕴含WHERE子句,则会很快返回。这个很好了解,如果带了where条件,就须要扫表了。
  5. 如果索引记录不齐全在缓冲池中,则解决SELECT(*)语句须要一些工夫。为了更快的计数,您能够创立一个计数器表,并让您的应用程序按插入和删除操作更新它。然而,这种办法在同一计数器表中启动成千上万个并发事务的状况下,可能无奈很好地扩大。如果一个近似的行数足够,能够应用SHOW TABLE STATUS查问行数。

到这里咱们明确了 count(*)count(1) 实质下面其实是一样的,那么 count(column) 又是怎么回事呢?

count(column) 也是会遍历整张表,然而不同的是它会拿到 column 的值当前判断是否为空,而后再进行累加,那么如果针对主键须要解析内容,如果是二级索引须要再次依据主键获取内容,则要多一次 IO 操作,所以 count(column) 的性能必定不如前两者,如果依照效率比拟的话:count(*)=count(1)>count(primary key)>count(非主键column)

四、倡议

基于以上形容,如果要查问innodb存储引擎的表的总行数,有如下倡议:

  1. 若仅仅是想获取大略的行数,倡议应用show table status或查问information_schema.tables

    mysql> use db6;Reading table information for completion of table and column namesYou can turn off this feature to get a quicker startup with -ADatabase changedmysql> show tables;+---------------+| Tables_in_db6 |+---------------+| t1            |
  2. row in set (0.01 sec)

    mysql> select count(*) from t1;
    count(*)
    2
  3. row in set (0.00 sec)

    mysql> show table status\G
    1. row **

        Name: t1  Engine: InnoDB Version: 10

    Row_format: Dynamic

        Rows: 2

    Avg_row_length: 8192
    Data_length: 16384
    Max_data_length: 0
    Index_length: 0
    Data_free: 0
    Auto_increment: NULL
    Create_time: 2020-04-21 12:00:44
    Update_time: NULL
    Check_time: NULL
    Collation: utf8mb4_general_ci

    Checksum: NULL

    Create_options:

     Comment:
  4. row in set (0.00 sec)

    mysql> select * from information_schema.tables where table_name = 't1'\G
    1. row **
    TABLE_CATALOG: def
    TABLE_SCHEMA: db6
    TABLE_NAME: t1
    TABLE_TYPE: BASE TABLE

      ENGINE: InnoDB VERSION: 10

    ROW_FORMAT: Dynamic
    TABLE_ROWS: 2
    AVG_ROW_LENGTH: 8192
    DATA_LENGTH: 16384
    MAX_DATA_LENGTH: 0
    INDEX_LENGTH: 0
    DATA_FREE: 0
    AUTO_INCREMENT: NULL
    CREATE_TIME: 2020-04-21 12:00:44
    UPDATE_TIME: NULL
    CHECK_TIME: NULL
    TABLE_COLLATION: utf8mb4_general_ci

    CHECKSUM: NULL

    CREATE_OPTIONS:
    TABLE_COMMENT:

  5. row in set (0.00 sec)

  6. 反之,如果必须要获取精确的总行数,倡议:

    1) 创立一个计数器表,并让您的应用程序按插入和删除操作更新它。
    2) 若业务插入和删除绝对较少,也能够思考缓存到 redis。

篇幅无限,深刻验证、源码剖析将在下一篇文章中介绍。


欢送关注我的微信公众号【数据库内核】:分享支流开源数据库和存储引擎相干技术。

题目网址
GitHubhttps://dbkernel.github.io
知乎https://www.zhihu.com/people/...
思否(SegmentFault)https://segmentfault.com/u/db...
掘金https://juejin.im/user/5e9d3e...
开源中国(oschina)https://my.oschina.net/dbkernel
博客园(cnblogs)https://www.cnblogs.com/dbkernel