本文整顿自:袋鼠云技术荟 | SQL优化案例(2):OR条件优化
数栈是云原生—站式数据中台PaaS,咱们在github上有一个乏味的开源我的项目:https://github.com/DTStack/flinkx
FlinkX是一个基于Flink的批流对立的数据同步工具,既能够采集动态的数据,比方MySQL,HDFS等,也能够采集实时变动的数据,比方MySQL binlog,Kafka等,是全域、异构、批流一体的数据同步引擎,大家如果有趣味,欢送来github社区找咱们玩~
在MySQL中,同样的查问条件,如果变换OR在SQL语句中的地位,那么查问的后果也会有差别,在较为简单的状况下,可能会带来索引抉择不佳的性能隐患,为了防止执行效率大幅度降落的问题,咱们能够适当思考应用Union all 对查问逻辑较为简单的SQL进行拆散。
常见OR应用场景,请浏览以下案例:
案例一:不同列应用OR条件查问
1. 待优化场景
SELECT.... FROM`t1` a WHERE a.token= '16149684' AND a.store_id= '242950' AND(a.registrationId IS NOT NULL AND a.registrationId<> '') OR a.uid= 308475 AND a.registrationId IS NOT NULL AND a.registrationId<> ''
执行打算
+--------------+-----------------------+-----------------+----------------+-------------------+-------------------+---------------+----------------+---------------------------------------------+| id | select_type | table | type | key | key_len | ref | rows | Extra |+--------------+-----------------------+-----------------+----------------+-------------------+-------------------+---------------+----------------+---------------------------------------------+| 1 | SIMPLE | a | range |idx_registrationid | 99 | | 100445 | Using index condition; Using where |+--------------+-----------------------+-----------------+----------------+-------------------+-------------------+---------------+----------------+---------------------------------------------+
共返回1 行记录,破费 5 ms。
2. 场景解析
从查问条件中能够看出 token 和 uid 过滤性都十分好,然而因为应用了 or, 须要采纳 index merge 的办法能力取得比拟好的性能。但在理论执行过程中MySQL优化器默认抉择了应用registrationId 上的索引,导致 SQL 的性能很差。
3. 场景优化
咱们将SQL改写成union all的模式。
SELECT......FROM`t1` aWHERE a.token = '16054473'AND a.store_id = '138343'AND b.is_refund = 1AND (a.registrationId IS NOT NULLAND a.registrationId <> '')union allSELECT......FROM`t1` awhere a.uid = 181579AND a.registrationId IS NOT NULLAND a.registrationId <> ''
+--------------+-----------------------+-----------------+----------------+------------------------------+---------------+-------------------+------------------------------+----------------+------------------------------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |+--------------+-----------------------+-----------------+----------------+------------------------------+---------------+-------------------+------------------------------+----------------+------------------------------------+| 1 | PRIMARY | a | ref | IDX_TOKEN,IDX_STORE_ID_TOKEN | IDX_TOKEN | 63 | const | 1 | Using index condition; Using where || 1 | PRIMARY | b | eq_ref | PRIMARY | PRIMARY | 4 | youdian_life_sewsq.a.role_id | 1 | Using where || 2 | UNION | a | const | PRIMARY | PRIMARY | 4 | const | 1 | || 2 | UNION | b | const | PRIMARY | PRIMARY | 4 | const | 0 | unique row not found || | UNION RESULT | <union1,2> | ALL | | | | | | Using temporary |+--------------+-----------------------+-----------------+----------------+------------------------------+---------------+-------------------+------------------------------+----------------+------------------------------------+
共返回5 行记录,破费 5 ms。
通过比照优化前后的执行打算,能够显著看出,将SQL拆分成两个子查问,再应用union对后果进行合并,稳定性和安全性更好,性能更高。
案例二:同一列应用OR查问条件
1. 待优化场景
select........fromt1 as mcileft join t1 as ccv2_1 on ccv2_1.unique_no = mci=category_no1left join t1 as ccv2_2 on ccv2_2.unique_no = mci=category_no2left join t1 as ccv2_3 on ccv2_3.unique_no = mci=category_no3left join( select product_id, count(0) count from t2 pprod inner join t3 pinfo on pinfo.promotion_id = pprod.promotion_id and pprod.is_enable =1 and ppinfo.is_enable=1 and pinfo.belong_t0 =1 and pinfo.end_time >=now() and not ( pinfo.onshelv_time>'2019-06-30 00:00:00' or pinfo.end_time>'2018-12-05 00:00:00' )group by pprod.product_id)as pc on pc.product_id = mci.product_idwhere mci.is_enable =0and mci.comodifty_type in ('1', '5', '6')and (pc.count =0 or pc.count isnull ) limit 0,5;
执行打算
2. 场景解析
本例的SQL查问中有一个子查问,子查问被当成驱动表,产生了auto_key,通过SQL拆分进行测试,验证次要是(pc.count =0 , or pc.count is null )会影响到整个SQL的性能,须要进行比拟改写。
3. 场景优化
首先咱们能够独自思考(pc.count =0 , or pc.count is null ) 如何进行优化?先写一个相似的SQL
Select col from test where col =100 or col is null;+--------+| col |+--------+| 100 || NULL |+--------+2 rows in set (0.00 sec)
这个时候咱们看到的其实是同一个列,但对应不同的值,这种状况能够利用case when进行转换。
Select col From test where case when col is null then 100 else col =100 end;+--------+| col |+--------+| 100 || NULL |+--------+2 rows in set (0.00 sec)
再回到原始SQL进行改写。
select........fromt1 as mcileft join t1 as ccv2_1 on ccv2_1.unique_no = mci=category_no1left join t1 as ccv2_2 on ccv2_2.unique_no = mci=category_no2left join t1 as ccv2_3 on ccv2_3.unique_no = mci=category_no3left join( select product_id, count(0) count from t2 pprod inner join t3 pinfo on pinfo.promotion_id = pprod.promotion_id and pprod.is_enable =1 and ppinfo.is_enable=1 and pinfo.belong_t0 =1 and pinfo.end_time >=now() and not ( pinfo.onshelv_time>'2019-06-30 00:00:00' or pinfo.end_time>'2018-12-05 00:00:00' )group by pprod.product_id)as pc on pc.product_id = mci.product_idwhere mci.is_enable =0and mci.comodifty_type in ('1', '5', '6')and case when pc.count is null then 0 else pc.count end=0 limit 0,5;
能够看出优化后的SQL比原始SQL快了30秒,执行效率晋升约50倍。
案例三:优化关联SQL OR条件
1. 待优化场景
SELECT user_msg.msg_id AS ‘msg_id’, user_msg.content AS ‘msg_content’, …FROM user_msgLEFT JOIN user ON user_msg.user_id = user.user_idLEFT JOIN group ON user_msg.group_id = group.group_idWHERE user_msg.gmt_modified >= date_sub('2018-03-29 09:31:44', INTERVAL30SECOND)OR user.gmt_modified >= date_sub('2018-03-29 09:31:44', INTERVAL 30 SECOND)OR group.gmt_modified >= date_sub('2018-03-29 09:31:44', INTERVAL 30 SECOND)
2.场景解析
咱们仔细分析上述查问语句,发现尽管业务逻辑只须要查问半分钟内批改的数据,但执行过程却必须对所有的数据进行关联操作,带来不必要的性能损耗。
3.场景优化
咱们对原始SQL进行拆分操作,第一局部sql-01如下:
SELECT user_msg.msg_id AS ‘msg_id’, user_msg.content AS ‘msg_content’, …FROM user_msgLEFT JOIN user ON user_msg.user_id = user.user_idLEFT JOIN group ON user_msg.group_id = group.group_idWHERE user_msg.gmt_modified >= date_sub('2018-03-29 09:31:44', INTERVAL 30 SECOND)
sql-01以user_msg 表为驱动,应用gmt_modified 索引过滤最新数据。
第二局部sql-02如下:
SELECT user_msg.msg_id AS ‘msg_id’, user_msg.content AS ‘msg_content’, …FROM user_msgLEFT JOIN user ON user_msg.user_id = user.user_idLEFT JOIN group ON user_msg.group_id = group.group_idWHERE user.gmt_modified >= date_sub('2018-03-29 09:31:44', INTERVAL 30 SECOND)
ql-02以user为驱动表,msg user_id 的索引过滤行很好。
第三局部sql-03如下:
SELECT user_msg.msg_id AS ‘msg_id’, user_msg.content AS ‘msg_content’, …FROM user_msgLEFT JOIN user ON user_msg.user_id = user.user_idLEFT JOIN group ON user_msg.group_id = group.group_idWHERE group.gmt_modified >= date_sub('2018-03-29 09:31:44', INTERVAL 30 SECOND)
sql-03以group为驱动表,应用gmt_modified 索引过滤最新数据。
总结
MySQL OR条件优化的常见场景次要有以下状况:
1、雷同列能够应用IN进行代替
2、不同列及简单的状况下,能够应用union all 进行拆散
3、关联SQL OR条件
咱们须要结合实际场景,剖析优化。