共计 12131 个字符,预计需要花费 31 分钟才能阅读完成。
本文作者分享了 Oceanbase 工夫与数值类型隐式转换导致查问后果不合乎预期或“不正确”问题的排查思路。
作者:任仲禹
爱可生 DBA 团队成员,善于故障剖析和性能优化,文章相干技术问题,欢送大家一起探讨。
本文起源:原创投稿
- 爱可生开源社区出品,原创内容未经受权不得随便应用,转载请分割小编并注明起源。
之前在 OceanBase 应用中碰到了一个“令人费解”的数据类型隐式转换问题。论断比较简单,特跟大家分享下排查思路。
问题形容
某客户项目组执行更新 SQL 语句时会偶发失败,报错如下:
脱敏解决后
ERROR bad SQL grammar [update renzy set at=current_timestamp,expire_at=(cast(unix_timestamp(current_timestamp(3) as unsigned) +?)), order_id= ? where id = ? and (expire_at < current_timestamp or order_id = ?)] java.sql.SQLSyntaxErrorException: (conn=1277168) Incorrect value.
查问 OceanBase 版本。
./observer -V
observer (OceanBase 3.2.3.2)
REVISION: 105000092022092216-445151f0edb502e00ae5839dfd92627816b2b822
查看表构造和数据。
MySQL [test]> show create table renzy\G
*************************** 1. row ***************************
Table: renzy
Create Table: CREATE TABLE `renzy` (`id` varchar(64) NOT NULL,
`at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`order_id` varchar(64) NOT NULL,
`expire_at` bigint(20) NOT NULL,
`vt` timestamp NOT NULL,
PRIMARY KEY (`id`)
) DEFAULT CHARSET = utf8mb4 ROW_FORMAT = DYNAMIC COMPRESSION = 'zstd_1.3.8' REPLICA_NUM = 3 BLOCK_SIZE = 16384 USE_BLOOM_FILTER = FALSE TABLET_SIZE = 134217728 PCTFREE = 0
MySQL [test]> select * from renzy;
+----+---------------------------+---------------+------------+---------------------------------+
| id | at | order_id | expire_at | vt |
+----+---------------------------+---------------+------------+---------------------------------+
| 1 | 2023-07-07 14:57:13 | 0:16632@172.1 | 1716040750 | 2023-07-07 14:57:13 |
+----+---------------------------+---------------+------------+---------------------------------+
1 row in set (0.02 sec)
问题排查
问题 1:报错语句
间接执行报错的 SQL 语句。
update renzy set at=CURRENT_TIMESTAMP, expire_at=(cast(unix_timestamp(current_timestamp(3)) as unsigned) + 30000000), order_id= '0:16632@172.24.64.1' where id = '1' and (expire_at < CURRENT_TIMESTAMP or order_id = '0:16632@172.24.64.1')
ERROR 1292 (22007): Incorrect values.
问题 2
然而,当主键 ID 匹配不到任何数据,UPDATE
语句将不会报错。
为啥?先记录下问题接着看日志。
# 表不存在 id=2 的数据
update renzy set acquired_at=CURRENT_TIMESTAMP, expire_at=(cast(unix_timestamp(current_timestamp(3)) as unsigned) + 30000000), order_id= '0:16632@172.24.64.1' where id = '2' and (expire_at < CURRENT_TIMESTAMP or order_id = '0:16632@172.24.64.1')
Query OK.
报错 SQL 的 sql_auit 输入和日志
带着问题剖析下 sql$gv_audit
输入,得出如下信息:
- 该语句打算是本地打算,散发到了 0.71 节点。
- 错误码是 4219(有效的
datetime
值)。
MySQL [oceanbase]> select trace_id,svr_ip,ret_code,retry_cnt,usec_to_time(request_time),elapsed_time,execute_time,plan_type,query_sql from gv$sql_audit where query_sql like 'update id%' and ret_code != 0 order by request_time desc limit 5;
+-----------------------------------+-------------+----------+-----------+----------------------------+--------------+--------------+-----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| trace_id | svr_ip | ret_code | retry_cnt | usec_to_time(request_time) | elapsed_time | execute_time | plan_type | query_sql |
+-----------------------------------+-------------+----------+-----------+----------------------------+--------------+--------------+-----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| YB420CF10047-0005FBCCEF6E3635-0-0 | 12.241.0.71 | -4219 | 0 | 2023-06-06 15:32:08.375051 | 689 | 611 | 1 | update id set acquired_at=CURRENT_TIMESTAMP, expire_at=(cast(unix_timestamp(current_timestamp(3)) as unsigned) + 30000000), order_id= '0:16632@172.24.64.1' where id = '2' and (expire_at < CURRENT_TIMESTAMP or order_id = '0:16632@172.24.64.1') |
·····
5 rows in set (5.21 sec)
要害报错日志
0.72 的 obs.log
。
#grep YB420CF10047-0005FBCCEF6E3635-0-0 observer.log.20230606153309
[2023-06-06 15:32:08.375202] WARN [LIB.TIME] int_to_ob_time_with_date (ob_time_convert.cpp:1618) [38763][0][YB420CF10047-0005FBCCEF6E3635-0-0] [lt=5] [dc=0] datetime is invalid or out of range(ret=-4219, int64=0)
[2023-06-06 15:32:08.375211] WARN [LIB.TIME] int_to_datetime (ob_time_convert.cpp:329) [38763][0][YB420CF10047-0005FBCCEF6E3635-0-0] [lt=8] [dc=0] failed to convert integer to datetime(ret=-4219)
[2023-06-06 15:32:08.375214] WARN [SQL] common_int_datetime (ob_datum_cast.cpp:709) [38763][0][YB420CF10047-0005FBCCEF6E3635-0-0] [lt=3] [dc=0] int_datetime failed(ret=-4219)
[2023-06-06 15:32:08.375219] WARN [SQL] int_datetime (ob_datum_cast.cpp:2076) [38763][0][YB420CF10047-0005FBCCEF6E3635-0-0] [lt=4] [dc=0] fail to exec common_int_datetime(expr, in_val, ctx, res_datum)(ret=-4219, in_val=1716036728)
[2023-06-06 15:32:08.375223] WARN [SQL] get_comparator_operands (ob_expr_operator.h:1131) [38763][0][YB420CF10047-0005FBCCEF6E3635-0-0] [lt=4] [dc=0] left eval failed(ret=-4219)
[2023-06-06 15:32:08.375227] WARN [SQL.ENG] def_relational_eval_func (ob_expr_cmp_func.cpp:54) [38763][0][YB420CF10047-0005FBCCEF6E3635-0-0] [lt=4] [dc=0] failed to eval args(ret=-4219)
[2023-06-06 15:32:08.375232] WARN [SQL.ENG] calc_or_exprN (ob_expr_or.cpp:221) [38763][0][YB420CF10047-0005FBCCEF6E3635-0-0] [lt=5] [dc=0] eval arg 0 failed(ret=-4219)
[2023-06-06 15:32:08.375240] WARN [SQL.ENG] filter_row (ob_operator.cpp:915) [38763][0][YB420CF10047-0005FBCCEF6E3635-0-0] [lt=7] [dc=0] expr evaluate failed(ret=-4219, expr=0x7f37808ec058)
[2023-06-06 15:32:08.375247] WARN [STORAGE] check_filtered (ob_multiple_merge.cpp:1249) [38763][0][YB420CF10047-0005FBCCEF6E3635-0-0] [lt=7] [dc=0] filter row failed(ret=-4219)
[2023-06-06 15:32:08.375251] WARN [STORAGE] process_fuse_row (ob_multiple_merge.cpp:787) [38763][0][YB420CF10047-0005FBCCEF6E3635-0-0] [lt=3] [dc=0] fail to check row filtered(ret=-4219)
察看到最开始抛出谬误的是 int_to_ob_time_with_date
办法,报的错是 “datetime is invalid or out of range(ret=-4219, int64=0)”,即值有效或超过范畴。
这里 UPDATE
的调用链路是:
common_int_datetime -> int_to_datetime -> int_to_ob_time_with_date
问题 3:尝试应用 SELECT 查问后果
表中 EXPIRE_AT
存储的是将来工夫(1716040750),与以后工夫(1686042749)做比拟,查问后果理当不输入后果才对?(先记录为问题 3:查问后果不合乎预期)
# 表中只有 1 行记录,且 EXPIRE_AT 的值为 1716040750
MySQL [mock_db]> select * from renzy where EXPIRE_AT < CURRENT_TIMESTAMP;
+-------+---------------------+---------------------+------------+------------------------------------+
| id | ACQUIRED_AT | order_id | EXPIRE_AT | LST_UP_TM |
+-------+---------------------+---------------------+------------+------------------------------------+
| 1 | 2023-06-06 16:39:10 | 0:16632@172.24.64.1 | 1716040750 | 2023-06-06 16:39:10 |
+-------+---------------------+---------------------+------------+------------------------------------+
1 row in set (0.05 sec)
# 以后工夫戳
MySQL [mock_db]> select unix_timestamp(CURRENT_TIMESTAMP);
+--------------------------------------------+
| unix_timestamp(CURRENT_TIMESTAMP) |
+--------------------------------------------+
| 1686042749 |
+--------------------------------------------+
1 row in set (0.03 sec)
SELECT
查问的报错日志。
[2023-06-06 17:08:54.307371] WARN [LIB.TIME] int_to_ob_time_with_date (ob_time_convert.cpp:1618) [38763][0][YB420CF10047-0005FBCCEF6F9E6D-0-0] [lt=10] [dc=0] datetime is invalid or out of range(ret=-4219, int64=0)
[2023-06-06 17:08:54.307382] WARN [LIB.TIME] int_to_datetime (ob_time_convert.cpp:329) [38763][0][YB420CF10047-0005FBCCEF6F9E6D-0-0] [lt=10] [dc=0] failed to convert integer to datetime(ret=-4219)
这里 SELECT
的调用链路:int_to_datetime -> int_to_ob_time_with_date
以上就是存疑的几个问题,那在具体分析前,先理解一下前置知识点:OceanBase 的隐式转换。
OceanBase 的隐式转换
数据类型 bigint
与 datetime
的值是没法间接比拟,须要先将 int
转换为工夫类型,这就是所谓的隐式转换,所以这里 OceanBase 是如何转很重要。
int
类型转换成 OceanBase 认可的工夫类型(即 OBTime)并不是用的from_unixtime
这个函数,而是用的 OceanBase 本人外部的逻辑。- 源码中波及
int
,double
,string
类型隐式转换的逻辑如下:
int_to_datetime
////////////////////////////////
// int / double / string -> datetime / date / time / year.
int ObTimeConverter::int_to_datetime(int64_t int_part, int64_t dec_part,
const ObTimeConvertCtx &cvrt_ctx, int64_t &value,
const ObDateSqlMode date_sql_mode)
{
int ret = OB_SUCCESS;
dec_part = (dec_part + 500) / 1000;
if (0 == int_part) {value = ZERO_DATETIME;} else {ObTime ob_time(DT_TYPE_DATETIME);
ObDateSqlMode local_date_sql_mode = date_sql_mode;
if (cvrt_ctx.is_timestamp_) {local_date_sql_mode.allow_invalid_dates_ = false;}
if (OB_FAIL(int_to_ob_time_with_date(int_part, ob_time, false, local_date_sql_mode))) {LOG_WARN("failed to convert integer to datetime", K(ret));
} else if (OB_FAIL(ob_time_to_datetime(ob_time, cvrt_ctx, value))) {LOG_WARN("failed to convert datetime to seconds", K(ret));
}
}
value += dec_part;
if (OB_SUCC(ret) && !is_valid_datetime(value)) {
ret = OB_DATETIME_FUNCTION_OVERFLOW;
LOG_WARN("datetime filed overflow", K(ret), K(value));
}
return ret;
}
最终调用的 int_to_ob_time_with_date
:
////////////////////////////////
// int / uint / string -> ObTime / ObInterval <- datetime / date / time.
int ObTimeConverter::int_to_ob_time_with_date(int64_t int64, ObTime &ob_time, bool is_dayofmonth,
const ObDateSqlMode date_sql_mode)
{
int ret = OB_SUCCESS;
int32_t *parts = ob_time.parts_;
if (is_dayofmonth && 0 == int64) {parts[DT_SEC] = 0;
parts[DT_MIN] = 0;
parts[DT_HOUR] = 0;
parts[DT_MDAY] = 0;
parts[DT_MON] = 0;
parts[DT_YEAR] = 0;
} else if (int64 < power_of_10[2]) {
ret = OB_INVALID_DATE_VALUE;
LOG_WARN("datetime integer is out of range", K(ret), K(int64));
} else if (int64 < power_of_10[8]) {
// YYYYMMDD.
parts[DT_MDAY] = static_cast<int32_t>(int64 % power_of_10[2]); int64 /= power_of_10[2];
parts[DT_MON] = static_cast<int32_t>(int64 % power_of_10[2]); int64 /= power_of_10[2];
parts[DT_YEAR] = static_cast<int32_t>(int64 % power_of_10[4]);
} else if (int64 / power_of_10[6] < power_of_10[8]) {
// YYYYMMDDHHMMSS.
parts[DT_SEC] = static_cast<int32_t>(int64 % power_of_10[2]); int64 /= power_of_10[2];
parts[DT_MIN] = static_cast<int32_t>(int64 % power_of_10[2]); int64 /= power_of_10[2];
parts[DT_HOUR] = static_cast<int32_t>(int64 % power_of_10[2]); int64 /= power_of_10[2];
parts[DT_MDAY] = static_cast<int32_t>(int64 % power_of_10[2]); int64 /= power_of_10[2];
parts[DT_MON] = static_cast<int32_t>(int64 % power_of_10[2]); int64 /= power_of_10[2];
parts[DT_YEAR] = static_cast<int32_t>(int64 % power_of_10[4]);
} else {
ret = OB_INVALID_DATE_VALUE;
LOG_WARN("datetime integer is out of range", K(ret), K(int64));
}
if (OB_SUCC(ret)) {apply_date_year2_rule(parts[0]);
if (OB_FAIL(validate_datetime(ob_time, is_dayofmonth, date_sql_mode))) {LOG_WARN("datetime is invalid or out of range", K(ret), K(int64));
} else if (ZERO_DATE != parts[DT_DATE]) {parts[DT_DATE] = ob_time_to_date(ob_time);
}
}
return ret;
}
下面代码示意 OceanBase 仅能辨认的格局:
- YYYYMMDD
- YYYYMMDDHHMMSS
一旦不是上述格局就会抛出报错。下图能够看到只有 20230816
和 20230819111111
两种符合条件。
问题起因
问题 3:SELECT 查出的后果不合乎预期
bigint
与 datetime
类型 ” 比拟 ” 波及隐式转换导致后果不可预知。
select * from renzy where EXPIRE_AT < CURRENT_TIMESTAMP;
中EXPIRE_AT
的值是bigint
类型,且值为1716040750
。- 值
1716040750
无奈匹配int_to_ob_time_with_date
规定的格局,将抛出告警“datetime integer is out of range”,日志中也能印证这一点。 -
这里按理应该在 SQL 执行时抛出报错,不应该输入后果的,然而为了 OB 兼容 MySQL 而抉择输入了“谬误”的值。
- 为啥 MySQL 不会报错? MySQL 中转换失败之后,其实会有默认值
0
,上述 OB 行为转换失败用的默认值,应该和 MySQL 兼容,所以满足了WHERE
条件0 < 1686042749
,SELECT
就输入了后果。
- 为啥 MySQL 不会报错? MySQL 中转换失败之后,其实会有默认值
问题 1:UPDATE 语句为何能吐出报错
- 因为 OB 默认开启了
SQL_MODE
严格模式,如果产生隐式转换且转换失败(用了默认值)的场景,OB 的严格模式比 MySQL 做了一层防备,将禁止 SQL 执行。 - MySQL 开启
SQL_MODE
严格模式下能执行“胜利”。
问题 2:UPDATE 语句 WHERE 条件中主键匹配到不存在的值不报错
UPDATE
走的是table get
算子,等值查问不到后果后,不须要再filter
前面的条件。这解释了项目组偶发失败的问题。
最初一个问题:为啥 UPDATE 间接报错,SELECT 确能查出“谬误的值”
此报错仅抛出报错到 obs.log
。
我这边猜想是因为只在 UPDATE
的时候会遵循严格模式,而 SELECT
则不须要遵循,根据前文提到的调用链路:
- UPDATE: common_int_datetime -> int_to_datetime -> int_to_ob_time_with_date
- SELECT: int_to_datetime -> int_to_ob_time_with_date
UPDATE
多了下层的 common_int_datetime
,调用入口不一样,而依据这个办法,大抵猜想是因为 SQL_MODE
的严格模式导致。
验证下猜测,看起来清空 SQL_MODE
后,UPDATE
果然能插入胜利。
MySQL [test]> select @@sql_mode;
+--------------------------------------------+
| @@sql_mode |
+--------------------------------------------+
| STRICT_ALL_TABLES,NO_ZERO_IN_DATE |
+--------------------------------------------+
1 row in set (0.01 sec)
MySQL [test]> update renzy set at=CURRENT_TIMESTAMP, expire_at=(cast(unix_timestamp(current_timestamp(3)) as unsigned) + 30000000), order_id= '0:16632@172.24.64.1' where id = '1' and (expire_at < CURRENT_TIMESTAMP or order_id = '0:16632@172.24.64.1');
ERROR 1292 (22007): Incorrect value
MySQL [test]> set sql_mode='';
Query OK, 0 rows affected (0.01 sec)
MySQL [test]> update renzy set at=CURRENT_TIMESTAMP, expire_at=(cast(unix_timestamp(current_timestamp(3)) as unsigned) + 30000000), order_id= '0:16632@172.24.64.1' where id = '1' and (expire_at < CURRENT_TIMESTAMP or order_id = '0:16632@172.24.64.1');
Query OK, 1 row affected (0.01 sec)
Rows matched: 1 Changed: 1 Warnings: 0
可能我的猜想有问题,但集体能力无限,欢送大家一起斧正和探讨。
论断
-
让客户项目组改写 SQL 逻辑。
- EXPIRE_AT < CURRENT_TIMESTAMP 改成
- EXPIRE_AT < unix_timestamp(CURRENT_TIMESTAMP)
-
强调 SQL 标准,防止隐式转换。
对于 SQLE
爱可生开源社区的 SQLE 是一款面向数据库使用者和管理者,反对多场景审核,反对标准化上线流程,原生反对 MySQL 审核且数据库类型可扩大的 SQL 审核工具。
SQLE 获取
类型 | 地址 |
---|---|
版本库 | https://github.com/actiontech/sqle |
文档 | https://actiontech.github.io/sqle-docs/ |
公布信息 | https://github.com/actiontech/sqle/releases |
数据审核插件开发文档 | https://actiontech.github.io/sqle-docs-cn/3.modules/3.7_audit… |