共计 4210 个字符,预计需要花费 11 分钟才能阅读完成。
明天借助一个例子持续讲 ASH 的用法。
客户报告:一个 RAC 形成的环境,在 11:20 左右开始解决慢。提供材料:AWR Report | |
ASH | |
请求事项:起因确认 | |
解决办法 |
先简略看一 AWR Report 的总体情况。
・ Node1 | |
DB Time: 967.74 (mins) | |
・ Node2 | |
DB Time: 414.41 (mins) | |
・ Node3 | |
DB Time: 354.11 (mins) | |
・ Node4 | |
DB Time: 460.29 (mins) | |
・ Node5 | |
DB Time: 551.66 (mins) |
依据以上的信息,能够看到 Node1 的“DB Time : 967.74 (mins)”,Node2–Node5 的两倍左右。
所以,咱们能够看看每个 INSTANCE 的“Top 5 Timed Foreground Events”。
・ Node1 | |
Top 5 Timed Foreground Events | |
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |
Avg | |
wait % DB | |
Event Waits Time(s) (ms) time Wait Class | |
------------------------------ ------------ ----------- ------ ------ ---------- | |
db file sequential read 4,893,299 26,714 5 46.0 User I/O | |
log file sync 476,854 6,132 13 10.6 Commit | |
DB CPU 5,009 8.6 | |
Disk file operations I/O 163,640 4,128 25 7.1 User I/O | |
gc current block 3-way 2,866,969 3,163 1 5.4 Cluster | |
・ Node2 | |
Top 5 Timed Foreground Events | |
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |
Avg | |
wait % DB | |
Event Waits Time(s) (ms) time Wait Class | |
------------------------------ ------------ ----------- ------ ------ ---------- | |
gc cr block busy 256,891 4,668 18 18.8 Cluster | |
DB CPU 4,613 18.6 | |
db file sequential read 3,089,328 3,822 1 15.4 User I/O | |
gc current block 3-way 2,533,718 2,717 1 10.9 Cluster | |
gc cr grant 2-way 2,424,954 1,698 1 6.8 Cluster | |
・ Node3 | |
Top 5 Timed Foreground Events | |
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |
Avg | |
wait % DB | |
Event Waits Time(s) (ms) time Wait Class | |
------------------------------ ------------ ----------- ------ ------ ---------- | |
DB CPU 4,368 20.6 | |
gc cr block busy 241,547 4,166 17 19.6 Cluster | |
db file sequential read 2,272,733 2,652 1 12.5 User I/O | |
gc current block 3-way 2,123,690 2,157 1 10.2 Cluster | |
Disk file operations I/O 227,537 1,604 7 7.5 User I/O | |
・ Node4 | |
Top 5 Timed Foreground Events | |
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |
Avg | |
wait % DB | |
Event Waits Time(s) (ms) time Wait Class | |
------------------------------ ------------ ----------- ------ ------ ---------- | |
db file sequential read 5,651,562 6,052 1 21.9 User I/O | |
DB CPU 5,315 19.2 | |
gc cr block busy 195,097 3,457 18 12.5 Cluster | |
gc cr grant 2-way 4,318,113 2,704 1 9.8 Cluster | |
gc current block 3-way 2,451,795 2,571 1 9.3 Cluster | |
・ Node5 | |
Top 5 Timed Foreground Events | |
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |
Avg | |
wait % DB | |
Event Waits Time(s) (ms) time Wait Class | |
------------------------------ ------------ ----------- ------ ------ ---------- | |
db file sequential read 5,631,340 6,850 1 20.7 User I/O | |
DB CPU 5,586 16.9 | |
gc cr block busy 230,530 4,129 18 12.5 Cluster | |
gc current block 3-way 3,321,250 3,498 1 10.6 Cluster | |
gc cr grant 2-way 4,412,873 2,792 1 8.4 Cluster |
看到这里,因为所有 INSTANCE 的待机 EVENT 都和 I / O 关联,基本上能够判断为是业务解决集中的问题。
上面就须要仔细分析 ASH 数据,找到能证实论断的货色。
首先,咱们看一下分钟单位的 Active Session 数。
◆SQL 文 | |
SQL> select to_char(sample_time,'yyyy/mm/dd hh24:mi'),count(*) | |
from m_dba_hist_active_sess_history | |
where instance_number=<&instance_number> | |
group by to_char(sample_time,'yyyy/mm/dd hh24:mi') | |
order by to_char(sample_time,'yyyy/mm/dd hh24:mi'); |
通过下面曲线图,咱们能够清晰的看到在 11:18,所有 INSTANCE 的 Active Session 数都有不同水平的减少。
而后,针对增长水平最大的 Node1 进行进一步的剖析。先来看看 Active Session 的 PROGRAM 散布状况。
通过简略的剖析,咱们能够晓得通过 sqlplus 和 JDBC Thin Client 连上来的 SESSION 数量最大,并且增长幅度最大。
SQL> select to_char(sample_time,'yyyy/mm/dd hh24:mi'),count(*) | |
from m_dba_hist_active_sess_history | |
where instance_number=1 | |
and PROGRAM like 'sqlplus%' | |
group by to_char(sample_time,'yyyy/mm/dd hh24:mi') | |
having count(*)>10 | |
order by to_char(sample_time,'yyyy/mm/dd hh24:mi'); | |
TO_CHAR(SAMPLE_TIME,'YYYY/MM/DDHH24:MI') COUNT(*) | |
------------------------------------------------ ----------。。。。。。2021/03/22 11:14 12 | |
2021/03/22 11:15 12 | |
2021/03/22 11:16 12 | |
2021/03/22 11:17 12 | |
2021/03/22 11:18 24 ★SESSION 数减少 1 倍 | |
2021/03/22 11:19 22 | |
2021/03/22 11:20 26 | |
2021/03/22 11:21 19 | |
2021/03/22 11:22 16 | |
2021/03/22 11:23 18 | |
2021/03/22 11:24 17 | |
2021/03/22 11:25 18 | |
2021/03/22 11:26 18 | |
2021/03/22 11:27 20 | |
2021/03/22 11:28 22 | |
2021/03/22 11:29 21 | |
2021/03/22 11:30 23。。。。。。SQL> select to_char(sample_time,'yyyy/mm/dd hh24:mi'),count(*) | |
2 from m_dba_hist_active_sess_history | |
3 where instance_number=1 | |
4 and PROGRAM like 'JDBC Thin Client' | |
5 group by to_char(sample_time,'yyyy/mm/dd hh24:mi') | |
having count(*)>10 | |
order by to_char(sample_time,'yyyy/mm/dd hh24:mi'); 6 7 | |
TO_CHAR(SAMPLE_TIME,'YYYY/MM/DDHH24:MI') COUNT(*) | |
------------------------------------------------ ----------。。。。。。2021/03/22 11:13 99 | |
2021/03/22 11:14 104 | |
2021/03/22 11:15 116 | |
2021/03/22 11:16 118 | |
2021/03/22 11:17 119 | |
2021/03/22 11:18 168 ★増加した | |
2021/03/22 11:19 100 | |
2021/03/22 11:20 100 | |
2021/03/22 11:21 137 | |
2021/03/22 11:22 121 | |
2021/03/22 11:23 134 | |
2021/03/22 11:24 113 | |
2021/03/22 11:25 108 | |
2021/03/22 11:26 119 | |
2021/03/22 11:27 89 | |
2021/03/22 11:28 107 | |
2021/03/22 11:29 100。。。。。。 |
到当初为止,基本上能够断定为 11:18 左右,通过 sqlplus 和 JDBC Thin Client 连上来的 SESSION 数过于集中,
引起了 I / O 解决过多,导致了这次问题的产生。
那这种问题应该如何解决呢?
有上面两个方向:
1. 扩散业务解决。2. 找到 I / O 多的具体解决,看看有没有 I / O 少的办法。
第 2 个方向波及 SQL TUNING 的常识,这里不再细说了。
2021/03/23 @ Dalian
正文完