Should a join between large tables always go hash? A case study:
---- I let the original query run for more than an hour and it never finished.
SELECT a.id AS order_id,
       b.s_id AS bill_id,
       d.id AS sub_order_id,
       d.deal_oper_id
  FROM EM_ORDER PARTITION(EM_ORDER_201611) A,
       M_101_ID_2_GID B,
       ER_ORDER_ORDER C,
       EM_ORDER D,
       EE_ORDER_PF_WORK E
 WHERE A.SPEC_ID = 3010200004
   AND A.ID = B.T_ID
   AND A.STATUS_ID = 1000007
   AND A.COMPLETE_TIME >= TO_DATE('2016-11-14 00:00:00', 'YYYY-MM-DD HH24:MI:SS')
   AND A.COMPLETE_TIME <= TO_DATE('2016-11-14 23:59:59', 'YYYY-MM-DD HH24:MI:SS')
   AND A.ID = C.A_ORDER_ID
   AND C.B_ORDER_ID = D.ID
   AND D.ID = E.ORDER_ID
   AND e.work_type_id = 1001411
   AND ( d.deal_oper_id IS NULL
         OR (SELECT f_chk_idcard(x.identity_number) FROM dm_staff x WHERE x.id = d.deal_oper_id) = 0 );
Table sizes: almost all of these tables are partitioned. Unfortunately the partitioning scheme is not well designed: the query gets no benefit from partition pruning, so the partitions are scanned in full.
   size_mb  segment_name
 4595.0625  EE_ORDER_PF_WORK
40159.0625  EM_ORDER
20059.0625  ER_ORDER_ORDER
20770.0625  M_101_ID_2_GID

dm_staff is a small table, about 200,000 rows.
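For reference, sizes like these can be pulled from the data dictionary. A minimal sketch, assuming SELECT access to DBA_SEGMENTS and using the QWZW_ER owner that appears in the execution plans below:

SELECT SUM(bytes) / 1024 / 1024 AS size_mb,
       segment_name
  FROM dba_segments
 WHERE owner = 'QWZW_ER'
   AND segment_name IN ('EM_ORDER', 'ER_ORDER_ORDER', 'M_101_ID_2_GID',
                        'EE_ORDER_PF_WORK', 'DM_STAFF')
 GROUP BY segment_name
 ORDER BY size_mb DESC;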
Execution plan:
Plan hash value: 3098839882

--------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                   | Name                   | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                            |                        |     1 |   130 |       |  1218K (1) | 04:03:44 |       |       |
|*  1 |  FILTER                                     |                        |       |       |       |            |          |       |       |
|   2 |   NESTED LOOPS                              |                        | 21005 | 2666K |       |  1166K (1) | 03:53:19 |       |       |
|   3 |    NESTED LOOPS                             |                        | 21791 | 2666K |       |  1166K (1) | 03:53:19 |       |       |
|*  4 |     HASH JOIN                               |                        | 21791 | 2234K |   17M |  1079K (1) | 03:35:53 |       |       |
|   5 |      NESTED LOOPS                           |                        |  183K |   15M |       |   907K (1) | 03:01:32 |       |       |
|   6 |       NESTED LOOPS                          |                        |  183K |   15M |       |   907K (1) | 03:01:32 |       |       |
|   7 |        NESTED LOOPS                         |                        |  183K |   11M |       |   358K (1) | 01:11:37 |       |       |
|*  8 |         TABLE ACCESS BY GLOBAL INDEX ROWID  | EM_ORDER               |  106K | 3631K |       |  39284 (1) | 00:07:52 |     8 |     8 |
|*  9 |          INDEX RANGE SCAN                   | IDX_EM_ORDER_COMP_TIME | 44669 |       |       |    330 (0) | 00:00:04 |       |       |
|* 10 |         INDEX RANGE SCAN                    | IDX_ER_ORDER_ORDER_OO  |     2 |    56 |       |      3 (0) | 00:00:01 |       |       |
|* 11 |        INDEX UNIQUE SCAN                    | PK_EM_ORDER            |     1 |       |       |      2 (0) | 00:00:01 |       |       |
|  12 |       TABLE ACCESS BY GLOBAL INDEX ROWID    | EM_ORDER               |     1 |    26 |       |      3 (0) | 00:00:01 | ROWID | ROWID |
|  13 |      PARTITION RANGE ALL                    |                        |   10M |  161M |       |   156K (1) | 00:31:22 |     1 |    10 |
|* 14 |       TABLE ACCESS FULL                     | EE_ORDER_PF_WORK       |   10M |  161M |       |   156K (1) | 00:31:22 |     1 |    10 |
|* 15 |     INDEX RANGE SCAN                        | IDX_101_T_ID           |     1 |       |       |      3 (0) | 00:00:01 |       |       |
|  16 |    TABLE ACCESS BY GLOBAL INDEX ROWID       | M_101_ID_2_GID         |     1 |    25 |       |      4 (0) | 00:00:01 | ROWID | ROWID |
|  17 |   TABLE ACCESS BY INDEX ROWID               | DM_STAFF               |     1 |    20 |       |      3 (0) | 00:00:01 |       |       |
|* 18 |    INDEX UNIQUE SCAN                        | PK_DM_STAFF            |     1 |       |       |      2 (0) | 00:00:01 |       |       |
--------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter( (SELECT "F_CHK_IDCARD"("X"."IDENTITY_NUMBER") FROM "QWZW_ER"."DM_STAFF" "X" WHERE "X"."ID"=:B1)=0)
   4 - access("D"."ID"="E"."ORDER_ID")
   8 - filter("A"."SPEC_ID"=3010200004 AND "A"."STATUS_ID"=1000007)
   9 - access("A"."COMPLETE_TIME">=TO_DATE(' 2016-11-16 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
       "A"."COMPLETE_TIME"<=TO_DATE(' 2016-11-16 23:59:59', 'syyyy-mm-dd hh24:mi:ss'))
       filter(TBL$OR$IDX$PART$NUM("QWZW_ER"."EM_ORDER",0,1,0,ROWID)=8)
  10 - access("A"."ID"="C"."A_ORDER_ID")
  11 - access("C"."B_ORDER_ID"="D"."ID")
  14 - filter("E"."WORK_TYPE_ID"=1001411)
  15 - access("A"."ID"="B"."T_ID")
       filter("B"."T_ID" IS NOT NULL)
  18 - access("X"."ID"=:B1)

Problem analysis:

1. Reading the execution plan: at first glance, the steps at Id=7 through Id=9 look problematic, and so does Id=14.
Id=9: the table is about 40 GB. Even though only one partition is needed, the index is a global index and therefore very large, and driving a big table through an index, going back to the table, and then filtering again can be expensive. Look at the filter conditions behind Id=9, though: we query only one day of data, the table is partitioned by month and holds one year, and 40 GB / 365 is roughly 0.1 GB per day, so this step is not a big problem. A local index would still be a bit faster; the index design here is not ideal, but since the impact is small we can set it aside for now. Now look at Id=14, table E: a 4 GB table scanned in full is slow. Many people immediately ask whether an index would help. The answer is no: "E"."WORK_TYPE_ID"=1001411 matches a large share of the rows, so the optimizer would not use such an index, and even if it did, it would probably still be slower than the full scan (a quick way to check this is sketched below).
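Two quick checks behind that reasoning, as a sketch. The COUNT query tests the selectivity of the WORK_TYPE_ID filter; the CREATE INDEX statement only illustrates the local-index idea for the time filter on A, and the index name and the assumption that the existing global index would be dropped first are mine, not taken from the original system.

-- Selectivity check: if this count is a large fraction of E's ~10M rows,
-- an index on WORK_TYPE_ID alone will not help this query.
SELECT COUNT(*) FROM EE_ORDER_PF_WORK E WHERE e.work_type_id = 1001411;

-- Illustrative only: a LOCAL index on the partitioned EM_ORDER table for the
-- COMPLETE_TIME range filter; the existing global index IDX_EM_ORDER_COMP_TIME
-- on the same column would have to be dropped or replaced first.
CREATE INDEX idx_em_order_comp_time_l ON EM_ORDER (complete_time) LOCAL;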
The real problem is the hash join at Id=4: its probe side is the full scan of E at Id=13-14, and a full scan of a 4 GB table inevitably takes a lot of time. The FILTER at Id=1 also caught my eye at first, because its predicate contains a scalar subquery against a small table, which looked like an obvious performance problem. After the rest of the SQL had been tuned I removed that filter as a test, but to my surprise the runtime was about the same, so I left it alone.

2. Reading the SQL: this is a five-table join. In the WHERE clause, E is fairly large at 4 GB and its only filter is e.work_type_id = 1001411. Notice that no column of E appears in the SELECT list, so the join to E can be rewritten as a semi-join, which is more efficient. I rewrote it with EXISTS: exists( select 1 from EE_ORDER_PF_WORK E where D.ID = E.ORDER_ID AND e.work_type_id = 1001411 ). In practice I more often rewrite this kind of join with IN (a sketch of the IN form appears after the new execution plan below).

3. Add hints so that the whole plan runs as nested loops, with the small filtered row source as the driving table, driving the joins all the way down. Why force NL across the whole statement? Because, as analysed above, Id=5 through Id=12 all touch large tables but always through indexes; combined with the time condition on A and the other filters, the driving row set is small, and so is the final result set. The rewritten SQL:
SELECT /*+ use_nl(a,b) use_nl(a,c) use_nl(c,d) leading(a) */
       '开通单' AS spec_name,
       a.id AS order_id,
       b.s_id AS bill_id,
       d.id AS sub_order_id,
       ---x.s_id task_id,
       d.deal_oper_id
  FROM EM_ORDER PARTITION(EM_ORDER_201611) A,
       M_101_ID_2_GID B,
       ER_ORDER_ORDER C,
       EM_ORDER D
 WHERE A.SPEC_ID = 3010200004
   AND A.ID = B.T_ID
   AND A.STATUS_ID = 1000007
   AND A.COMPLETE_TIME >= TO_DATE('2016-11-18 00:00:00', 'YYYY-MM-DD HH24:MI:SS')
   AND A.COMPLETE_TIME <= TO_DATE('2016-11-18 23:59:59', 'YYYY-MM-DD HH24:MI:SS')
   AND A.ID = C.A_ORDER_ID
   AND C.B_ORDER_ID = D.ID
   AND EXISTS ( SELECT /*+ nl_sj */ 1
                  FROM EE_ORDER_PF_WORK E
                 WHERE D.ID = E.ORDER_ID
                   AND e.work_type_id = 1001411 )
   AND ( d.deal_oper_id IS NULL
         OR (SELECT f_chk_idcard(x.identity_number) FROM dm_staff x WHERE x.id = d.deal_oper_id) = 0 );
4. The execution plan after the rewrite: the join to E now runs as NESTED LOOPS SEMI (Id=4), and E is accessed through its primary-key index (Id=12-13) instead of being scanned in full.

Plan hash value: 1576200999

-----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                   | Name                   | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                            |                        |  3537 |  449K |  1372K (1) | 04:34:26 |       |       |
|*  1 |  FILTER                                     |                        |       |       |            |          |       |       |
|   2 |   NESTED LOOPS                              |                        | 19483 | 2473K |  1372K (1) | 04:34:26 |       |       |
|   3 |    NESTED LOOPS                             |                        | 20212 | 2473K |  1372K (1) | 04:34:26 |       |       |
|   4 |     NESTED LOOPS SEMI                       |                        | 20212 | 2072K |  1291K (1) | 04:18:16 |       |       |
|   5 |      NESTED LOOPS                           |                        |  169K |   14M |   844K (1) | 02:48:57 |       |       |
|   6 |       NESTED LOOPS                          |                        |  169K |   10M |   334K (1) | 01:07:00 |       |       |
|*  7 |        TABLE ACCESS BY GLOBAL INDEX ROWID   | EM_ORDER               | 98540 | 3368K |  39284 (1) | 00:07:52 |     8 |     8 |
|*  8 |         INDEX RANGE SCAN                    | IDX_EM_ORDER_COMP_TIME | 44669 |       |    330 (0) | 00:00:04 |       |       |
|*  9 |        INDEX RANGE SCAN                     | IDX_ER_ORDER_ORDER_OO  |     2 |    56 |      3 (0) | 00:00:01 |       |       |
|  10 |       TABLE ACCESS BY GLOBAL INDEX ROWID    | EM_ORDER               |     1 |    26 |      3 (0) | 00:00:01 | ROWID | ROWID |
|* 11 |        INDEX UNIQUE SCAN                    | PK_EM_ORDER            |     1 |       |      2 (0) | 00:00:01 |       |       |
|* 12 |      TABLE ACCESS BY GLOBAL INDEX ROWID     | EE_ORDER_PF_WORK       | 1257K |   19M |      3 (0) | 00:00:01 | ROWID | ROWID |
|* 13 |       INDEX UNIQUE SCAN                     | PK_EE_ORDER_PF_WORK    |     1 |       |      2 (0) | 00:00:01 |       |       |
|* 14 |     INDEX RANGE SCAN                        | IDX_101_T_ID           |     1 |       |      3 (0) | 00:00:01 |       |       |
|  15 |    TABLE ACCESS BY GLOBAL INDEX ROWID       | M_101_ID_2_GID         |     1 |    25 |      4 (0) | 00:00:01 | ROWID | ROWID |
|  16 |   TABLE ACCESS BY INDEX ROWID               | DM_STAFF               |     1 |    20 |      3 (0) | 00:00:01 |       |       |
|* 17 |    INDEX UNIQUE SCAN                        | PK_DM_STAFF            |     1 |       |      2 (0) | 00:00:01 |       |       |
-----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("D"."DEAL_OPER_ID" IS NULL OR (SELECT "F_CHK_IDCARD"("X"."IDENTITY_NUMBER") FROM "QWZW_ER"."DM_STAFF" "X"
       WHERE "X"."ID"=:B1)=0)
   7 - filter("A"."SPEC_ID"=3010200004 AND "A"."STATUS_ID"=1000007)
   8 - access("A"."COMPLETE_TIME">=TO_DATE(' 2016-11-18 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
       "A"."COMPLETE_TIME"<=TO_DATE(' 2016-11-18 23:59:59', 'syyyy-mm-dd hh24:mi:ss'))
       filter(TBL$OR$IDX$PART$NUM("QWZW_ER"."EM_ORDER",0,1,0,ROWID)=8)
   9 - access("A"."ID"="C"."A_ORDER_ID")
  11 - access("C"."B_ORDER_ID"="D"."ID")
  12 - filter("E"."WORK_TYPE_ID"=1001411)
  13 - access("D"."ID"="E"."ORDER_ID")
  14 - access("A"."ID"="B"."T_ID")
       filter("B"."T_ID" IS NOT NULL)
  17 - access("X"."ID"=:B1)
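For reference, here is a minimal sketch of the IN form mentioned in point 2; only the predicate on E changes, everything else in the rewritten statement stays the same:

   AND C.B_ORDER_ID = D.ID
   AND D.ID IN ( SELECT e.order_id
                   FROM EE_ORDER_PF_WORK E
                  WHERE e.work_type_id = 1001411 )

Either form lets the optimizer unnest the subquery into a semi-join; the choice between EXISTS and IN here is mostly a matter of style.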
5. After the rewrite, the first execution took roughly 4 minutes, and repeated executions take about 47 seconds. With the original plan I never saw it return in anything close to 47 seconds. The result set is roughly three thousand rows.

Summary: when large tables join large tables, the textbook choice is a hash join. In this scenario, however, the driving table returns only a few thousand rows after filtering and the driven tables are probed through unique primary-key index scans, so nested loops is the more efficient join method.
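One way to verify this kind of conclusion, as a sketch (it assumes the session may query the cursor cache through DBMS_XPLAN): run each variant once with the GATHER_PLAN_STATISTICS hint added to its hint list, then compare estimated versus actual rows and the A-Time column per plan step.

-- Run immediately after the hinted query, in the same session.
SELECT * FROM TABLE(dbms_xplan.display_cursor(NULL, NULL, 'ALLSTATS LAST'));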