I was working on a real-life case where the CBO was choosing a full table scan over an index scan. I created a small test case to demonstrate the issue.
I created a table TEST with two columns, ID and ORG.
create table test ( id number, org varchar2(100) );
begin
for i in 1..300000
loop
insert into test values (1,'XXXXXXXXXXXXXX');
end loop;
end;
/
commit;
- Inserted 300000 records for ID=1
begin
for i in 1..100000
loop
insert into test values (17,'YYYYYYYYYYYYYYYYY');
end loop;
end;
/
commit;
- Inserted 100000 records for ID=17
- Therefore ID has two distinct values 1 and 17.
create index idx_1 on test(id);
- Created an index on TEST.ID
- Gathering 100% statistics with histograms.
exec dbms_stats.gather_table_stats(user,'TEST',estimate_percent=>100,method_opt=>'for columns ID size 2');
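To double-check what the optimizer actually sees, we can query the dictionary. A minimal sketch, assuming the TEST table and statistics gathered above (the views and columns are standard; the exact figures depend on your data and version):
-- Column statistics for TEST.ID: NUM_DISTINCT should be 2 and a FREQUENCY histogram should be present
select column_name, num_distinct, num_nulls, density, histogram
from   user_tab_col_statistics
where  table_name = 'TEST'
and    column_name = 'ID';
-- Histogram buckets: endpoint values 1 and 17 with their cumulative row counts
select endpoint_value, endpoint_number
from   user_tab_histograms
where  table_name = 'TEST'
and    column_name = 'ID'
order  by endpoint_number;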
We have no rows with ID=0. So if I run "select * from test where id=0", I would expect an index range scan. Since there are no records with ID=0, the index scan would finish in the blink of an eye regardless of the size of the table. Let us look at the execution statistics.
SQL> set autotrace on
SQL> select * from test where id=0;
no rows selected
Execution Plan
----------------------------------------------------------
Plan hash value: 1357081020
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 46875 |   869K|   230   (3)| 00:00:03 |
|*  1 |  TABLE ACCESS FULL| TEST | 46875 |   869K|   230   (3)| 00:00:03 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("ID"=0)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
1313 consistent gets
0 physical reads
0 redo size
339 bytes sent via SQL*Net to client
404 bytes received via SQL*Net from client
1 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
0 rows processed
Strangely, the optimizer chooses a FULL TABLE SCAN over the index IDX_1.
Notice that we are doing 1313 consistent gets here.
Now let us force the index and see the execution statistics.
SQL> select /*+ index(test idx_1) */ * from test where id=0;
no rows selected
Execution Plan
----------------------------------------------------------
Plan hash value: 3519455734
-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       | 46875 |   869K|   249   (1)| 00:00:03 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TEST  | 46875 |   869K|   249   (1)| 00:00:03 |
|*  2 |   INDEX RANGE SCAN          | IDX_1 | 46875 |       |    95   (2)| 00:00:02 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("ID"=0)
Statistics
----------------------------------------------------------
8 recursive calls
0 db block gets
3 consistent gets
0 physical reads
0 redo size
339 bytes sent via SQL*Net to client
404 bytes received via SQL*Net from client
1 SQL*Net roundtrips to/from client
4 sorts (memory)
0 sorts (disk)
0 rows processed
OK, with an index hint the query runs much faster and the number of consistent gets comes down from 1313 to just 3!
Now this is interesting: I have 100% statistics and a histogram in place. Why can't the optimizer choose the index by itself?
In the execution plan we see that the optimizer expects our query to return 46875 rows. This number looks suspicious.
Because of this large estimate, the optimizer considers a full table scan less expensive than an index scan.
Let us work out how the optimizer arrived at this number.
These are the values known to the optimizer before it parses the statement in question (a quick dictionary query confirming them follows this list):
count(*) for ID=1  : 300000
count(*) for ID=17 : 100000  --> Min(BucketSize)
NumRows (rows in the table) : 400000
low_value  = 1
high_value = 17
predicate value = 0
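These figures can also be read straight from the dictionary. A small sketch, again assuming the TEST table above; LOW_VALUE and HIGH_VALUE are stored as RAW, so they are decoded here with utl_raw.cast_to_number (valid for NUMBER columns):
select t.num_rows,
       c.num_distinct,
       utl_raw.cast_to_number(c.low_value)  as low_value,
       utl_raw.cast_to_number(c.high_value) as high_value
from   user_tables t
join   user_tab_col_statistics c on c.table_name = t.table_name
where  t.table_name  = 'TEST'
and    c.column_name = 'ID';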
The optimizer uses the following formula to determine the selectivity for predicate value 0.
This predicate is out of range (not between the min value 1 and the max value 17).
Sel = (100000 / (2 * 400000)) * (1 - ((1 - 0) / (17 - 1)))
    = 0.125 * 0.9375
    = 0.1171875
Therefore,
cardinality = Sel * NumRows
            = 0.1171875 * 400000
            = 46875
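The arithmetic is easy to reproduce. This is just a sanity check of the formula above with our numbers plugged in, not an internal CBO routine:
-- 100000 = smallest histogram bucket, 400000 = NumRows, 1 and 17 = low/high values, 0 = predicate value
select (100000 / (2 * 400000)) * (1 - ((1 - 0) / (17 - 1)))          as selectivity,
       (100000 / (2 * 400000)) * (1 - ((1 - 0) / (17 - 1))) * 400000 as cardinality
from   dual;
This returns a selectivity of 0.1171875 and a cardinality of 46875, matching the Rows column in both execution plans above.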
CONCLUSION
When the predicate value is just outside but close to the min/max range, the estimated selectivity is still relatively high (as in our case, where 0 is close to the low value 1).
The farther the predicate value lies beyond the min/max range, the lower the estimated selectivity becomes.
SOLUTION
As explained, this is expected behaviour; this is how the optimizer determines the selectivity of an out-of-range predicate.
As a workaround:
- Use an index hint.
- Adjust optimizer_index_cost_adj to a lower value, say 50 (a quick sketch follows this list).
- Use plan stability features such as SQL plan baselines.
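To illustrate the second workaround, here is a minimal sketch. optimizer_index_cost_adj scales the optimizer's costing of index access paths; 50 is only an example value, and since the change affects every statement in the session it should be tested carefully before being considered more widely:
alter session set optimizer_index_cost_adj = 50;   -- index access now costed at 50% of default
set autotrace on
select * from test where id = 0;                   -- re-check whether IDX_1 is now chosen
alter session set optimizer_index_cost_adj = 100;  -- revert to the default when done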