Saturday 22 June 2013

Interesting Case where a full table scan is chosen over an index scan - Out of Range Selectivity ?

I was working on a real-life case where CBO was choosing a full table scan over an index scan.Created a small test case to demonstrate the issue.
I created a table test - with two colummns ID and ORG

create table test ( id number, org varchar2(100) );

begin

for i in 1..300000
loop
insert into test values (1,'XXXXXXXXXXXXXX');
end loop;
end;
/

commit;


- Inserted 300000 records for ID=1


begin
for i in 1..100000
loop
insert into test values (17,'YYYYYYYYYYYYYYYYY');
end loop;
end;
/
commit;

- Inserted 100000 records for ID=17

- Therefore ID has two distinct values 1 and 17.

create index idx_1 on test(id);

- Created an index on TEST.ID


- Gathering 100% statistics with histograms.


exec dbms_stats.gather_table_stats(user,'TEST',estimate_percent=>100,method_opt=>'for columns ID size 2');

Since, we have no values with ID=0. And If I run "select * from test where id=0" , I would expect an index range scan.Since there are no records with ID=0, index scan would finish in the blink of an eye regardless of the size of the table.Let us look at the execution statistics.


SQL> set autotrace on
SQL> select * from test where id=0;

no rows selected


Execution Plan
----------------------------------------------------------
Plan hash value: 1357081020

--------------------------------------------------------------------------

| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 46875 |   869K|   230   (3)| 00:00:03 |
|*  1 |  TABLE ACCESS FULL| TEST | 46875 |   869K|   230   (3)| 00:00:03 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

   1 - filter("ID"=0)


Statistics

----------------------------------------------------------
          1  recursive calls
          0  db block gets
       1313  consistent gets
          0  physical reads
          0  redo size
        339  bytes sent via SQL*Net to client
        404  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed

Strangely, Optimizer chooses a FULL TABLE SCAN over Index IDX_1

Notice, that we are doing 1313 consistent gets here.

Now let us force the index and see the execution statistics.


SQL> select /*+ index(test idx_1) */ * from test where id=0;

no rows selected


Execution Plan
----------------------------------------------------------
Plan hash value: 3519455734

-------------------------------------------------------------------------------------

| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       | 46875 |   869K|   249   (1)| 00:00:03 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TEST  | 46875 |   869K|   249   (1)| 00:00:03 |
|*  2 |   INDEX RANGE SCAN          | IDX_1 | 46875 |       |    95   (2)| 00:00:02 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

   2 - access("ID"=0)



Statistics

----------------------------------------------------------
          8  recursive calls
          0  db block gets
          3  consistent gets
          0  physical reads
          0  redo size
        339  bytes sent via SQL*Net to client
        404  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          4  sorts (memory)
          0  sorts (disk)
          0  rows processed

OK, with an index hint, query runs much faster and number of consistent gets comes down from 1313 to just 3 !


Now this is interesting, I have 100% statistics, histograms are in place. Why can't optimizer choose an index by itself ?


In the execution plan, we see that optimizer exepects our query to return 46875 rows. This number is looking suspicious.


Because of this large value, optimizer considers full table scan less expensive than an index scan.


Let us work out how optimizer arrived at this number.


These are the values known to the optimizer (before parsing the statement in question).


count(*) for 1 -  300000

count(*) for 17 - 100000   --> Min(BucketSize)
NumRows => Number of rows in the table which is 400000
low_value=1
high_value=17
Predicate Val=0

Optimizer is using this formula to determine the selectivity for predicate value 0.

This predicate is out of range ( not betweeen min value 1 and max value 17)

Sel = (100000 /2*400000) * ( 1- ((1-0)/(17-1))))


    = 0.117


Therefore,


cardinality = Sel * NumRows

            = 0.11 * 400000
= 46875

CONCLUSION


When predicate value is closer to the min / max value selectivity is higher. (Like in our case, 0 is close to 1)

And when your predicate value is way beyond min/max value, selectivity is lower.

SOLUTION


As explained, this is an expected behvaior. This is how optimizer determines selectivity of a out of range predicate.

As a workaround

- Use Index Hint

- Adjust optimizer_index_cost_adj to lower value say 50.
- Use plan stability features such as baselines.

No comments:

Post a Comment

19c Multitenant RAC- Default service disables parallelism

We recently upgraded one of our RAC databases from 11.2 to 19.10 on Exadata. And to everyone's surprise, almost everything on the system...