Saturday, 22 June 2013

Use of Non-Deterministic functions in a SQL may give Wrong Results

Here is an interesting little test case to demonstrate the issue.

create table dept (deptno number, dname varchar2(100));

insert into dept values (1,'HR');

insert into dept values (2,'AZ');
insert into dept values (3,'XL');

SELECT deptno, dname

FROM dept
WHERE deptno in (SELECT round(dbms_random.value(1, 3)) FROM dual);

I would expect that the sub query would return 1 row ( 1/2/3) and therefore the main query should always return 1 record.
Now let us run this query a few times and see the result set.

/

2 AZ


/


now rows selected


/


1 HR

3 XL

Here is the execution plan of the statement.

 ---------------------------------------------------------------------------
 | Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
 ---------------------------------------------------------------------------
 |   0 | SELECT STATEMENT   |      |     1 |    22 |     4   (0)| 00:00:01 |
 |*  1 |  FILTER            |      |       |       |            |          |
 |   2 |   TABLE ACCESS FULL| DEPT |     4 |    88 |     2   (0)| 00:00:01 |
 |*  3 |   FILTER           |      |       |       |            |          |
 |   4 |    FAST DUAL       |      |     1 |       |     2   (0)| 00:00:01 |
 ---------------------------------------------------------------------------
  
 Predicate Information (identified by operation id):
 ---------------------------------------------------
    1 - filter( EXISTS (SELECT 0 FROM "SYS"."DUAL" "DUAL" WHERE
               ROUND("DBMS_RANDOM"."VALUE"(1,3))=:B1))
    3 - filter(ROUND("DBMS_RANDOM"."VALUE"(1,3))=:B1)
  
Strange, different executions return different number of records. Wrong results ?
No. Problem here is with the non-deterministic function (dbms_random.value).
This function is evaluated once for each row returned by DEPT, rather than 
being evaluated once at the start. This behavior is expected as optimizer
does not guarantee when a non-deterministic function will be evaluated during 
the statement execution.
You have to be careful when you use such non-deterministic functions in the SQL statement.

Interesting Case where a full table scan is chosen over an index scan - Out of Range Selectivity ?

I was working on a real-life case where CBO was choosing a full table scan over an index scan.Created a small test case to demonstrate the issue.
I created a table test - with two colummns ID and ORG

create table test ( id number, org varchar2(100) );

begin

for i in 1..300000
loop
insert into test values (1,'XXXXXXXXXXXXXX');
end loop;
end;
/

commit;


- Inserted 300000 records for ID=1


begin
for i in 1..100000
loop
insert into test values (17,'YYYYYYYYYYYYYYYYY');
end loop;
end;
/
commit;

- Inserted 100000 records for ID=17

- Therefore ID has two distinct values 1 and 17.

create index idx_1 on test(id);

- Created an index on TEST.ID


- Gathering 100% statistics with histograms.


exec dbms_stats.gather_table_stats(user,'TEST',estimate_percent=>100,method_opt=>'for columns ID size 2');

Since, we have no values with ID=0. And If I run "select * from test where id=0" , I would expect an index range scan.Since there are no records with ID=0, index scan would finish in the blink of an eye regardless of the size of the table.Let us look at the execution statistics.


SQL> set autotrace on
SQL> select * from test where id=0;

no rows selected


Execution Plan
----------------------------------------------------------
Plan hash value: 1357081020

--------------------------------------------------------------------------

| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 46875 |   869K|   230   (3)| 00:00:03 |
|*  1 |  TABLE ACCESS FULL| TEST | 46875 |   869K|   230   (3)| 00:00:03 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

   1 - filter("ID"=0)


Statistics

----------------------------------------------------------
          1  recursive calls
          0  db block gets
       1313  consistent gets
          0  physical reads
          0  redo size
        339  bytes sent via SQL*Net to client
        404  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed

Strangely, Optimizer chooses a FULL TABLE SCAN over Index IDX_1

Notice, that we are doing 1313 consistent gets here.

Now let us force the index and see the execution statistics.


SQL> select /*+ index(test idx_1) */ * from test where id=0;

no rows selected


Execution Plan
----------------------------------------------------------
Plan hash value: 3519455734

-------------------------------------------------------------------------------------

| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       | 46875 |   869K|   249   (1)| 00:00:03 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TEST  | 46875 |   869K|   249   (1)| 00:00:03 |
|*  2 |   INDEX RANGE SCAN          | IDX_1 | 46875 |       |    95   (2)| 00:00:02 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

   2 - access("ID"=0)



Statistics

----------------------------------------------------------
          8  recursive calls
          0  db block gets
          3  consistent gets
          0  physical reads
          0  redo size
        339  bytes sent via SQL*Net to client
        404  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          4  sorts (memory)
          0  sorts (disk)
          0  rows processed

OK, with an index hint, query runs much faster and number of consistent gets comes down from 1313 to just 3 !


Now this is interesting, I have 100% statistics, histograms are in place. Why can't optimizer choose an index by itself ?


In the execution plan, we see that optimizer exepects our query to return 46875 rows. This number is looking suspicious.


Because of this large value, optimizer considers full table scan less expensive than an index scan.


Let us work out how optimizer arrived at this number.


These are the values known to the optimizer (before parsing the statement in question).


count(*) for 1 -  300000

count(*) for 17 - 100000   --> Min(BucketSize)
NumRows => Number of rows in the table which is 400000
low_value=1
high_value=17
Predicate Val=0

Optimizer is using this formula to determine the selectivity for predicate value 0.

This predicate is out of range ( not betweeen min value 1 and max value 17)

Sel = (100000 /2*400000) * ( 1- ((1-0)/(17-1))))


    = 0.117


Therefore,


cardinality = Sel * NumRows

            = 0.11 * 400000
= 46875

CONCLUSION


When predicate value is closer to the min / max value selectivity is higher. (Like in our case, 0 is close to 1)

And when your predicate value is way beyond min/max value, selectivity is lower.

SOLUTION


As explained, this is an expected behvaior. This is how optimizer determines selectivity of a out of range predicate.

As a workaround

- Use Index Hint

- Adjust optimizer_index_cost_adj to lower value say 50.
- Use plan stability features such as baselines.

19c Multitenant RAC- Default service disables parallelism

We recently upgraded one of our RAC databases from 11.2 to 19.10 on Exadata. And to everyone's surprise, almost everything on the system...