Saturday, 22 June 2013

Use of Non-Deterministic Functions in a SQL Statement May Give Wrong Results

Here is an interesting little test case to demonstrate the issue.

create table dept (deptno number, dname varchar2(100));

insert into dept values (1,'HR');

insert into dept values (2,'AZ');
insert into dept values (3,'XL');

SELECT deptno, dname

FROM dept
WHERE deptno in (SELECT round(dbms_random.value(1, 3)) FROM dual);

I would expect the subquery to return one row (1, 2, or 3), and therefore the main query should always return exactly one record.
Now let us run this query a few times and see the result set.

/

2 AZ


/


no rows selected


/


1 HR

3 XL

Here is the execution plan of the statement.

 ---------------------------------------------------------------------------
 | Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
 ---------------------------------------------------------------------------
 |   0 | SELECT STATEMENT   |      |     1 |    22 |     4   (0)| 00:00:01 |
 |*  1 |  FILTER            |      |       |       |            |          |
 |   2 |   TABLE ACCESS FULL| DEPT |     4 |    88 |     2   (0)| 00:00:01 |
 |*  3 |   FILTER           |      |       |       |            |          |
 |   4 |    FAST DUAL       |      |     1 |       |     2   (0)| 00:00:01 |
 ---------------------------------------------------------------------------
  
 Predicate Information (identified by operation id):
 ---------------------------------------------------
    1 - filter( EXISTS (SELECT 0 FROM "SYS"."DUAL" "DUAL" WHERE
               ROUND("DBMS_RANDOM"."VALUE"(1,3))=:B1))
    3 - filter(ROUND("DBMS_RANDOM"."VALUE"(1,3))=:B1)
  
Strange, different executions return different numbers of records. Wrong results ?
No. The problem here is the non-deterministic function (dbms_random.value).
As the FILTER predicates in the plan show, this function is evaluated once for each
row returned by DEPT, rather than once at the start of the statement. This behavior
is expected: the optimizer does not guarantee when, or how many times, a
non-deterministic function will be evaluated during statement execution.
You have to be careful when you use such non-deterministic functions in a SQL statement.
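To see why per-row evaluation produces this behavior, here is a minimal Python sketch. This is a toy model of the FILTER step in the plan above, not Oracle internals: the random predicate is re-evaluated for each row of DEPT, so any number of rows between 0 and 3 can come back.

```python
import random

# Toy model of the DEPT table from the test case above
dept = [(1, "HR"), (2, "AZ"), (3, "XL")]

def run_query(rng):
    # The random predicate is re-evaluated for every row,
    # mirroring the FILTER operation in the execution plan
    return [row for row in dept
            if row[0] == round(rng.uniform(1, 3))]

rng = random.Random(2013)  # fixed seed, purely for repeatability
counts = [len(run_query(rng)) for _ in range(1000)]

print(min(counts), max(counts))
```

Running the model many times shows result sets of every size from zero rows up to all three rows, exactly the symptom seen in the SQL*Plus session above.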

Interesting Case where a full table scan is chosen over an index scan - Out of Range Selectivity ?

I was working on a real-life case where the CBO was choosing a full table scan over an index scan. I created a small test case to demonstrate the issue.
I created a table TEST with two columns, ID and ORG.

create table test ( id number, org varchar2(100) );

begin
  for i in 1..300000
  loop
    insert into test values (1,'XXXXXXXXXXXXXX');
  end loop;
end;
/

commit;


- Inserted 300000 records for ID=1


begin
  for i in 1..100000
  loop
    insert into test values (17,'YYYYYYYYYYYYYYYYY');
  end loop;
end;
/
commit;

- Inserted 100000 records for ID=17

- Therefore ID has two distinct values 1 and 17.

create index idx_1 on test(id);

- Created an index on TEST.ID


- Gathering 100% statistics with histograms.


exec dbms_stats.gather_table_stats(user,'TEST',estimate_percent=>100,method_opt=>'for columns ID size 2');

Since there are no values with ID=0, if I run "select * from test where id=0" I would expect an index range scan. With no matching records, the index scan would finish in the blink of an eye regardless of the size of the table. Let us look at the execution statistics.


SQL> set autotrace on
SQL> select * from test where id=0;

no rows selected


Execution Plan
----------------------------------------------------------
Plan hash value: 1357081020

--------------------------------------------------------------------------

| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 46875 |   869K|   230   (3)| 00:00:03 |
|*  1 |  TABLE ACCESS FULL| TEST | 46875 |   869K|   230   (3)| 00:00:03 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

   1 - filter("ID"=0)


Statistics

----------------------------------------------------------
          1  recursive calls
          0  db block gets
       1313  consistent gets
          0  physical reads
          0  redo size
        339  bytes sent via SQL*Net to client
        404  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed

Strangely, the optimizer chooses a FULL TABLE SCAN over index IDX_1.

Notice that we are doing 1313 consistent gets here.

Now let us force the index and see the execution statistics.


SQL> select /*+ index(test idx_1) */ * from test where id=0;

no rows selected


Execution Plan
----------------------------------------------------------
Plan hash value: 3519455734

-------------------------------------------------------------------------------------

| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       | 46875 |   869K|   249   (1)| 00:00:03 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TEST  | 46875 |   869K|   249   (1)| 00:00:03 |
|*  2 |   INDEX RANGE SCAN          | IDX_1 | 46875 |       |    95   (2)| 00:00:02 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

   2 - access("ID"=0)



Statistics

----------------------------------------------------------
          8  recursive calls
          0  db block gets
          3  consistent gets
          0  physical reads
          0  redo size
        339  bytes sent via SQL*Net to client
        404  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          4  sorts (memory)
          0  sorts (disk)
          0  rows processed

OK, with an index hint the query runs much faster, and the number of consistent gets comes down from 1313 to just 3 !


Now this is interesting: I have 100% statistics, and histograms are in place. Why can't the optimizer choose the index by itself ?


In the execution plan, we see that the optimizer expects our query to return 46875 rows. This number looks suspicious.


Because of this large estimate, the optimizer considers a full table scan less expensive than an index scan.


Let us work out how the optimizer arrived at this number.


These are the values known to the optimizer (before parsing the statement in question).


count(*) for ID=1  = 300000
count(*) for ID=17 = 100000   --> Min(BucketSize)
NumRows    = 400000 (number of rows in the table)
low_value  = 1
high_value = 17
Predicate value = 0

The optimizer uses this formula to determine the selectivity for predicate value 0. This predicate is out of range (not between the min value 1 and the max value 17):

Sel = (100000 / (2*400000)) * (1 - ((1-0)/(17-1)))
    = 0.125 * 0.9375
    = 0.1171875

Therefore,

cardinality = Sel * NumRows
            = 0.1171875 * 400000
            = 46875
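The same arithmetic can be checked in a few lines of Python. Treat this as a reconstruction of the formula shown above, using the numbers from this test case; the exact internal formula can vary by Oracle version.

```python
min_bucket = 100_000   # count of the less frequent value (ID = 17)
num_rows   = 400_000   # total rows in TEST
low, high  = 1, 17     # column low/high values known to the optimizer
pred       = 0         # out-of-range predicate value

# Base selectivity (Min(BucketSize) / (2 * NumRows)), scaled down
# linearly as the predicate moves away from the low end of [low, high]
sel  = (min_bucket / (2 * num_rows)) * (1 - (low - pred) / (high - low))
card = round(sel * num_rows)

print(sel, card)   # 0.1171875 46875
```

The computed cardinality of 46875 matches the Rows column in both execution plans above.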

CONCLUSION


When the predicate value is close to the min/max value, the selectivity is higher (as in our case, 0 is close to the low value 1).

When the predicate value is far beyond the min/max value, the selectivity is lower.

SOLUTION


As explained, this is expected behavior; this is how the optimizer determines the selectivity of an out-of-range predicate.

As a workaround:

- Use an index hint.
- Adjust optimizer_index_cost_adj to a lower value, say 50.
- Use plan stability features such as SQL plan baselines.

Saturday, 19 January 2013

Optimizer is Over Estimating Selectivity with the LIKE clause ?

I came across a problem where the optimizer was over-estimating the selectivity for a predicate in a LIKE clause. Here is the setup.

SQL> create table mytest (col2 varchar2(2000));

SQL> begin
          for i in 1..100
          loop
          insert into mytest values ('AB_1' || to_char(i) );
          end loop;
          end;
          /

SQL> commit ;

I created a table MYTEST and inserted 100 records. This is the pattern of the data:

AB_11
AB_12
AB_13
...
AB_1100

Collect statistics with 100% sample and without histograms.

SQL> exec dbms_stats.gather_table_stats(user,'mytest',estimate_percent=>100,method_opt=>'for all columns size 1');

SQL> select num_rows from user_tables where table_name='MYTEST';

  NUM_ROWS

----------
       100

SQL> select num_distinct from user_tab_col_statistics where table_name='MYTEST'
    and column_name='COL2';

NUM_DISTINCT
------------
         100

So, these statistics are correct. We have 100 distinct values and 100 number of rows in the table.
Now, consider this query.

SQL> select count(*) from mytest where col2 like 'AB_188%' ;


COUNT(*)
----------
         1

I knew this was going to return a count of 1. Now let us look at the execution plan.

Execution Plan
--------------------------------------------------------------------------
| Id  | Operation          | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |        |       |       |     2 (100)|          |
|   1 |  SORT AGGREGATE    |        |     1 |     7 |            |          |
|   2 |   TABLE ACCESS FULL| MYTEST |    99 |   693 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------


Estimated number of rows: 99. Why did the optimizer over-estimate this cardinality ?
I knew this was going to return just 1 row, and the statistics are 100% accurate. What went wrong here ?

I thought this was something to do with the histograms. I created a frequency histogram with 100 buckets.

SQL> exec dbms_stats.gather_table_stats(user,'mytest',estimate_percent=>100,method_opt=>'for all columns size 100');

SQL> select count(*) from mytest where col2 like 'AB_188%' ;



COUNT(*)
----------
         1

Execution Plan
--------------------------------------------------------------------------
| Id  | Operation          | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |        |       |       |     2 (100)|          |
|   1 |  SORT AGGREGATE    |        |     1 |     7 |            |          |
|   2 |   TABLE ACCESS FULL| MYTEST |   100 |   693 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------


With histograms in place, estimated number of rows is now 100.

So, why is the optimizer over-estimating the selectivity ? Why does it believe that our search string will return 100 rows when it actually returns just 1 ? With this kind of data distribution, if I had millions of records the optimizer would always choose a full table scan and never an index range scan.

Here is what is happening. There is a "_" character in our search string, and in a LIKE pattern "_" means "match any single character". The selectivity is therefore computed after ignoring everything that comes after this "_": the selectivity of 'AB_188%' is treated as the selectivity of 'AB%', and this leads to the over-estimation.

Now let us run the query again, instructing Oracle to treat "_" as a regular character by using an escape clause.

SQL> select count(*) from mytest where col2 like 'AB\_188%' escape '\';


COUNT(*)
----------
         1

Execution Plan
--------------------------------------------------------------------------
| Id  | Operation          | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |        |       |       |     2 (100)|          |
|   1 |  SORT AGGREGATE    |        |     1 |     7 |            |          |
|   2 |   TABLE ACCESS FULL| MYTEST |     1 |   693 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------


Wow ! Now the optimizer gets it right, with an estimated cardinality of 1.

So, you have to be careful when you use a LIKE clause with "_" in the pattern.

Histogram statistics not accurate with Auto Sample Size

I was reading that AUTO sampling in 11g generates statistics that are as accurate as 100% sampling but takes less time than 100% sampling. I decided to look at the quality of these statistics. Are they good for the optimizer ?

SQL> create table sales (order_key number) ;

Table created.


Insert a few records.

SQL> begin
     for i in 1..10000
     loop
     insert into sales values(10);
     insert into sales values(20);
     insert into sales values(30);
     insert into sales values(40);
     insert into sales values(50);
     end loop;
     end ;
      /

I inserted five distinct values (10, 20, 30, 40, 50), each 10,000 times, for a total of 50,000 rows.

Now take statistics with AUTO_SAMPLE_SIZE

SQL> exec dbms_stats.gather_table_stats(user,'SALES',method_opt=>'for all columns size 254');

This command completed in just 30 seconds. WOW !!

Let us look at the quality of these statistics now.

SQL> select num_rows,sample_size from user_tables where table_name='SALES';
   
  NUM_ROWS SAMPLE_SIZE
  ----------  -----------
   50000       50000

SQL> select num_distinct from user_tab_col_statistics where table_name='SALES';

  NUM_DISTINCT
  --------------------
        5

It tells me that the sample size used was 100% (50000 of 50000 rows), and we see the correct NUM_ROWS and NUM_DISTINCT.

These statistics are 100% accurate and take very little time to gather. So what is wrong then ?

Now let us look at the sample used to generate column statistics (histograms)

When I use method_opt => 'for all columns size 254', I instruct Oracle to create a histogram on every column.

SQL> select sample_size from user_tab_col_statistics where table_name='SALES'
           and column_name='ORDER_KEY' ;

     SAMPLE_SIZE
     -------------------
        5450

OK. So Oracle used different samples to collect different statistics. It looks like NUM_ROWS and NUM_DISTINCT were derived from the 100% sample, but the histogram was computed by sampling only 5450 rows ! I ran the same test with 10 million rows, and the column sample size was still close to ~5550. Is this something hard-coded into Oracle's code ?

We create histograms to make the optimizer aware of the real data distribution. With such a small sample, the quality of this particular statistic can be very poor; for skewed data, rare values may be missed by the sample entirely.

So, why present the optimizer with a histogram that is not even close to real ?
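Why can a ~5450-row sample hurt? Here is a hedged Python sketch (a toy model, not Oracle's sampling code) showing that with skewed data, a rare value can be absent from the sample altogether, so a frequency histogram built from that sample would not even know the value exists:

```python
import random

rng = random.Random(7)

# 50,000 rows, but value 99 appears only 10 times (heavily skewed)
data = [10] * 20_000 + [20] * 20_000 + [30] * 9_990 + [99] * 10

# How often does a 5450-row sample miss the rare value completely?
trials = 200
missed = sum(99 not in set(rng.sample(data, 5450))
             for _ in range(trials))

print(missed, "of", trials, "samples missed value 99")
```

With these proportions the sample misses the rare value in roughly a third of the trials, so a histogram built from one such sample may tell the optimizer that value 99 does not exist at all.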

How many SQL's are actually captured and displayed by AWR ?

While looking into a performance problem, we generally take an AWR report and directly look at the top resource consuming SQL statements. But what do you do when the most expensive SQL is missing from the AWR report ?
How do I check if my AWR report is complete and has captured all or most of the SQL statements ?

Let us have a look at this section of the AWR report.

WORKLOAD REPOSITORY report for

DB Name     | DB Id      | Instance    | Inst num | Release    | RAC | Host
VISHAL-TEST | 9999999999 | VISHAL-TEST | 4        | 10.2.0.4.0 | YES | abcd.host.com

            | Snap Id | Snap Time          | Sessions | Cursors/Session
Begin Snap: | 18305   | 07-Jan-13 15:00:30 | 38       | 1.6
End Snap:   | 18306   | 07-Jan-13 16:00:05 | 38       | 1.6
Elapsed:    | 59.59 (mins)
DB Time:    | 58.40 (mins)

SQL ordered by Executions

  • Total Executions: 9,914
  • Captured SQL account for 68.9% of Total
| Executions | Rows Processed | Rows per Exec | CPU per Exec (s) | Elap per Exec (s) | SQL Id        | SQL Module                                 | SQL Text                         |
|      3,989 |             42 |          0.01 |             0.00 |              0.00 | c4nhd1ntptxq7 |                                            | select message_level, sequenc... |
|        715 |              0 |          0.00 |             0.00 |              0.00 | 6ssrk2dqj7jbx |                                            | select job, nvl2(last_date, ...  |
|        238 |            238 |          1.00 |             0.00 |              0.00 | gj5r9jj2xad7f | JDBC Thin Client                           | SELECT 1, status, '', arch...    |
|        114 |            114 |          1.00 |             0.00 |              0.00 | 0ws7ahf1d78qa |                                            | select SYS_CONTEXT('USERENV', ...|
|        114 |            114 |          1.00 |             0.00 |              0.00 | 459f3z9u4fb3u |                                            | select value$ from props$ wher...|
|        104 |          1,872 |         18.00 |             0.00 |              0.00 | 0h6b2sajwb74n |                                            | select privilege#, level from ...|
|        104 |            104 |          1.00 |             0.00 |              0.00 | 0k8522rmdzg4k |                                            | select privilege# from sysauth...|
|        104 |              0 |          0.00 |             0.00 |              0.00 | 1uzhrfn574t12 | sqlplus@vcd01024.ute.fedex.com (TNS V1-V3) | SELECT ATTRIBUTE, SCOPE, NUMER...|
|        104 |            104 |          1.00 |             0.00 |              0.00 | 5qgz1p0cut7mx | sqlplus@vcd01024.ute.fedex.com (TNS V1-V3) | BEGIN DBMS_OUTPUT.DISABLE; END...|
|        104 |            104 |          1.00 |             0.00 |              0.00 | 5ur69atw3vfhj |                                            | select decode(failover_method,...|
|        104 |              0 |          0.00 |             0.00 |              0.00 | c4pc3jhzjcmc7 | sqlplus@vcd01024.ute.fedex.com (TNS V1-V3) | SELECT CHAR_VALUE FROM SYSTEM....|
|        104 |            104 |          1.00 |             0.00 |              0.00 | d6vwqbw6r2ffk | sqlplus@vcd01024.ute.fedex.com (TNS V1-V3) | SELECT USER FROM DUAL            |
|        104 |            104 |          1.00 |             0.00 |              0.00 | dyk4dprp70d74 | SQL*Plus                                   | SELECT DECODE('A', 'A', '1', '...|
|        104 |            104 |          1.00 |             0.00 |              0.00 | g4y6nw3tts7cc | sqlplus@vcd01024.ute.fedex.com (TNS V1-V3) | BEGIN DBMS_APPLICATION_INFO.SE...|

It is very important to understand this section of the AWR report. 

Well, there are two things to consider.

1. How much data is being captured in the AWR repository ?
2. How much data is being displayed in the AWR report ?

Looking at this AWR report, I see total executions at 9914 (this number comes from DBA_HIST_SYSSTAT). 68.9% of 9914, i.e. about 6830 executions, were captured in the AWR repository.

What happened to the other SQL statements ? Why did Oracle fail to capture them ?
It could be that those missing SQL statements consumed a lot of CPU, and that information is now lost.

By default, AWR captures a snapshot every hour: it takes a snap of V$SQL and keeps this data safe in DBA_HIST_SQLSTAT and other related tables. Let us consider this scenario:

At 10 AM - You take an AWR snap . S1
At 10.15 AM - You run a resource intensive SQL statement
At 10.45 AM -  SQL execution got completed. It has done a lot of damage by now.
At 10.50 AM - You flush the shared pool manually. / SQL gets aged out of shared pool automatically.
At 11 AM - You take another AWR snap S2

Now, if you generate an AWR report between these two snaps, you will see that the SQL you ran is missing ! That is an interesting takeaway: AWR base tables are populated from V$SQL, and if something is missing from V$SQL, AWR will miss it too.

Coming back to our AWR report:
Total executions were 9914.
6830 executions were actually captured in the AWR repository.

The rest is lost ! (Well, not completely: ASH takes samples every second and may give us some more information; we will talk about ASH a little later.)

Now let us add up the numbers in the "Executions" column of this AWR report:

3989 + 715 + 238 + ... = 6106

AWR told us it captured 6830 executions, but when I add up these numbers I get only 6106.
This is quite simple: the AWR report does not display everything that is present in DBA_HIST_SQLSTAT.
You can query DBA_HIST_SQLSTAT directly if you want a list of all the captured SQLs.
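The arithmetic from the report can be worked through in Python. The execution counts below are copied from the "SQL ordered by Executions" table above:

```python
total_execs  = 9914          # from DBA_HIST_SYSSTAT
captured_pct = 68.9          # "Captured SQL account for 68.9% of Total"
captured     = int(total_execs * captured_pct / 100)

# Executions column of the rows displayed in the report
displayed = [3989, 715, 238, 114, 114,
             104, 104, 104, 104, 104, 104, 104, 104, 104]

print(captured)        # ~6830 captured in the AWR repository
print(sum(displayed))  # 6106 actually displayed in the report
```

So roughly 6830 executions made it into the repository, of which 6106 are shown in the report; the gap is simply the rows the report chose not to display.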

From 11.2 onwards it is possible to control the number of SQL statements displayed in the AWR report. It defaults to 30 and can be altered using the following procedure:

SQL> exec dbms_workload_repository.awr_set_report_thresholds(top_n_sql=>50,top_n_sql_max=>50);




