Saturday, 19 January 2013

Optimizer is Over Estimating Selectivity with the LIKE clause ?

I came across a problem where Optimizer was over estimating the selectivity for a predicate in the LIKE clause. Here is the set up.

SQL> create table mytest (col2 varchar2(2000));

SQL> begin
          for i in 1..100
          loop
          insert into mytest values ('AB_1' || to_char(i) );
          end loop;
          end;
          /

SQL> commit ;

I created a table mytest. Inserted 100 records in there. This is the pattern of data that I have.

AB_11
AB_12
AB_13
...
AB_1100

Collect statistics with 100% sample and without histograms.

SQL> exec dbms_stats.gather_table_stats(user,'mytest',estimate_percent=>100,method_opt=>'for all                 columns size 1');

SQL> select num_rows from user_tables where table_name='MYTEST';

  NUM_ROWS

----------
       100

SQL> select num_distinct from user_tab_col_statistics where table_name='MYTEST'
    and column_name='COL2';

NUM_DISTINCT
------------
         100

So, these statistics are correct. We have 100 distinct values and 100 number of rows in the table.
Now, consider this query.

SQL> select count(*) from mytest where col2 like 'AB_188%' ;


COUNT(*)
----------
         1

I know this was going to return count as 1. Now let us look at the execution plan.

Execution Plan

IdOperationNameRowsBytesCost (%CPU)Time
0SELECT STATEMENT2 (100)
1   SORT AGGREGATE17
2     TABLE ACCESS FULLMYTEST996932 (0)00:00:0


Estimated Number of Rows - 99. Why did optimizer just over estimated this cardinality ?
I know this was going to return just 1 row. Statistics are 100% accurate. What went wrong here ?

I thought this was something to do with the histograms. I created a frequency histogram with 100 buckets.

SQL> exec dbms_stats.gather_table_stats(user,'mytest',estimate_percent=>100,method_opt=>'for all                 columns size 100');

SQL> select count(*) from mytest where col2 like 'AB_188%' ;



COUNT(*)
----------
         1

Execution Plan

IdOperationNameRowsBytesCost (%CPU)Time
0SELECT STATEMENT2 (100)
1   SORT AGGREGATE17
2     TABLE ACCESS FULLMYTEST1006932 (0)00:00:0


With histograms in place, estimated number of rows is now 100.

So, why is optimizer overestimating the selectivity ? Why is it that optimizer believed that our search string will return 100 rows ignoring the fact that it actually returns 1 row. With this kind of data distribution and if I had million of records, optimizer will always choose a full table scan and not an index range scan.

Here is what is happening . There is a "_" character in our search string . Optimizer interprets it as "match any character". Therefore, selectivity is computed after ignoring everything that comes after this "_" character.
So, selectivity of AB_188% is considered as selectivity of AB%. And this leads to over-estimation.

Now let us run this query. I instruct optimizer to consider "_" as a regular character.

SQL> select count(*) from mytest where col2 like 'AB\_188%' escape '\';


COUNT(*)
----------
         1

Execution Plan

IdOperationNameRowsBytesCost (%CPU)Time
0SELECT STATEMENT2 (100)
1   SORT AGGREGATE17
2     TABLE ACCESS FULLMYTEST16932 (0)00:00:0


Wow ! Now optimizer guesses it correct. Estimated cardinality of 1.

So, you have to be careful when you use a "LIKE" clause.

Histograms statistics not accurate with Auto Sample Size

I was reading that AUTO sampling in 11g generates statistics that are as accurate as 100% sampling but takes less time than 100% sampling. I decided to look at the quality of these statistics. Are they good for the optimizer ?

SQL> create table sales (order_key number) ;

Table created.


Insert few records.

SQL> begin
     for i in 1..10000
     loop
     insert into sales values(10);
     insert into sales values(20);
     insert into sales values(30);
     insert into sales values(40);
     insert into sales values(50);
     end loop;
     end ;
      /

I inserted five distinct values (10,20,30,40,50) each a 10,000 time.

Now take statistics with AUTO_SAMPLE_SIZE

SQL> exec dbms_stats.gather_table_stats(user,'SALES',method_opt=>'for all columns size 254');

This command completed in just 30 seconds. WOW !!

Let us look at the quality statistics now.

SQL> select num_rows,sample_size from user_tables where table_name='SALES';
   
  NUM_ROWS SAMPLE_SIZE
  ----------  -----------
   50000       50000

SQL> select num_distinct from user_tab_col_statistics where table_name='SALES';

  NUM_DISTINCT
  --------------------
        5

It tells me that the sample size used was 100% and we see the correct number of num_rows and num_distinct. 

These statistics are 100% accurate. Takes very little time. What is wrong then ? 

Now let us look at the sample used to generate column statistics (histograms)

When I use method_opt => 'for all columns size 254' , I instruct Oracle to create a histogram on every column

SQL> select sample_size from user_tab_col_statistics where table_name='SALES'
           and column_name='ORDER_KEY' ;

     SAMPLE_SIZE
     -------------------
        5450

OK. So Oracle used different samples to collect different statistics. It looks like NUM_ROWS and NUM_DISTINCT was derived from 100% sample but histogram was computed by sampling only 5450 rows ! I ran the same tests with 10 million rows, column sample size was still close to ~ 5550. Is it something that is hard-coded into Oracle's code ?

We create histograms to make optimizer aware of the real data distribution. Of course, quality of this particular statistics might be very very bad given this small sample size.

So, why present the optimizer with histograms when it is not even close to real ? 

How many SQL's are actually captured and displayed by AWR ?

While looking into a performance problem, we generally take an AWR report and directly look at the top resource consuming SQL statements. But what do you do when the most expensive SQL is missing from the AWR report ?
How do I check if my AWR report is complete and has captured all or most of the SQL statements ?

Let us have a look at this section of the AWR report.

WORKLOAD REPOSITORY report for

DB NameDB IdInstanceInst numReleaseRACHost
VISHAL-TEST9999999999VISHAL-TEST410.2.0.4.0YESabcd.host.com
Snap IdSnap TimeSessionsCursors/Session
Begin Snap:1830507-Jan-13 15:00:30381.6
End Snap:1830607-Jan-13 16:00:05381.6
Elapsed:59.59 (mins)
DB Time:58.40 (mins)

SQL ordered by Executions

  • Total Executions: 9,914
  • Captured SQL account for 68.9% of Total
ExecutionsRows ProcessedRows per ExecCPU per Exec (s)Elap per Exec (s)SQL IdSQL ModuleSQL Text
3,989420.010.000.00c4nhd1ntptxq7select message_level, sequenc...
71500.000.000.006ssrk2dqj7jbxselect job, nvl2(last_date, ...
2382381.000.000.00gj5r9jj2xad7fJDBC Thin ClientSELECT 1, status, '', arch...
1141141.000.000.000ws7ahf1d78qaselect SYS_CONTEXT('USERENV', ...
1141141.000.000.00459f3z9u4fb3uselect value$ from props$ wher...
1041,87218.000.000.000h6b2sajwb74nselect privilege#, level from ...
1041041.000.000.000k8522rmdzg4kselect privilege# from sysauth...
10400.000.000.001uzhrfn574t12sqlplus@vcd01024.ute.fedex.com (TNS V1-V3)SELECT ATTRIBUTE, SCOPE, NUMER...
1041041.000.000.005qgz1p0cut7mxsqlplus@vcd01024.ute.fedex.com (TNS V1-V3)BEGIN DBMS_OUTPUT.DISABLE; END...
1041041.000.000.005ur69atw3vfhjselect decode(failover_method,...
10400.000.000.00c4pc3jhzjcmc7sqlplus@vcd01024.ute.fedex.com (TNS V1-V3)SELECT CHAR_VALUE FROM SYSTEM....
1041041.000.000.00d6vwqbw6r2ffksqlplus@vcd01024.ute.fedex.com (TNS V1-V3)SELECT USER FROM DUAL
1041041.000.000.00dyk4dprp70d74SQL*PlusSELECT DECODE('A', 'A', '1', '...
1041041.000.000.00g4y6nw3tts7ccsqlplus@vcd01024.ute.fedex.com (TNS V1-V3)BEGIN DBMS_APPLICATION_INFO.SE...

It is very important to understand this section of the AWR report. 

Well, there are two things to consider.

1. How much data is being captured in the AWR repository ?
2. How much data is being displayed in the AWR report ?

Looking at this AWR report, I see total executions at 9914. (This number comes from DBA_HIST_SYSSTAT ). 68.9 % of this 9914 = 6830 executions were captured in the AWR repository.

What happened to the other SQL statements ? Why did Oracle fail to capture them ?
Could be the case where these missing SQL statements might have consumed a lot of CPU and that information is now lost.

By default AWR captues a snap every 1 hour. It takes a snap of V$SQL every hour and keeps this data safe in DBA_HIST_SQLSTAT and other related tables. Let us consider this scenario

At 10 AM - You take an AWR snap . S1
At 10.15 AM - You run a resource intensive SQL statement
At 10.45 AM -  SQL execution got completed. It has done a lot of damage by now.
At 10.50 AM - You flush the shared pool manually. / SQL gets aged out of shared pool automatically.
At 11 AM - You take another AWR snap S2

Now, if you generate an AWR report between these two snaps. You would see that the SQL which you ran is missing ! That's an interesting takeaway. AWR base tables are populated through V$SQL and if there is something missing in V$SQL, AWR too would miss the same.

Coming back to our AWR report, 
Total executions were - 9914
6830 executions were actually captured by the AWR repository.

Rest is lost ! ( Well, not completely, we will talk about ASH a little later. ASH takes samples every  second and may give us some more information )

Now lets us add up the numbers in the "executions" column of this AWR report. 

3989 + 715 + 238 + ... = 6106

AWR told us it captured 6830 executions, but when I add up these numbers I get 6106.
Well, this is quite simple. AWR does not display everything that is present in DBA_HIST_SQLSTAT
You can directly query DBA_HIST_SQLSTAT if you want to get a list of all the SQLs.

From 11.2 it is possible to control the number of SQL statements displayed in the AWR report. It defaults to 30 and can be altered using the following procedure.

SQL> exec dbms_workload_repository.awr_set_report_thresholds(top_n_sql=>50,top_n_sql_max=>50);





19c Multitenant RAC- Default service disables parallelism

We recently upgraded one of our RAC databases from 11.2 to 19.10 on Exadata. And to everyone's surprise, almost everything on the system...