Tuesday, 28 April 2020

Various methods of Partition Pruning and DML restrictions

Partition Pruning is an optimization technique that the optimizer applies to eliminate certain partitions during query execution. In this article we will discuss about various methods of Partition Pruning and certain pruning restrictions that apply to Insert As Select ( IAS) statements.

Let's get started with a test case. We create two tables T1 and T2. T1 is non partitioned.
T2 is a partitioned table. Partitioned by HASH on the partition_id column.

CREATE TABLE T1 ( event_name VARCHAR2(254) , partition_id NUMBER);

CREATE TABLE T2
(
    partition_id      NUMBER,
    partition_name    VARCHAR2 (254)
)
PARTITION BY HASH (partition_id)(PARTITION p1,
 PARTITION p2,
 PARTITION p3,
 PARTITION p4);

Now we run a SELECT statement with different hints and examine the execution plans and look at the different types of partition pruning Oracle uses.

BLOOM PRUNING


EXPLAIN PLAN FOR
SELECT /*+ USE_HASH(T2) */ * FROM T1,T2 where 
T1.event_name like 'str%'
and T1.partition_id=T2.partition_id

Plan hash value: 2384829105

--------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name    | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |         |  5000 |   976K| 11158   (1)| 00:03:33 |       |       |
|*  1 |  HASH JOIN                   |         |  5000 |   976K| 11158   (1)| 00:03:33 |       |       |
|   2 |   JOIN FILTER CREATE         | :BF0001 |     1 |   100 |     4   (0)| 00:00:01 |       |       |
|   3 |    PART JOIN FILTER CREATE   | :BF0000 |     1 |   100 |     4   (0)| 00:00:01 |       |       |
|*  4 |     TABLE ACCESS STORAGE FULL| T1      |     1 |   100 |     4   (0)| 00:00:01 |       |       |
|   5 |   JOIN FILTER USE            | :BF0001 |   100K|  9765K| 11154   (1)| 00:03:32 |       |       |
|   6 |    PARTITION HASH JOIN-FILTER|         |   100K|  9765K| 11154   (1)| 00:03:32 |:BF0000|:BF0000|
|*  7 |     TABLE ACCESS STORAGE FULL| T2      |   100K|  9765K| 11154   (1)| 00:03:32 |:BF0000|:BF0000|
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("T1"."PARTITION_ID"="T2"."PARTITION_ID")
   4 - storage("T1"."EVENT_NAME" LIKE 'str%')
       filter("T1"."EVENT_NAME" LIKE 'str%')
   7 - storage(SYS_OP_BLOOM_FILTER(:BF0001,"T2"."PARTITION_ID"))
       filter(SYS_OP_BLOOM_FILTER(:BF0001,"T2"."PARTITION_ID"))

Here a bloom filter is created at O/P 3. And at O/P 7, Pstart and Pstop is :BF0000.
Based on this bloom filter partition pruning occurs on T2 and therefore, only the partitions containing the relevant data are scanned.This is called Bloom Pruning.
This is controlled by the hidden parameter _bloom_pruning_enabled which defaults to TRUE
At O/P 2 we see JOIN FILTER CREATE, this is a join filter which also relies on bloom filters but this does not eliminate partitions. This is another sort of optimization and not to be confused with bloom pruning. You can read about the bloom filter here.


KEY KEY PRUNING


EXPLAIN PLAN FOR
SELECT /*+ USE_NL(T1) */ * FROM T1,T2 where 
T1.event_name like 'str%'
and T1.partition_id=T2.partition_id

Plan hash value: 2185522821

----------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |      |  5000 |   976K|  2792   (1)| 00:00:54 |       |       |
|   1 |  NESTED LOOPS               |      |  5000 |   976K|  2792   (1)| 00:00:54 |       |       |
|*  2 |   TABLE ACCESS STORAGE FULL | T1   |     1 |   100 |     4   (0)| 00:00:01 |       |       |
|   3 |   PARTITION HASH ITERATOR   |      | 10000 |   976K|  2788   (1)| 00:00:53 |   KEY |   KEY |
|*  4 |    TABLE ACCESS STORAGE FULL| T2   | 10000 |   976K|  2788   (1)| 00:00:53 |   KEY |   KEY |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - storage("T1"."EVENT_NAME" LIKE 'str%')
       filter("T1"."EVENT_NAME" LIKE 'str%')
   4 - storage("T1"."PARTITION_ID"="T2"."PARTITION_ID")
       filter("T1"."PARTITION_ID"="T2"."PARTITION_ID")

With Nested Loop joins, the execution plan has completely changed. There are no bloom filters here.
All rows of T1 are scanned. And for each of these partition_id's Oracle can now prune into a specific partition on T2. Pstart and Pstop shows as KEY-KEY. If T1 returns 10 records with same partition ids - then the same partition on T2 is scanned 10 times. This kind of pruning is bad if T1 returns lot of rows.

SUBQUERY PRUNING


ALTER SESSION SET "_bloom_pruning_enabled"=FALSE
ALTER SESSION SET "_bloom_filter_enabled"=FALSE

EXPLAIN PLAN FOR
SELECT /*+ USE_HASH(T2) SUBQUERY_PRUNING(T2 PARTITION) */ * FROM T1,T2 where 
T1.event_name like 'str%'
and T1.partition_id=T2.partition_id

Plan hash value: 2405306729

----------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |      |  5000 |   976K| 11158   (1)| 00:03:33 |       |       |
|*  1 |  HASH JOIN                  |      |  5000 |   976K| 11158   (1)| 00:03:33 |       |       |
|*  2 |   TABLE ACCESS STORAGE FULL | T1   |     1 |   100 |     4   (0)| 00:00:01 |       |       |
|   3 |   PARTITION HASH SUBQUERY   |      |   100K|  9765K| 11154   (1)| 00:03:32 |KEY(SQ)|KEY(SQ)|
|   4 |    TABLE ACCESS STORAGE FULL| T2   |   100K|  9765K| 11154   (1)| 00:03:32 |KEY(SQ)|KEY(SQ)|
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("T1"."PARTITION_ID"="T2"."PARTITION_ID")
   2 - storage("T1"."EVENT_NAME" LIKE 'str%')
       filter("T1"."EVENT_NAME" LIKE 'str%')

We have disabled bloom pruning here. With Hash joins, Oracle now goes for Subquery Pruning. At O/P 3 you will see Partition Hash Subquery. T1 is fully scanned. Now Oracle runs a recursive query on TBL$OR$IDX$PART$NUM to discover what partitions it will need from T2. Based on the result of that recursive query, T2 is pruned. Pstart and Pstop is KEY(SQ) KEY( SQ) 
This is controlled by the parameter _subquery_pruning_enabled.

INSERT AS SELECT and BLOOM PRUNING


EXPLAIN PLAN FOR
INSERT /*+ APPEND */  INTO T3
SELECT /*+ USE_HASH(T2) */ * FROM T1,T2 where 
T1.event_name like 'str%'
and T1.partition_id=T2.partition_id

Plan hash value: 1934473477
-----------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT             |      |  5000 |   976K| 11158   (1)| 00:03:33 |       |       |
|   1 |  LOAD AS SELECT              | T3   |       |       |            |          |       |       |
|*  2 |   HASH JOIN                  |      |  5000 |   976K| 11158   (1)| 00:03:33 |       |       |
|*  3 |    TABLE ACCESS STORAGE FULL | T1   |     1 |   100 |     4   (0)| 00:00:01 |       |       |
|   4 |    PARTITION HASH ALL        |      |   100K|  9765K| 11154   (1)| 00:03:32 |     1 |     4 |
|   5 |     TABLE ACCESS STORAGE FULL| T2   |   100K|  9765K| 11154   (1)| 00:03:32 |     1 |     4 |
-----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."PARTITION_ID"="T2"."PARTITION_ID")
   3 - storage("T1"."EVENT_NAME" LIKE 'str%')
       filter("T1"."EVENT_NAME" LIKE 'str%')

In 11.2.0.4 Oracle does not consider bloom pruning for Insert As Select statements. 
Looking at the plan Pstart -> Pstop is 1 and 4. All partitions of T2 are being scanned. This could cause major performance issues when you have tens of thousands of partitions in T2.
This restriction has been lifted on 18.1. One workaround would be to use CTAS instead if IAS.
CTAS allows bloom pruning. 

You can now request backport for fix 16609749 on 11.2.0.4. This enables bloom pruning for direct path insert as selects. Note that this fix is not enabled by default. You need to explicitly enable this 

ALTER SESSION SET "_fix_control" = '16609749:ON';

EXPLAIN PLAN FOR
INSERT /*+ APPEND */  INTO T3
SELECT /*+ USE_HASH(T2) */ * FROM T1,T2 where 
T1.event_name like 'str%'
and T1.partition_id=T2.partition_id


Plan hash value: 3019913043
---------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name    | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT              |         |  5000 |   976K| 12379   (1)| 00:03:31 |       |       |
|   1 |  LOAD AS SELECT               | T3      |       |       |            |          |       |       |
|*  2 |   HASH JOIN                   |         |  5000 |   976K| 12379   (1)| 00:03:31 |       |       |
|   3 |    JOIN FILTER CREATE         | :BF0001 |     1 |   100 |     4   (0)| 00:00:01 |       |       |
|   4 |     PART JOIN FILTER CREATE   | :BF0000 |     1 |   100 |     4   (0)| 00:00:01 |       |       |
|*  5 |      TABLE ACCESS STORAGE FULL| T1      |     1 |   100 |     4   (0)| 00:00:01 |       |       |
|   6 |    JOIN FILTER USE            | :BF0001 |   100K|  9765K| 12375   (1)| 00:03:31 |       |       |
|   7 |     PARTITION HASH JOIN-FILTER|         |   100K|  9765K| 12375   (1)| 00:03:31 |:BF0000|:BF0000|
|*  8 |      TABLE ACCESS STORAGE FULL| T2      |   100K|  9765K| 12375   (1)| 00:03:31 |:BF0000|:BF0000|
---------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."PARTITION_ID"="T2"."PARTITION_ID")
   5 - storage("T1"."EVENT_NAME" LIKE 'str%')
       filter("T1"."EVENT_NAME" LIKE 'str%')
   8 - storage(SYS_OP_BLOOM_FILTER(:BF0001,"T2"."PARTITION_ID"))
       filter(SYS_OP_BLOOM_FILTER(:BF0001,"T2"."PARTITION_ID"))

And BLOOM ....!

Friday, 24 April 2020

System Statistics on Exadata and its impact on Optimizer cost

In this post I will discuss about the Exadata mode system statistics and how this impacts the cost of a full table scan.
Let's get started with the test case. I'm running this test case on an Exadata machine with database version 18.1

SQL> create table T (ID number) segment creation immediate ;

I don't want to insert a lot of rows, so I just fudge the statistics.

SQL> exec dbms_stats.set_table_stats(user,'T',numrows=>50000,numblks=>50000);

Optimizer now believes that this table has 50000 rows and 50000 blocks.
I currently  have NOWORKLOAD statistics 

FLAGS 0
CPUSPEEDNW 3326
IOSEEKTIM 10
IOTFRSPEED 4096
SREADTIM
MREADTIM
CPUSPEED
MBRC

FLAGS set to 0 indicate No Workload statistics. Note that SREADTIM and MREADTIM is not set. They are derived from IO Seek time and IO transfer speed. MBRC is also not set, it is derived from the parameter db_file_multiblock_read_count. Note that this parameter has been explicitly set to 64 on my database ( Why did I use the word "explicitly" - will explain that in some other post).
Now let's see the plan

SQL> explain plan for 
                                select * from T;

Plan hash value: 1601196873
----------------------------------------------------------------------------------
| Id  | Operation                 | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |      | 50000 |   634K| 14857   (1)| 00:00:02 |
|   1 |  TABLE ACCESS STORAGE FULL| T    | 50000 |   634K| 14857   (1)| 00:00:02 |
----------------------------------------------------------------------------------

Cost of this full table scan is 14857. 
We will see how Oracle computes the cost

Cost = IO Cost + CPU Cost

IO Cost of FTS = (Blocks/MBRC) * (MREADTIME/SREADTIM)

Blocks = 50000
MBRC = 64
SREADTIM = IOSEEKTIME + DB_BLOCK_SIZE/IOTFRSPEED
                     = 10 + 16384/4096
                     = 14 
MREADTIM =IOSEEKTIME + (MBRC*DB_BLOCK_SIZE) / IOTFRSPEED
                      = 10 + (64*16384)/4096
                      = 266
Cost of FTS = (50000/64) * ( 266/14)
                    = 14843 

14843 is the IO cost. Very close to 14857 which we see in the execution plan. The difference is the CPU cost which is very minimal here.

Now let's gather the statistics in Exadata Mode and compute the numbers again

SQL> exec dbms_stats.gather_system_stats('EXADATA')

These are my new values now

FLAGS
1
CPUSPEEDNW 3326
IOSEEKTIM 13
IOTFRSPEED 204800
SREADTIM
MREADTIM
CPUSPEED
MBRC 64

FLAGS set to 1 indicate workload statistics. Now let's compute the cost of FTS

Blocks = 50000
MBRC = 64
SREADTIM = IOSEEKTIME + DB_BLOCK_SIZE/IOTFRSPEED
                     = 13 + 16384/204800
                     = 13.08
MREADTIM =IOSEEKTIME + (MBRC*DB_BLOCK_SIZE) / IOTFRSPEED
                      = 13 + (64*16384)/204800
                      = 18.12
Cost of FTS = (50000/64) * ( 18.12/13.08)
                    = 1082

Let's look at the plan to see if our calculations are correct

SQL> explain plan for 
                                select * from T;

Plan hash value: 1601196873
----------------------------------------------------------------------------------
| Id  | Operation                 | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |      | 50000 |   634K|  1095   (2)| 00:00:01 |
|   1 |  TABLE ACCESS STORAGE FULL| T    | 50000 |   634K|  1095   (2)| 00:00:01 |
----------------------------------------------------------------------------------

Yes. Our calculations look correct. Plan shows 1095 and we calculated the value to be 1082. Difference is the CPU cost.

FTS cost dropped from 14843 to just 1082 .Fact that Exadata is a powerful hardware and that has to be accounted in the cost. Full table scans are much faster on Exadata and therefore the cost of FTS should be much lesser. This is why you must gather system statistics in Exadata mode.







Thursday, 16 April 2020

Oracle is running truncates against your tabes but don't worry


Oracle never fails to surprise me. And this time it was running truncate statements against my tables.

Oracle introduced dbms_space_admin.drop_empty_segments in 11.2. This enables us to drop unused space within a segment. In other words, this procedure will defragment a table. But why would it issue truncate against the tables that has data ?
Let's get started with a test case

SQL> create table T1 (A number) ;

Since DEFERRED_SEGMENT_CREATION is set to TRUE, segment for T1 is never created.

SQL> create table T2 (A number);
SQL> insert into T2 values (1);
SQL> commit;

My insert statement has created a segment because of the insert.

SQL> create table T3 (A number) ;
SQL> insert into T3 values (1);
SQL> commit;
SQL> delete from T3;
SQL> commit;


SQL> select table_name, segment_created from user_tables ;
T1     NO
T2    YES
T3    YES

T3 is the table that is fragmented. I inserted a row and then deleted it. So, I have some free space below the HWM. And drop_empty_segments procedure should take care of this.
Now lets run the procedure

SQL> execute dbms_space_admin.drop_empty_segments('SCOTT') ;

10046 trace of the above procedure shows that Oracle recursively ran the following SQLs

LOCK TABLE T2 IN EXCLUSIVE MODE NOWAIT
TRUNCATE TABLE T2
LOCK TABLE T3 IN EXCLUSIVE MODE NOWAIT
TRUNCATE TABLE T3

It locked tables T2 and T3 in exclusive mode. and then ran truncates against them.
But why was a truncate issued against T2 ? This table has data in it. And not even eligible for segment drop.


SQL> select table_name, segment_created from user_tables ;
T1     NO
T2    YES
T3    NO

As you can see, We have reclaimed the free space in T3 and segment_created has been set to NO.
But surprisingly, truncate was issued against T2 as well. I ran a select on T2, data wasn't actually truncated.
As per bug notes, this truncate is actually a no-op truncate. This is a special form of truncate. This truncate first checks if the segment has rows or not and then decides whether to truncate the segment.
But it still holds an exclusive lock on T2 which can cause major performance bottleneck. If T2 is a huge table with thousands of partitions, Oracle might take too long to probe each of these partitions to check if a row exists or not. And while this probe is on, any other session trying to insert a row into this table would wait for enq- TM contention.
Therefore, I'd never suggest running this procedure. There are other better ways to defrag a table.

19c Multitenant RAC- Default service disables parallelism

We recently upgraded one of our RAC databases from 11.2 to 19.10 on Exadata. And to everyone's surprise, almost everything on the system...