Vishal's blog - Oracle Database Performance

Thursday, 29 May 2025

Bloom Partition Pruning In Oracle Vs Postgres

In this blog post let's compare Oracle and Postgres optimizer transformation features.

For this test case I created two small tables and see how Oracle and Postgres is able to prune partitions at run time.

Postgres Version: 17.5

nodpr=# CREATE TABLE T1 ( event_name VARCHAR , partition_id INTEGER);

CREATE TABLE

nodpr=# CREATE TABLE t2 (

nodpr(# partition_id INTEGER,

nodpr(# partition_name VARCHAR(254)

nodpr(# )

nodpr-# PARTITION BY HASH (partition_id);

CREATE TABLE

nodpr=# CREATE TABLE t2_p1 PARTITION OF t2 FOR VALUES WITH (MODULUS 4, REMAINDER 0);

CREATE TABLE

nodpr=# CREATE TABLE t2_p2 PARTITION OF t2 FOR VALUES WITH (MODULUS 4, REMAINDER 1);

CREATE TABLE

nodpr=# CREATE TABLE t2_p3 PARTITION OF t2 FOR VALUES WITH (MODULUS 4, REMAINDER 2);

CREATE TABLE

nodpr=#

nodpr=# CREATE TABLE t2_p4 PARTITION OF t2 FOR VALUES WITH (MODULUS 4, REMAINDER 3);

CREATE TABLET

-- Insert some rows

nodpr=# insert into T1 values('str1',4);

INSERT 0 1

nodpr=#

nodpr=# insert into T2 values(4,'ABC');

INSERT 0 1

nodpr=# insert into T2 values(5,'ABC');

INSERT 0 1

nodpr=# insert into T2 values(6,'ABC');

INSERT 0 1

nodpr=# insert into T2 values(7,'ABC');

INSERT 0 1

nodpr=# explain analyze

nodpr-# SELECT *

nodpr-# FROM t1

nodpr-# JOIN t2 ON t1.partition_id = t2.partition_id

nodpr-# WHERE t1.event_name LIKE 'str%';

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------

Hash Join (cost=25.95..76.62 rows=17 width=556) (actual time=0.158..0.164 rows=1 loops=1)

Hash Cond: (t2.partition_id = t1.partition_id)

-> Append (cost=0.00..48.40 rows=560 width=520) (actual time=0.037..0.054 rows=4 loops=1)

-> Seq Scan on t2_p1 t2_1 (cost=0.00..11.40 rows=140 width=520) (actual time=0.020..0.020 rows=0 loops=1)

-> Seq Scan on t2_p2 t2_2 (cost=0.00..11.40 rows=140 width=520) (actual time=0.015..0.015 rows=1 loops=1)

-> Seq Scan on t2_p3 t2_3 (cost=0.00..11.40 rows=140 width=520) (actual time=0.006..0.006 rows=0 loops=1)

-> Seq Scan on t2_p4 t2_4 (cost=0.00..11.40 rows=140 width=520) (actual time=0.007..0.008 rows=3 loops=1)

-> Hash (cost=25.88..25.88 rows=6 width=36) (actual time=0.091..0.091 rows=1 loops=1)

Buckets: 1024 Batches: 1 Memory Usage: 9kB

-> Seq Scan on t1 (cost=0.00..25.88 rows=6 width=36) (actual time=0.051..0.053 rows=1 loops=1)

Filter: ((event_name)::text ~~ 'str%'::text)

Planning Time: 0.295 ms

Execution Time: 0.218 ms

(13 rows)

Value of 4 in T2 resides in t2_p4. But this SELECT query scans through all the partitions of T2. As you can see, Postgres could not do a partition pruning in this case.

Now lets do this same test case on Oracle 19.22 and see the execution plan

EXPLAIN PLAN FOR
SELECT /*+ USE_HASH(T2) */ * FROM T1,T2 where
T1.event_name like 'str%'
and T1.partition_id=T2.partition_id

Plan hash value: 2384829105

--------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5000 | 976K| 11158 (1)| 00:03:33 | | |
|* 1 | HASH JOIN | | 5000 | 976K| 11158 (1)| 00:03:33 | | |
| 2 | JOIN FILTER CREATE | :BF0001 | 1 | 100 | 4 (0)| 00:00:01 | | |
| 3 | PART JOIN FILTER CREATE | :BF0000 | 1 | 100 | 4 (0)| 00:00:01 | | |
|* 4 | TABLE ACCESS STORAGE FULL| T1 | 1 | 100 | 4 (0)| 00:00:01 | | |
| 5 | JOIN FILTER USE | :BF0001 | 100K| 9765K| 11154 (1)| 00:03:32 | | |
| 6 | PARTITION HASH JOIN-FILTER| | 100K| 9765K| 11154 (1)| 00:03:32 |:BF0000|:BF0000|
|* 7 | TABLE ACCESS STORAGE FULL| T2 | 100K| 9765K| 11154 (1)| 00:03:32 |:BF0000|:BF0000|
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("T1"."PARTITION_ID"="T2"."PARTITION_ID")
4 - storage("T1"."EVENT_NAME" LIKE 'str%')
filter("T1"."EVENT_NAME" LIKE 'str%')
7 - storage(SYS_OP_BLOOM_FILTER(:BF0001,"T2"."PARTITION_ID"))
filter(SYS_OP_BLOOM_FILTER(:BF0001,"T2"."PARTITION_ID"))

You can see SYS_OP_BLOOM_FILTER being applied on the predicate.

Here a bloom filter is created at O/P 3. And at O/P 7, Pstart and Pstop shows :BF0000.

Based on this bloom filter partition pruning occurs on T2 and therefore, only the partitions containing the relevant data are scanned. This is called Bloom Pruning. And if you have large datasets with thousands of partitions, this feature could massively improve the performance of the complex analytical SQLs.

Monday, 20 January 2025

Bloom Min-Max optimization on Exadata

In this post we will discuss how bloom filters work and how they can benefit specifically on an Exadata environment. We will also be looking at bloom filter min/max optimization that Oracle uses to benefit from storage indexes. Let's get started with a test case.

Database version : 18.8 on Exadata X7-2 machine

SQL> CREATE TABLE T1 ( event_name VARCHAR2(254) , partition_id NUMBER);

SQL> CREATE TABLE T2

(

partition_id NUMBER,

partition_name CHAR(254)

) STORAGE ( CELL_FLASH_CACHE NONE);

SQL> insert into T1 values ('str',1) ;

SQL> insert into T1 values ('str',2) ;

SQL> BEGIN

FOR i in 1..1000000

loop

insert into T2 values(i,'ABC');

end loop;

END;

SQL> commit;

-- Size of T2 is 296 MB. T1 would be few KBs with just two rows.

-- Run few dummy selects on T2 just to ensure that storage indexes have been created

-- Force direct path reads to avoid buffered reads

SQL> alter session set "_serial_direct_read"=ALWAYS ;

SQL> SELECT /*+ USE_HASH(T2) */ * FROM T1,T2 where

T1.event_name='str'

and T1.partition_id=T2.partition_id ;

-- Returns two rows

Plan hash value: 1144549688

---------------------------------------------------------------------------------------

---------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 2 | 822 | 421 (3)| 00:00:01 |

|* 1 | HASH JOIN | | 2 | 822 | 421 (3)| 00:00:01 |

| 2 | JOIN FILTER CREATE | :BF0000 | 2 | 284 | 2 (0)| 00:00:01 |

|* 3 | TABLE ACCESS STORAGE FULL| T1 | 2 | 284 | 2 (0)| 00:00:01 |

| 4 | JOIN FILTER USE | :BF0000 | 887K| 227M| 417 (2)| 00:00:01 |

|* 5 | TABLE ACCESS STORAGE FULL| T2 | 887K| 227M| 417 (2)| 00:00:01 |

---------------------------------------------------------------------------------------

Outline Data

-------------

/*+

BEGIN_OUTLINE_DATA

PX_JOIN_FILTER(@"SEL$1" "T2"@"SEL$1")

USE_HASH(@"SEL$1" "T2"@"SEL$1")

LEADING(@"SEL$1" "T1"@"SEL$1" "T2"@"SEL$1")

FULL(@"SEL$1" "T2"@"SEL$1")

FULL(@"SEL$1" "T1"@"SEL$1")

OUTLINE_LEAF(@"SEL$1")

ALL_ROWS

DB_VERSION('18.1.0')

OPTIMIZER_FEATURES_ENABLE('18.1.0')

IGNORE_OPTIM_EMBEDDED_HINTS

END_OUTLINE_DATA

Predicate Information (identified by operation id):

---------------------------------------------------

1 - access("T1"."PARTITION_ID"="T2"."PARTITION_ID")

3 - storage("T1"."EVENT_NAME"='str')

filter("T1"."EVENT_NAME"='str')

Here is the execution plan of this SQL.

1. Oracle first scans table T1.

2. During this scan it also builds a bloom bit array on the join column (partition_id).

3. This bloom filter along with join column's min max value is sent to the storage layer.

4. Now T2 is scanned. At this time Oracle uses the min max values to scan the storage indexes.

5. If storage index cannot be used, Oracle reads all records from T2 and passes it thru the bloom filter.

6. Only the 'probable' matching records are sent back to the compute nodes.

In the plan outline, you see this hint called PX_JOIN_FILTER. "PX" usually denotes some sort of parallelism but px_join_filter is a hint to force the bloom filter.

At step 2, Oracle creates a bloom filter on T1. Please read this excellent this Bloom Filter White Paper by Christian Antognini.

In our case, with just two records in T1, Bloom filter bit array would conceptually look like this.

This bloom filter consists of bit vector of length 50. It uses 3 hash functions. When this SQL is executed Oracle sends this bit array and min-max of partition_id ( 1-2) to the storage server.

Storage indexes consists of 1MB storage regions. They store the min and max of the column value for each of these regions. It can store up to 8 columns and with ESS 19c it can accommodate up to 24 columns. By looking at these indexes, Oracle can decide to skip the storage regions that do not satisfy the query.

Let's look at the SQL execution stats

Total physical reads for this query is 304250880 bytes.

Cell physical IO bytes saved by storage index is 30400510 bytes.

Cell physical IO interconnect bytes is just 34728 bytes.

As you can see, storage indexes were used in this execution. It eliminated 304 MB of disk IO. This is impressive. And only 34 KB of data was sent back from the cell server to the compute nodes. Elimination was possible because there is minimum overlap between T1 min-max (1-2) vs T2 min-max of (1-1000000).

Now I insert just one more record into T1. And run the same query again

SQL> insert into T1 values ('str',1000000) ;

SQL> commit ;

SQL> SELECT /*+ USE_HASH(T2) */ * FROM T1,T2 where

T1.event_name='str'

and T1.partition_id=T2.partition_id

-- Returns three rows rows

-- Execution Plan remains the same as above

So, now with three rows in T1, the bit vector would look like this.

This time the min-max of T1 is 1-1000000. Even though we have just 3 records, min max is highly skewed. Let's look at the SQL stats for this execution ..

Total physical reads for this query is 304250880 bytes.

Cell physical IO bytes saved by storage index is 0 bytes.

Cell physical IO interconnect bytes is just 151512 bytes.

Storage indexes were not used during this execution. With just one additional row, 'cell physical IO saved by storage index' went down from 304 MB to 0. Therefore, Oracle had to perform 304 MB of disk IO this time. IO elimination was just not possible here because min-max of T1 overlaps min-max of T2.

151 KB of data was sent back from the cell server to the compute nodes. Disk IO savings was not possible but smart scans did take place. All the 1 million records were read by Oracle and passed thru the bloom filter and then only "probable" matching rows were sent back.

First query took advantage of both smart scans and storage indexes. While the second query benefitted only from smart scans.

BENEFITS of Bloom Filters on Exadata

1. Depending on the min-max values, queries can benefit from Storage Indexes, thereby reducing disk reads.

2. Since the bloom structure is sent to the storage server, queries benefit from smart scans. Thereby sending fewer rows back to the compute nodes.

3. On RAC environment, bloom filters on parallel queries will reduce the interconnect traffic and save CPU time because fewer rows are exchanged between slave sets

There is a hidden parameter _bloom_minmax_enabled. This controls the min max optimization. If this parameter is turned off, then Oracle would not be able to send the min-max of the join column and therefore storage indexes will not be used.

Wednesday, 15 January 2025

19c Multitenant RAC- Default service disables parallelism

We recently upgraded one of our RAC databases from 11.2 to 19.10 on Exadata. And to everyone's surprise, almost everything on the system started running slowly. Further investigation revealed that none of the queries were using parallelism, even though we hard code the parallel hint in the SQL itself. I verified all parallel related parameters and they were all good.

If my PDB name is ORCL, Oracle creates a default service called ORCL that runs on all the RAC nodes. If you connect to this default service, parallelism does not kick in. I have no idea why Oracle implemented this restriction ( or call it a feature ? ). Going through the Oracle documents, it reads that the default service name is reserved for internal maintenance tasks and should not be used by applications. However, the document doesn't state that there is a restriction on using parallelism. You can still connect to this default service, just that some of the features would not work. Parallelism is just one of them we uncovered.

A 10053 trace for one of the offending SQLs does not reveal much. It just says that the parallel_execution_enabled has been set to FALSE with no obvious reasons.

*************************************
PARAMETERS WITH ALTERED VALUES
******************************
Compilation Environment Dump
parallel_execution_enabled = false
_smm_px_max_size_static = 12582912 KB
_unnest_subquery = false
_optimizer_skip_scan_enabled = false
_fix_control_key = 397299019
parallel_hinted = degree

There is an Oracle document that does mention about this behavior

Parallel Query Runs in Serial ON RAC when Using Default SERVICE_NAME (Doc ID 2603641.1)

The document also mentions this workaround

exec dbms_service.start_service('PDB_NAME');
alter system register;

This seems a little odd to me as the service is already up and running and is registered with the listener. Otherwise, how would I even connect to the database. Nevertheless, I tried this workaround and parallel slaves came back to life. But this workaround is not reliable, when I restart the PDB this setting is lost and parallelism shuts down again. In some cases I noticed that even without a restart, parallelism stops working out of nowhere.

So, although there is a workaround to get this working, I wouldn't recommend using it because it is unreliable and you never know what other features Oracle chose to disable.

So, you must create a new service and ask your application teams to edit their connection strings or the local TNS copy to point to this new service. Now that's a huge task :)

Good luck with your 19c upgrades. Many more surprises surely awaits ...

Wednesday, 24 July 2024

DBMS_OPTIM_BUNDLE : Optimizer Fixes in DBRU

Starting 12.2, Optimizer fixes included in the RUs are disabled by default.

To get the complete list of fixes - Oracle has introduced a package called dbms_optim_bundle. Since the RUs are cumulative in nature, this package would list the optimizer fixes that Oracle included in every RU.

Here is the output from my 19.12 database. I can see that there are 13 fixes in 19.12 ( call it a RU 12 or July 2021 bundle), 45 fixes on 19.11 and so on. In total there are 104 fixes and all of them are disabled by default.

SQL> exec dbms_optim_bundle.getBugsforBundle(210720);

19.4.0.0.190719DBRU:

Bug: 29331066, fix_controls: 29331066

19.5.0.0.191015DBRU:

Bug: 28965084, fix_controls: 28965084

Bug: 28776811, fix_controls: 28776811

Bug: 28498976, fix_controls: 28498976

Bug: 28567417, fix_controls: 28567417

Bug: 28558645, fix_controls: 28558645

Bug: 29132869, fix_controls: 29132869

Bug: 29450812, fix_controls: 29450812

19.7.0.0.200414DBRU:

Bug: 29687220, fix_controls: 29687220

Bug: 29939400, fix_controls: 29939400

Bug: 30232638, fix_controls: 30232638

Bug: 30001331, fix_controls: 30001331

19.8.0.0.200714DBRU:

Bug: 29304314, fix_controls: 29304314

Bug: 29930457, fix_controls: 29930457

Bug: 30519188, fix_controls: 30028663

Bug: 28144569, fix_controls: 28144569

Bug: 28776431, fix_controls: 28776431

19.9.0.0.201020DBRU:

Bug: 27261477, fix_controls: 27261477, 31069997, 31077481

Bug: 28602253, fix_controls: 28602253

Bug: 31486557, fix_controls: 29653132

Bug: 29937655, fix_controls: 29937655

Bug: 30347410, fix_controls: 30347410

Bug: 30602828, fix_controls: 30602828

Bug: 30896685, fix_controls: 30896685

19.10.0.0.210119DBRU:

Bug: 29487407, fix_controls: 29487407

Bug: 30998035, fix_controls: 30998035

Bug: 30786641, fix_controls: 30786641

Bug: 31444353, fix_controls: 31444353

Bug: 30486896, fix_controls: 30486896

Bug: 28999046, fix_controls: 28999046

Bug: 30902655, fix_controls: 30902655

Bug: 30681521, fix_controls: 30681521

Bug: 29302565, fix_controls: 29302565

Bug: 30972817, fix_controls: 30972817

Bug: 30222669, fix_controls: 30222669

Bug: 31668694, fix_controls: 31668694

Bug: 31001490, fix_controls: 31001490

Bug: 30198239, fix_controls: 30198239

Bug: 30980115, fix_controls: 30980115

Bug: 30616738, fix_controls: 30616738

Bug: 31895670, fix_controls: 31895670

Bug: 19138896, fix_controls: 19138896

Bug: 31376708, fix_controls: 31670824

Bug: 30564898, fix_controls: 9876287, 30564898

Bug: 32234161, fix_controls: 32075777

Bug: 30842277, fix_controls: 30570982

19.11.0.0.210420DBRU:

Bug: 32037237, fix_controls: 32037237

Bug: 30927440, fix_controls: 30927440

Bug: 31788104, fix_controls: 30822446

Bug: 24561942, fix_controls: 24561942

Bug: 31625959, fix_controls: 31625959

Bug: 31976303, fix_controls: 31579233

Bug: 29696242, fix_controls: 29696242

Bug: 31626438, fix_controls: 31626438

Bug: 30228422, fix_controls: 30228422

Bug: 32122574, fix_controls: 17295505

Bug: 29725425, fix_controls: 29725425

Bug: 30618230, fix_controls: 30618230

Bug: 30008456, fix_controls: 30008456

Bug: 30537403, fix_controls: 30537403

Bug: 30235878, fix_controls: 30235878

Bug: 30646077, fix_controls: 30646077

Bug: 29657973, fix_controls: 29657973

Bug: 30527198, fix_controls: 29712727

Bug: 20922160, fix_controls: 20922160

Bug: 30006705, fix_controls: 30006705

Bug: 29463553, fix_controls: 29463553

Bug: 30751171, fix_controls: 30751171

Bug: 9, fix_controls: 31009032

Bug: 30207519, fix_controls: 30063629, 30207519

Bug: 31517502, fix_controls: 31517502

Bug: 30617002, fix_controls: 30617002

Bug: 30483217, fix_controls: 30483217

Bug: 30235691, fix_controls: 30235691

Bug: 30568514, fix_controls: 30568514

Bug: 28414968, fix_controls: 28414968

Bug: 32014520, fix_controls: 32014520

Bug: 30249927, fix_controls: 30249927

Bug: 31580374, fix_controls: 31580374

Bug: 29590666, fix_controls: 29590666

Bug: 29435966, fix_controls: 29435966

Bug: 29867728, fix_controls: 28173995, 29867728

Bug: 30776676, fix_controls: 30776676

Bug: 26577716, fix_controls: 26577716

Bug: 30470947, fix_controls: 30470947

Bug: 30979701, fix_controls: 30979701

Bug: 31435308, fix_controls: 30483184, 31001295

Bug: 31191224, fix_controls: 31191224

Bug: 31974424, fix_controls: 31974424

Bug: 29385774, fix_controls: 29385774

Bug: 28234255, fix_controls: 28234255

19.12.0.0.210720DBRU:

Bug: 31459242, fix_controls: 31459242

Bug: 31082719, fix_controls: 31082719

Bug: 28708585, fix_controls: 28708585

Bug: 31821701, fix_controls: 31821701

Bug: 32107621, fix_controls: 32107621

Bug: 26758837, fix_controls: 26758837

Bug: 31558194, fix_controls: 31558194

Bug: 30781970, fix_controls: 30781970

Bug: 30142527, fix_controls: 30142527

Bug: 31143146, fix_controls: 31143146

Bug: 31961578, fix_controls: 31961578

Bug: 31496840, fix_controls: 31496840

Bug: 22387320, fix_controls: 22387320

To enable these fixes, you run

SQL> execute dbms_optim_bundle.enable_optim_fixes('ON','BOTH', 'YES')

This will enable all the fixes mentioned above. With dbms_optim_bundle, you do not have an option to enable a particular fix. However, if you want to enable a particular fix, you could do it manually through the _fix_control parameter.

SQL> alter system set "_fix_control" = '22387320 : 1'

You can also find the current values in the view v$system_fix_control.

Why are these fixes disabled by default and should you really enable them on your system?

And the answer is - It depends !

These are optimizer fixes and can cause changes in execution plans. And these changes will most likely be "good" for you because Oracle has fixed a bug in your code. However, there is a rare possibility of a regression bug due to these fixes. Therefore, all changes needs to go through a comprehensive application testing cycle.

And Oracle wants Customers to have a consistent performance experience between different RUs. If a customer moves from 19.4 to 19.5 to 19.6 and so on- Oracle doesn't want you to have mixed experience in terms of overall application performance. This could be the reason why they are turned off in the first place.

Imagine running into one of these bugs. You will have to raise a SR, upload tons of trace files, escalate the SR with Oracle .... and after few days or weeks you find out that the fix is already included - just that it is not enabled. This is gonna be really frustrating.

If yours is a new database installation or if you are upgrading from a previous version say 11g or 12c I would suggest that you enable these fixes. There is absolutely no point in leaving them disabled.

But if you are moving between different RUs - then it is up to you to decide.

Vishal's blog - Oracle Database Performance

Thursday, 29 May 2025

Bloom Partition Pruning In Oracle Vs Postgres

Monday, 20 January 2025

Bloom Min-Max optimization on Exadata

Wednesday, 15 January 2025

19c Multitenant RAC- Default service disables parallelism

exec dbms_service.start_service('PDB_NAME');
alter system register;

Wednesday, 24 July 2024

DBMS_OPTIM_BUNDLE : Optimizer Fixes in DBRU

Why are these fixes disabled by default and should you really enable them on your system?

Bloom Partition Pruning In Oracle Vs Postgres

Blogroll

DISCLAIMER

Thursday, 29 May 2025

Bloom Partition Pruning In Oracle Vs Postgres

Monday, 20 January 2025

Bloom Min-Max optimization on Exadata

Wednesday, 15 January 2025

19c Multitenant RAC- Default service disables parallelism

exec dbms_service.start_service('PDB_NAME');alter system register;

Wednesday, 24 July 2024

DBMS_OPTIM_BUNDLE : Optimizer Fixes in DBRU

Why are these fixes disabled by default and should you really enable them on your system?

Bloom Partition Pruning In Oracle Vs Postgres

exec dbms_service.start_service('PDB_NAME');
alter system register;