In this post we will discuss how bloom filters work and how they can benefit specifically on an Exadata environment. We will also be looking at bloom filter min/max optimization that Oracle uses to benefit from storage indexes. Let's get started with a test case.
Database version : 18.8 on Exadata X7-2 machine
SQL> CREATE TABLE T1 ( event_name VARCHAR2(254) , partition_id NUMBER);
SQL> CREATE TABLE T2
(
partition_id NUMBER,
partition_name CHAR(254)
) STORAGE ( CELL_FLASH_CACHE NONE);
SQL> insert into T1 values ('str',1) ;
SQL> insert into T1 values ('str',2) ;
SQL> BEGIN
FOR i in 1..1000000
loop
insert into T2 values(i,'ABC');
end loop;
END;
/
SQL> commit;
-- Size of T2 is 296 MB. T1 would be few KBs with just two rows.
-- Run few dummy selects on T2 just to ensure that storage indexes have been created
-- Force direct path reads to avoid buffered reads
SQL> alter session set "_serial_direct_read"=ALWAYS ;
SQL> SELECT /*+ USE_HASH(T2) */ * FROM T1,T2 where
T1.event_name='str'
and T1.partition_id=T2.partition_id ;
-- Returns two rows
Plan hash value: 1144549688
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 822 | 421 (3)| 00:00:01 |
|* 1 | HASH JOIN | | 2 | 822 | 421 (3)| 00:00:01 |
| 2 | JOIN FILTER CREATE | :BF0000 | 2 | 284 | 2 (0)| 00:00:01 |
|* 3 | TABLE ACCESS STORAGE FULL| T1 | 2 | 284 | 2 (0)| 00:00:01 |
| 4 | JOIN FILTER USE | :BF0000 | 887K| 227M| 417 (2)| 00:00:01 |
|* 5 | TABLE ACCESS STORAGE FULL| T2 | 887K| 227M| 417 (2)| 00:00:01 |
---------------------------------------------------------------------------------------
Outline Data
-------------
/*+
BEGIN_OUTLINE_DATA
PX_JOIN_FILTER(@"SEL$1" "T2"@"SEL$1")
USE_HASH(@"SEL$1" "T2"@"SEL$1")
LEADING(@"SEL$1" "T1"@"SEL$1" "T2"@"SEL$1")
FULL(@"SEL$1" "T2"@"SEL$1")
FULL(@"SEL$1" "T1"@"SEL$1")
OUTLINE_LEAF(@"SEL$1")
ALL_ROWS
DB_VERSION('18.1.0')
OPTIMIZER_FEATURES_ENABLE('18.1.0')
IGNORE_OPTIM_EMBEDDED_HINTS
END_OUTLINE_DATA
*/
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("T1"."PARTITION_ID"="T2"."PARTITION_ID")
3 - storage("T1"."EVENT_NAME"='str')
filter("T1"."EVENT_NAME"='str')
Here is the execution plan of this SQL.
1. Oracle first scans table T1.
2. During this scan it also builds a bloom bit array on the join column (partition_id).
3. This bloom filter along with join column's min max value is sent to the storage layer.
4. Now T2 is scanned. At this time Oracle uses the min max values to scan the storage indexes.
5. If storage index cannot be used, Oracle reads all records from T2 and passes it thru the bloom filter.
6. Only the 'probable' matching records are sent back to the compute nodes.
In the plan outline, you see this hint called PX_JOIN_FILTER. "PX" usually denotes some sort of parallelism but px_join_filter is a hint to force the bloom filter.
At step 2, Oracle creates a bloom filter on T1. Please read this excellent this Bloom Filter White Paper by Christian Antognini.
In our case, with just two records in T1, Bloom filter bit array would conceptually look like this.
Storage indexes consists of 1MB storage regions. They store the min and max of the column value for each of these regions. It can store up to 8 columns and with ESS 19c it can accommodate up to 24 columns. By looking at these indexes, Oracle can decide to skip the storage regions that do not satisfy the query.
Let's look at the SQL execution stats
Total physical reads for this query is 304250880 bytes.
Cell physical IO bytes saved by storage index is 30400510 bytes.
Cell physical IO interconnect bytes is just 34728 bytes.
As you can see, storage indexes were used in this execution. It eliminated 304 MB of disk IO. This is impressive. And only 34 KB of data was sent back from the cell server to the compute nodes. Elimination was possible because there is minimum overlap between T1 min-max (1-2) vs T2 min-max of (1-1000000).
Now I insert just one more record into T1. And run the same query again
SQL> insert into T1 values ('str',1000000) ;
SQL> commit ;
SQL> SELECT /*+ USE_HASH(T2) */ * FROM T1,T2 where
T1.event_name='str'
and T1.partition_id=T2.partition_id
-- Returns three rows rows
-- Execution Plan remains the same as above
So, now with three rows in T1, the bit vector would look like this.
Total physical reads for this query is 304250880 bytes.
Cell physical IO bytes saved by storage index is 0 bytes.
Cell physical IO interconnect bytes is just 151512 bytes.
151 KB of data was sent back from the cell server to the compute nodes. Disk IO savings was not possible but smart scans did take place. All the 1 million records were read by Oracle and passed thru the bloom filter and then only "probable" matching rows were sent back.
First query took advantage of both smart scans and storage indexes. While the second query benefitted only from smart scans.
BENEFITS of Bloom Filters on Exadata
1. Depending on the min-max values, queries can benefit from Storage Indexes, thereby reducing disk reads.
2. Since the bloom structure is sent to the storage server, queries benefit from smart scans. Thereby sending fewer rows back to the compute nodes.
3. On RAC environment, bloom filters on parallel queries will reduce the interconnect traffic and save CPU time because fewer rows are exchanged between slave sets
There is a hidden parameter _bloom_minmax_enabled. This controls the min max optimization. If this parameter is turned off, then Oracle would not be able to send the min-max of the join column and therefore storage indexes will not be used.