Sunday 10 May 2015

How on earth can a single row insert take an hour to complete?

I came across a very intriguing issue recently: a customer reported that a single-row insert into a cluster table was taking more than an hour to complete.

This post assumes that you have a basic understanding of table clusters. Let's look at how Oracle handles an insert into a cluster table, and at a serious design flaw in that code path.

Here is a small test case to demonstrate the problem. 

Version : 11.2.0.4

Additional Info : The table resides in a non-ASSM (manually managed) tablespace. Similar issues occur in ASSM-managed tablespaces as well.


-- Create a cluster. No SIZE clause is specified, so SIZE defaults to one block,
-- which means we can store only one cluster key value per block


SQL> CREATE CLUSTER ABC_C1 (C1 number(5));

-- Create Cluster Index

SQL> CREATE INDEX ABC_C1_IX ON CLUSTER ABC_C1 ;

-- Create table

SQL> create table abc1 (c1 number(5),C2 CHAR(1000), C3 CHAR(1000)) cluster ABC_C1(c1);


-- Now insert 1,000 rows (2 times) with cluster key values 1-1000

-- ( ** This PL/SQL block is executed two times ** )


BEGIN
  FOR i IN 1..1000
  LOOP
    INSERT INTO abc1 VALUES (i, 'A', 'A');
  END LOOP;
END;
/

-- Run the same block a second time
/


SQL> commit;


-- Analyze the table

SQL> ANALYZE TABLE ABC1 COMPUTE STATISTICS;

SQL> SELECT BLOCKS, NUM_FREELIST_BLOCKS FROM USER_TABLES WHERE TABLE_NAME = 'ABC1';


    BLOCKS NUM_FREELIST_BLOCKS
---------- -------------------
      1000                1000

Now we have 2,000 rows spread over 1,000 blocks, each block holding the 2 rows for one cluster key value. Each block still has free space, because the average row length is about 2 KB and the block size is 8 KB. Therefore all 1,000 blocks are on the FREELIST.

-- Flush the buffer cache so the next insert has to read blocks from disk

SQL> alter system flush buffer_cache;


-- Now insert a new cluster key Value

SQL> set autotrace traceonly exp stat
SQL>
SQL>
SQL> insert into abc1 values (12000,'H','H');

1 row created.


Statistics
----------------------------------------------------------
          2  recursive calls
       2019  db block gets
          4  consistent gets
       1021  physical reads
    1961796  redo size
        839  bytes sent via SQL*Net to client
        796  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          1  rows processed

As you can see, this single-row insert has done 1,021 physical reads via "db file sequential read". This is insane! Oracle reads over a thousand data blocks for a simple insert. In real-world scenarios, big tables may have millions of blocks on the FREELIST, and this behaviour can cause major performance problems whenever the freelist is that long.

How does Oracle handle this insert?

To insert a new cluster key, Oracle must find a free block in the segment. The current behaviour is to scan the ENTIRE free list looking for that space. In our case there are 1,000 free blocks. Each of them is read via "db file sequential read", but the insert cannot go into any of them, because a block can hold only one distinct cluster key value. So Oracle scans through 1,000 blocks and finds that none of them is eligible for the insert. This is where performance takes a hit. Worse, Oracle also unlinks all 1,000 blocks from the free list because none of them could accommodate the new row.
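The scan described above can be sketched as a toy Python model (this is a simplified illustration, not Oracle's actual code; the function and variable names are made up, and it assumes one cluster key per block, as with the default SIZE):

```python
def insert_new_key(freelist, block_key, new_key):
    """Model of the pre-fix behaviour: scan the WHOLE freelist for a
    block already holding new_key; return (block used, blocks read)."""
    reads = 0
    for blk in list(freelist):
        reads += 1                     # one "db file sequential read"
        if block_key[blk] == new_key:  # a block can take the row only if
            return blk, reads          # it already holds this key
        freelist.remove(blk)           # ineligible -> unlinked from freelist
    new_blk = max(block_key) + 1       # no match: allocate a fresh block
    block_key[new_blk] = new_key
    return new_blk, reads

# 1,000 blocks holding keys 1..1000, all with free space
block_key = {b: b for b in range(1, 1001)}
freelist = list(block_key)
blk, reads = insert_new_key(freelist, block_key, 12000)
print(reads, len(freelist))   # 1000 reads, freelist emptied
```

For a brand-new key the model reads every freelist block and empties the freelist before finally allocating a new block, which mirrors the 1,021 physical reads seen in the autotrace output.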

How should Oracle actually handle this?

For a new cluster key, look in the cluster index to determine whether the value being inserted is a new key. If it is (as in our case), do not bother scanning the free list, because those free blocks are not eligible for the insert anyway. For new cluster key values, Oracle should directly allocate a new block.
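Continuing the toy model from above, the proposed behaviour could look like this (again a hypothetical sketch, not Oracle code; `cluster_index` is modelled as a plain set of existing keys):

```python
def insert_new_key_fixed(freelist, block_key, cluster_index, new_key):
    """Sketch of the proposed behaviour: consult the cluster index first;
    a brand-new key skips the freelist scan entirely."""
    if new_key not in cluster_index:
        new_blk = max(block_key, default=0) + 1  # allocate a fresh block
        block_key[new_blk] = new_key
        cluster_index.add(new_key)
        return new_blk, 0                        # zero freelist reads
    # Existing key: scan only until we hit the block holding that key
    for reads, blk in enumerate(freelist, 1):
        if block_key[blk] == new_key:
            return blk, reads

block_key = {b: b for b in range(1, 1001)}
freelist = list(block_key)
cluster_index = set(block_key)
blk, reads = insert_new_key_fixed(freelist, block_key, cluster_index, 12000)
print(reads)   # 0 freelist reads for a new key
```

The key point is that the index probe is a handful of block gets, while the freelist scan in the worst case touches every free block in the segment.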

Update

Oracle has fixed this issue. 


Bug 18938517 - Single row insert into a cluster table does a lot of "db file sequential read"

This fix is included in 12.2.0.1 and above. For lower versions, you need to request a backport.


This fix is not enabled by default. You need to set event 45051 to enable this fix.


SQL> alter system set events '45051 trace name context forever, level 50';


Level N is the number of blocks that Oracle probes in the free list before allocating a new block. In effect, the fix stops Oracle from scanning the entire free list: it reads at most the first 50 blocks, and if none of them is eligible for the insert, it allocates a new block instead.


You can choose any level between 1 and 10000.
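In the same toy model used earlier, the event-45051 behaviour amounts to capping the probe count (a hedged sketch, not Oracle code; `level` plays the role of the event level):

```python
def insert_with_probe_cap(freelist, block_key, new_key, level=50):
    """Model of the fixed behaviour: probe at most `level` freelist
    blocks, then give up and allocate a new block."""
    reads = 0
    for blk in list(freelist)[:level]:
        reads += 1                     # probe capped at `level` blocks
        if block_key[blk] == new_key:
            return blk, reads
        freelist.remove(blk)
    new_blk = max(block_key) + 1       # cap reached: allocate a new block
    block_key[new_blk] = new_key
    return new_blk, reads

block_key = {b: b for b in range(1, 1001)}
freelist = list(block_key)
blk, reads = insert_with_probe_cap(freelist, block_key, 12000, level=50)
print(reads)   # 50 reads instead of 1000
```

A low level keeps the insert fast but may waste free space in blocks beyond the cap; a high level behaves closer to the original full scan.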
