Partition fast split

After an upgrade to Oracle 11gR2, a housekeeping job that adds and removes partitions from a handful of tables began to fail. The failure went unnoticed because of a bug: the job reported success even though nothing was done. As a consequence, the partition that holds values smaller than MAXVALUE grew far too large. This post describes the approach we took to fix the problem. You will also notice along the way that I am not a native English speaker.

The most affected table is partitioned by range and subpartitioned by hash. A new partition to hold a single day’s worth of data is created every day, and each partition has 128 subpartitions. A prototype of the table would look like this:

CREATE TABLE USAGE_PART
(
  ENT_DATE      DATE       NOT NULL,
  CUSTOMER_ID   NUMBER     NOT NULL,
  USG_AMOUNT    FLOAT(126) NOT NULL
)
LOGGING 
NOCACHE
NOPARALLEL
STORAGE (INITIAL 1m NEXT 1m MAXEXTENTS UNLIMITED FREELISTS 12) INITRANS 12
PARTITION BY RANGE (ENT_DATE)
subpartition by hash (CUSTOMER_ID)
subpartitions 128 (
PARTITION p20120510 values less than (to_date ('20120511', 'yyyymmdd')),
PARTITION p20120511 values less than (to_date ('20120512', 'yyyymmdd')),
PARTITION pmax values less than (maxvalue));

create index iUSAGE_PART on USAGE_PART (
  ENT_DATE,
  CUSTOMER_ID) local;

Every day (call it day D) the job splits partition PMAX, creating a new partition named after D+1 with a high value of D+2. For instance, on Jan 10th the job creates a partition named Jan 11th with a high value of Jan 12th 0:00 am. This way partition PMAX is always empty and the split completes very quickly. The job also drops partitions older than 2 months.
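
As an illustration (the dates are hypothetical, chosen to match the prototype above), the statements the job would run on 2012-05-10 would look roughly like this:

-- Run on day D = 2012-05-10: create partition D+1 (P20120511)
-- with high value D+2 (2012-05-12), leaving PMAX empty again.
ALTER TABLE USAGE_PART
SPLIT PARTITION PMAX AT
(TO_DATE('20120512','YYYYMMDD')) INTO (PARTITION P20120511, PARTITION PMAX);

-- Housekeeping: drop partitions older than 2 months, e.g.
ALTER TABLE USAGE_PART DROP PARTITION P20120310;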

Due to the bug, the split stopped working and all inserts were directed to the PMAX partition. We only noticed the issue when the partition was already 500GB, since the growth rate is about 80GB per day.

So how can we fix the issue? Once the job’s silent failure was fixed, one option was simply to let it run as always and perform the splits as usual. The documentation recommends some verification in order to allow the “fast split” to happen: Oracle docs.

Oracle Metalink adds a further step and asks DBAs to gather table stats before the fast split operation.
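
For reference, a minimal sketch of such a stats gathering call on the PMAX partition could look like this (the parameter values are illustrative, not taken from the note; TTT is the schema seen later in the traces):

BEGIN
  -- Gather statistics only on the PMAX partition before attempting the split;
  -- estimate_percent and granularity are illustrative choices.
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'TTT',
    tabname          => 'USAGE_PART',
    partname         => 'PMAX',
    granularity      => 'PARTITION',
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
    cascade          => TRUE);
END;
/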

We decided to test with a similar table, which was also affected by the job. Its partitioning scheme is exactly the same, but it grows at a slower rate and had generated a partition of only 20GB. Since this is a manageable size, Data Pump was used to move the data to a test environment. The split was then tested as follows:

ALTER TABLE USAGE_SUMMARY
SPLIT PARTITION PMAX AT
(TO_DATE('20120525','YYYYMMDD')) INTO (PARTITION PART_20120524, PARTITION PMAX);

I expected the split to take a few minutes, but it took just a few seconds. Obviously a fast split took place. One prerequisite for a fast split is that one of the resulting partitions must be empty, and finding that out should, in principle, require a table scan, which made the few seconds surprising. So I had a look at the 10046 trace that had been activated prior to the split, to understand the “very” fast split:

alter session set events '10046 trace name context forever, level 12';

ALTER TABLE USAGE_SUMMARY
SPLIT PARTITION PMAX AT
(TO_DATE('20120525','YYYYMMDD')) INTO (PARTITION PART_20120524, PARTITION PMAX);
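
The trace file of the current session can be located, for instance, with the query below (11g):

-- 11g: shows the full path of the current session's trace file
SELECT value FROM v$diag_info WHERE name = 'Default Trace File';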

By studying the raw trace (this is Oracle 11.2.0.2), I found two interesting select statements:

select /*+ FIRST_ROWS(1) PARALLEL("USAGE_SUMMARY", 1) */ 1 
from "TTT"."USAGE_SUMMARY" PARTITION ("PMAX")  
where (((( "ENT_DATE" < TO_DATE(' 2012-05-25 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN') OR "ENT_DATE" IS NULL)))) 
and rownum < 2

select /*+ FIRST_ROWS(1) PARALLEL("USAGE_SUMMARY", 1) */ 1 
from "TTT"."USAGE_SUMMARY" PARTITION ("PMAX")  
where (((( "ENT_DATE" >= TO_DATE(' 2012-05-25 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN') OR "ENT_DATE" IS NULL)))) 
and rownum < 2

The first statement is satisfied immediately, since all rows match the condition: remember we split with tomorrow’s date, so all existing dates are smaller than the new high value. I ran the explain plan for the second SQL statement and saw that the index on ENT_DATE was being used. So my first question, how Oracle determined so quickly that one of the resulting partitions would be empty, was answered.
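
For completeness, checking the plan was nothing more than a plain explain plan; a sketch of it follows, with the predicate copied from the trace (the PARALLEL hint left out):

-- Explain the second trace statement; the plan should show a range scan
-- on the local index on ENT_DATE instead of a full partition scan.
EXPLAIN PLAN FOR
SELECT /*+ FIRST_ROWS(1) */ 1
FROM "TTT"."USAGE_SUMMARY" PARTITION ("PMAX")
WHERE ("ENT_DATE" >= TO_DATE('2012-05-25', 'YYYY-MM-DD') OR "ENT_DATE" IS NULL)
AND ROWNUM < 2;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);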

Just to make sure, I dropped USAGE_SUMMARY and created it again without the index. Oracle then had no choice but to run a full table scan. This second time the split of the 20GB partition took 15 minutes, much closer to what I had originally expected. The db file scattered read events logged in the raw trace show the full table scan being performed.

WAIT #4574749456: nam='db file scattered read' ela= 24941 file#=10 block#=2738818 blocks=64 obj#=34761 tim=3889945154282
WAIT #4574749456: nam='db file scattered read' ela= 50718 file#=10 block#=2738882 blocks=62 obj#=34761 tim=3889945206059
WAIT #4574749456: nam='Disk file operations I/O' ela= 7 FileOperation=2 fileno=4 filetype=2 obj#=34761 tim=3889945206947
WAIT #4574749456: nam='db file scattered read' ela= 86397 file#=4 block#=2735874 blocks=64 obj#=34761 tim=3889945293602
WAIT #4574749456: nam='db file scattered read' ela= 14544 file#=4 block#=2735938 blocks=62 obj#=34761 tim=3889945309178
WAIT #4574749456: nam='Disk file operations I/O' ela= 8 FileOperation=2 fileno=5 filetype=2 obj#=34761 tim=3889945310077
WAIT #4574749456: nam='db file scattered read' ela= 51182 file#=5 block#=2732162 blocks=64 obj#=34761 tim=3889945361514
WAIT #4574749456: nam='db file scattered read' ela= 7870 file#=5 block#=2732226 blocks=62 obj#=34761 tim=3889945370426
WAIT #4574749456: nam='Disk file operations I/O' ela= 7 FileOperation=2 fileno=6 filetype=2 obj#=34761 tim=3889945371246
WAIT #4574749456: nam='db file scattered read' ela= 34371 file#=6 block#=2730114 blocks=64 obj#=34761 tim=3889945405859

It is still possible to confirm that a fast split happened, because the OBJECT_ID/DATA_OBJECT_ID pairs in DBA_OBJECTS remained the same for the existing partitions: data was not moved around, and the 15 minutes were spent only on the emptiness check.
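
A quick way to verify this is to compare OBJECT_ID and DATA_OBJECT_ID before and after the split, for instance with a query like the following (a sketch, assuming the TTT schema seen in the traces):

-- If DATA_OBJECT_ID is unchanged after the split, the existing segments were
-- reused and no data was physically moved, i.e. a fast split took place.
SELECT subobject_name, object_id, data_object_id
FROM dba_objects
WHERE owner = 'TTT'
AND object_name = 'USAGE_SUMMARY'
AND object_type IN ('TABLE PARTITION', 'TABLE SUBPARTITION')
ORDER BY subobject_name;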

So what about our large table USAGE_PART, what would its behavior be? We ran the equivalent selects against it:

select /*+ FIRST_ROWS(1) PARALLEL("USAGE_PART", 1) */ 1 
from "TTT"."USAGE_PART" PARTITION ("PMAX")  
where (((( "ENT_DATE" < TO_DATE(' 2012-05-25 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN') OR "ENT_DATE" IS NULL)))) 
and rownum < 2

select /*+ FIRST_ROWS(1) PARALLEL("USAGE_PART", 1) */ 1 
from "TTT"."USAGE_PART" PARTITION ("PMAX")  
where (((( "ENT_DATE" >= TO_DATE(' 2012-05-25 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN') OR "ENT_DATE" IS NULL)))) 
and rownum < 2

As expected, the first select completed very quickly since all rows satisfy the where clause, but the second select hung. Looking at the explain plan, we saw that the CBO was not using the index. By that time, after searching for the root cause, running imports, exports and tests, and evaluating the situation, the partition was already 1TB. Gathering stats takes a long time and impacts production, and we wanted to be sure the CBO would pick the best execution plan, because the application can afford almost no downtime. In practice, we would only know for sure during the night maintenance window, with the clock ticking.

Therefore, for the larger table we decided to go with a different strategy:
1. Create a table USAGE_TEMP, hash partitioned with 128 partitions, based on the USAGE_PART table.
2. Perform an exchange partition between PMAX and the table USAGE_TEMP. This removes the giant partition from USAGE_PART and leaves an empty PMAX, while preserving the data.
3. Split PMAX into PMAX and PART_20120524. As a result, we get an empty PMAX and an equally empty PART_20120524.
4. Reintroduce the data through a second exchange partition, this time replacing the empty partition PART_20120524 with the large data set.
5. Process completed. We end up with a large partition PART_20120524, which will be dropped automatically in 2 months, and an empty PMAX ready to be split again on the next day (D+1).

1. Create the temp table and index. Do not forget the index, otherwise you will have to rebuild it afterwards.

CREATE TABLE USAGE_TEMP
(
  ENT_DATE      DATE       NOT NULL,
  CUSTOMER_ID   NUMBER     NOT NULL,
  USG_AMOUNT    FLOAT(126) NOT NULL
)
LOGGING 
NOCACHE
NOPARALLEL
STORAGE (INITIAL 1m NEXT 1m MAXEXTENTS UNLIMITED FREELISTS 12) INITRANS 12
partition by hash (CUSTOMER_ID)
partitions 128;

create index iUSAGE_TEMP on USAGE_TEMP (
  ENT_DATE,
  CUSTOMER_ID) local;

2. Extract PMAX

ALTER TABLE USAGE_PART
EXCHANGE PARTITION PMAX
WITH TABLE USAGE_TEMP INCLUDING INDEXES
WITHOUT VALIDATION
UPDATE GLOBAL INDEXES;

3. Split new empty PMAX

ALTER TABLE USAGE_PART
SPLIT PARTITION PMAX AT
(TO_DATE('20120525','YYYYMMDD')) INTO (PARTITION PART_20120524, PARTITION PMAX);

4. Swap temp table back into original table as partition PART_20120524

ALTER TABLE USAGE_PART
EXCHANGE PARTITION PART_20120524
WITH TABLE USAGE_TEMP INCLUDING INDEXES
WITHOUT VALIDATION
UPDATE GLOBAL INDEXES;
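
As a final sanity check (a sketch, not part of the original runbook), the segment sizes should show the large data back under USAGE_PART, while USAGE_TEMP is left with only the empty segments it received in the exchange:

-- The bulk of the data should now belong to USAGE_PART (partition PART_20120524),
-- and USAGE_TEMP should be close to empty.
SELECT segment_name, ROUND(SUM(bytes)/1024/1024/1024) AS size_gb
FROM dba_segments
WHERE owner = 'TTT'
AND segment_name IN ('USAGE_PART', 'USAGE_TEMP')
GROUP BY segment_name;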