Optimize the Massive DELETE Operation in Oracle, Part 1
By performing DELETEs as INSERTs, you can complete massive DELETE operations in your Oracle database without having to pay the heavy performance overhead.
by Boris Milrud
A massive DELETE operation removes millions of rows from a table with indexes and constraints. The operation is database intensive and time consuming, mainly because it forces the database to generate and write to disk significant amounts (possibly gigabytes) of redo and undo data.

You can perform massive DELETEs as bulk INSERT operations instead: rather than removing the data you no longer need, you insert the data you want to keep. The key to this option is performing it efficiently, with minimum logging, by using direct-path INSERT. Part 1 of this article series compares direct-path INSERT with regular INSERT and describes the technique of performing DELETEs as INSERTs. Part 2 will discuss the packaged APIs that implement the bulk INSERT technique.

How do I complete a massive DELETE operation in my Oracle database without having to pay the heavy performance overhead?

Perform the massive DELETE operation as a direct-path (direct-load) INSERT (INSERT with APPEND hint) into a table whose logging parameter is set to NOLOGGING. This operation will complete significantly faster than DELETE and with minimum logging, but you have to take a backup afterwards to establish a new baseline.
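
In outline, the approach looks like the following sketch. The SourceTable and KeepData names and the keep-predicate are illustrative assumptions; the article builds the full technique below:

 -- A minimal sketch: keep the wanted rows instead of deleting the rest.
 create table KeepData nologging as
 select * from SourceTable where 1 = 0;   -- empty clone in NOLOGGING mode

 insert /*+ APPEND */ into KeepData       -- direct-path INSERT
 select *
   from SourceTable
  where status = 'VALID';                 -- rows you want to keep (example predicate)
 commit;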

Why INSERT Could Be Faster Than DELETE (or UPDATE)
Direct-path INSERT is a special database operation. Like SQL*Loader, it writes data directly to the database files, bypassing the buffer cache. It does this with minimum logging, recording only data dictionary changes. The logic behind this method is that because the data files are already up-to-date when an instance failure occurs, logging isn’t necessary.

The two distinct cases in which direct-path INSERT is important are when:

  1. The database is in noarchivelog mode. Media recovery is not possible, so the redo data wouldn't help you anyway.
  2. The database is in archivelog mode. It logs redo blocks for media recovery by default. However, if you explicitly set a table to NOLOGGING mode, the database won’t log redo blocks.

Therefore, when the database is in noarchivelog mode, or when it is in archivelog mode and the table is in NOLOGGING mode, direct-path INSERT performs only the minimum redo logging needed to protect the data dictionary.
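
Before relying on NOLOGGING, you can check which case applies. A minimal sketch; note that the database-level FORCE LOGGING setting (available since Oracle 9.2) overrides a table's NOLOGGING attribute:

 -- Database log mode; if FORCE_LOGGING is YES, NOLOGGING is ignored.
 select log_mode, force_logging from v$database;

 -- Check and, if needed, change the table's logging attribute.
 select logging from user_tables where table_name = 'TESTDATA';
 alter table TestData nologging;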

Need Proof? Generate Test Data
To demonstrate the difference between direct-path and regular INSERTs, create some test data: a table named TestData with half a million rows, a primary key, one non-unique index, and a check constraint. You will use this table in all of the article's code and performance tests; it draws its data from the all_objects view.

First, create the table with the following code:

 drop table TestData;

 create table TestData
 (
    objectId     integer       not null,
    objectName   varchar2(30)  not null,
    objectType   varchar2(30)  not null,
    runId        integer       not null,
    createDate   date,
    status       varchar2(7),
    description  varchar2(255)
 ) nologging;

Next, populate it with data using the following block. Since the all_objects view in Oracle 9.2 has 20K+ rows, the block performs the insert 25 times to put 500K+ rows into the TestData table:

 declare
    vCount pls_integer := 25;   -- number of passes over all_objects
 begin
    for nIndex in 1..vCount
    loop
       insert /*+ APPEND */
       into TestData
       (
          objectId,
          objectName,
          objectType,
          runId,
          createDate,
          status,
          description
       )
       select object_id,
              object_name,
              object_type,
              nIndex,
              created,
              status,
              object_name
         from all_objects;

       -- Each direct-path insert must be committed before the table
       -- can be touched again in the next iteration.
       commit;
    end loop;
 end;
 /

Check the row count:

 select count(*)
   from TestData;

   COUNT(*)
 ----------
     547875

Add the primary key on the objectId and runId columns, one non-unique index on the objectName and objectType columns, and one check constraint on the runId column:

 alter table TestData add constraint TestData_PK primary key
 (
    objectId,
    runId
 );

 create index TestData_NameType on TestData
 (
    objectName,
    objectType
 );

 alter table TestData add constraint TestData_CK check (runId > 0);

Gather the statistics using the following block:

 begin
    dbms_stats.gather_table_stats
    (
       ownname    => user,
       tabname    => 'TestData',
       method_opt => 'FOR ALL INDEXED COLUMNS SIZE AUTO',
       degree     => 4,
       cascade    => true
    );
 end;
 /

Now, create two tables, TestData_Logging and TestData_Nologging, with identical structures to TestData but with no indexes or constraints:

 create table TestData_Logging as
 select *
   from TestData
  where 1 = 0;

 create table TestData_Nologging nologging as
 select *
   from TestData
  where 1 = 0;

TestData_Logging is in LOGGING mode, while TestData_Nologging is in NOLOGGING mode:

 select table_name,
        logging
   from user_tables
  where table_name like 'TESTDATA\_%' escape '\';

 TABLE_NAME                     LOGGING
 ------------------------------ -------
 TESTDATA_LOGGING               YES
 TESTDATA_NOLOGGING             NO

Take a snapshot of the redo size before doing a regular INSERT:

 select a.name,
        b.value
   from v$statname a,
        v$mystat b
  where a.statistic# = b.statistic#
    and a.name = 'redo size';

 NAME                    VALUE
 ----------------- -----------
 redo size            39867216

Perform the INSERT into the TestData_Logging table (Scenario 1):

 set timing on

 insert into TestData_Logging
 select *
   from TestData;

 547875 rows created.

 Elapsed: 00:00:08.28

Take a snapshot of redo and undo after the INSERT:

 select a.name,
        b.value
   from v$statname a,
        v$mystat b
  where a.statistic# = b.statistic#
    and a.name = 'redo size';

 NAME                    VALUE
 ----------------- -----------
 redo size            93019404

 select used_ublk
   from v$transaction;

 USED_UBLK
 ----------
        238

 commit;

Populating the TestData_Logging table with 547,875 rows in LOGGING mode took 8.28 seconds and forced the database to generate 50.69MB (93,019,404 – 39,867,216 = 53,152,188 bytes) of redo and 238 undo blocks.
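
Because this redo snapshot is taken repeatedly below, you may find it convenient to wrap the query in a view. A small sketch; the MyRedoSize name is an assumption for these tests, not a standard object:

 -- Convenience view over the session's cumulative redo size
 -- (same query as the snapshots used throughout this article).
 create or replace view MyRedoSize as
 select b.value as redo_size
   from v$statname a,
        v$mystat b
  where a.statistic# = b.statistic#
    and a.name = 'redo size';

 -- Usage: select redo_size from MyRedoSize;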

Now, repeat the same steps for the table in NOLOGGING mode (Scenario 2).

Obtain performance results before the load:

 select a.name,
        b.value
   from v$statname a,
        v$mystat b
  where a.statistic# = b.statistic#
    and a.name = 'redo size';

 NAME                    VALUE
 ----------------- -----------
 redo size            93019496

Perform the direct-path INSERT:

 insert /*+ APPEND */ into TestData_Nologging
 select *
   from TestData;

 547875 rows created.

 Elapsed: 00:00:05.39

Obtain performance results after the load:

 select a.name,
        b.value
   from v$statname a,
        v$mystat b
  where a.statistic# = b.statistic#
    and a.name = 'redo size';

 NAME                    VALUE
 ----------------- -----------
 redo size            93120212

 select used_ublk
   from v$transaction;

 USED_UBLK
 ----------
          1

Populating the TestData_Nologging table with 547,875 rows in NOLOGGING mode took 5.39 seconds and generated only 0.1MB (!) (93,120,212 – 93,019,496 = 100,716 bytes) of redo and 1 undo block.

Table 1 shows the performance numbers for both scenarios.

                                                 Elapsed Time (sec.)  Redo Size (MB)  Undo Blocks
 Scenario 1: Regular INSERT, LOGGING mode                       8.28           50.69          238
 Scenario 2: Direct-path INSERT, NOLOGGING mode                 5.39            0.1             1

Table 1. Performance Results for Scenarios 1 and 2

However, if you add just a primary key constraint to the TestData_Nologging table (Scenario 3), the redo size skyrockets even though both the table and the primary key index are in NOLOGGING mode:

 drop table TestData_Nologging;

 create table TestData_Nologging nologging as
 select *
   from TestData
  where 1 = 0;

 alter table TestData_Nologging add constraint TestData_Nologging_PK primary key
 (
    objectId,
    runId
 );

 alter index TestData_Nologging_PK nologging;

Obtain performance results before the load:

 select a.name,
        b.value
   from v$statname a,
        v$mystat b
  where a.statistic# = b.statistic#
    and a.name = 'redo size';

 NAME                    VALUE
 ----------------- -----------
 redo size            93198748

 insert /*+ APPEND */ into TestData_Nologging
 select *
   from TestData;

 547875 rows created.

 Elapsed: 00:00:15.61

Obtain performance results after the load:

 select a.name,
        b.value
   from v$statname a,
        v$mystat b
  where a.statistic# = b.statistic#
    and a.name = 'redo size';

 NAME                    VALUE
 ----------------- -----------
 redo size           131033428

 select used_ublk
   from v$transaction;

 USED_UBLK
 ----------
        817

 commit;

As you can see, adding the index led to an elapsed time increase from 5.39 to 15.61 seconds, a redo size increase from 0.1MB to 36.08MB (131,033,428 – 93,198,748 = 37,834,680 bytes), and an undo blocks increase from 1 to 817 (!). The reason is that the database did full logging to maintain the primary key index. Putting the primary key index in NOLOGGING mode did not help, because NOLOGGING applies only to a limited number of operations. Here is the list of such operations from the Oracle9i SQL Reference Release 2 (9.2):

DML:

  • Direct-path INSERT (serial or parallel)
  • Direct Loader (SQL*Loader)

DDL:

  • CREATE TABLE ... AS SELECT
  • ALTER TABLE ... MOVE
  • ALTER TABLE ... [all partition operations that involve data movement]:
      • ALTER TABLE ... ADD PARTITION (hash partition only)
      • ALTER TABLE ... MERGE PARTITIONS
      • ALTER TABLE ... SPLIT PARTITION
      • ALTER TABLE ... MOVE PARTITION
      • ALTER TABLE ... MODIFY PARTITION ... ADD SUBPARTITION
      • ALTER TABLE ... MODIFY PARTITION ... COALESCE SUBPARTITION
      • ALTER TABLE ... MODIFY PARTITION ... REBUILD UNUSABLE INDEXES
  • CREATE INDEX
  • ALTER INDEX ... REBUILD
  • ALTER INDEX ... REBUILD PARTITION
  • ALTER INDEX ... SPLIT PARTITION

See the Oracle9i SQL Reference Release 2 (9.2) documentation for more details on the NOLOGGING clause.
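
For instance, ALTER INDEX ... REBUILD from the list can be combined with NOLOGGING and PARALLEL. A minimal sketch using the TestData_NameType index created earlier (the parallel degree is left to the default):

 -- Rebuild an existing index with minimal redo; this is one of the
 -- NOLOGGING-eligible operations listed above.
 alter index TestData_NameType rebuild nologging parallel;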

The above list shows that you can take advantage of the CREATE INDEX operation by rearranging your steps:

  1. Load the data into a table with no indexes.
  2. Create indexes in NOLOGGING PARALLEL mode.
  3. Create constraints.

The following series of commands for the TestData_Nologging table implements the above technique (Scenario 4):

 drop table TestData_Nologging;

 create table TestData_Nologging nologging as
 select *
   from TestData
  where 1 = 0;

 insert /*+ APPEND */ into TestData_Nologging
 select *
   from TestData;

 -- End the direct-path transaction before building the index.
 commit;

 create unique index TestData_Nologging_PK on TestData_Nologging
 (
    objectId,
    runId
 ) nologging parallel;

 alter table TestData_Nologging add constraint TestData_Nologging_PK primary key
 (
    objectId,
    runId
 );

Scenario 4 turns out to be more efficient than Scenario 3: elapsed time dropped from 15.61 to 12.34 seconds, redo size from 36.08MB to 0.17MB, and undo blocks from 817 to just 2.

Table 2 shows the performance numbers for Scenarios 3 and 4.

                                                 Elapsed Time (sec.)  Redo Size (MB)  Undo Blocks
 Scenario 3: Direct-path INSERT into a table
             with an existing primary key                      15.61           36.08          817
 Scenario 4: Direct-path INSERT into a table
             with no primary key, index built in
             NOLOGGING PARALLEL mode, primary
             key added afterwards                              12.34            0.17            2

Table 2. Performance Results for Scenarios 3 and 4

The numbers clearly show that Scenario 4 is the way to go. Now it's time to talk about implementing DELETEs as bulk INSERTs.

Performing Massive DELETEs as Bulk INSERT
The following steps implement the technique of performing DELETEs as direct-path INSERTs (a minimal end-to-end sketch follows the list):
  1. Create a table (TestData_X) with a structure identical to the TestData table from the previous section, but with no primary key, alternate key, or foreign key constraints (NOLOGGING option).
  2. Insert the data you need to keep into the newly created table using direct-path INSERT.
  3. Create indexes on the new table with the NOLOGGING PARALLEL options.
  4. Create constraints on the new table. Use the ENABLE NOVALIDATE option for referential integrity and check constraints.
  5. Swap the TestData and TestData_X tables: rename TestData to TestData_T, then rename TestData_X to TestData.
  6. Restore the original logging settings.
  7. Verify the results and drop the TestData_T table.
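
Part 2 presents a package that automates these steps; as a preview, here is a minimal hand-written sketch of steps 1 through 7. The runId > 5 keep-predicate and the TestData_X_* object names are assumptions for illustration:

 -- Step 1: empty clone of TestData, no constraints, NOLOGGING.
 create table TestData_X nologging as
 select * from TestData where 1 = 0;

 -- Step 2: direct-path insert of the rows to keep
 -- (the runId > 5 predicate is just an example).
 insert /*+ APPEND */ into TestData_X
 select *
   from TestData
  where runId > 5;
 commit;

 -- Step 3: indexes in NOLOGGING PARALLEL mode.
 create unique index TestData_X_PK on TestData_X (objectId, runId)
    nologging parallel;
 create index TestData_X_NameType on TestData_X (objectName, objectType)
    nologging parallel;

 -- Step 4: constraints; ENABLE NOVALIDATE skips checking existing rows.
 alter table TestData_X add constraint TestData_X_PK
    primary key (objectId, runId);
 alter table TestData_X add constraint TestData_X_CK
    check (runId > 0) enable novalidate;

 -- Step 5: swap the tables.
 rename TestData to TestData_T;
 rename TestData_X to TestData;

 -- Step 6: restore the original logging settings.
 alter table TestData logging;
 alter index TestData_X_PK logging;

 -- Step 7: verify the results, then drop the old table.
 select count(*) from TestData;
 drop table TestData_T;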

Optimize the Massive DELETE Operation in Oracle, Part 2 will present the SwapOps package, which contains all of the APIs necessary to implement the above steps. It will also present performance tests for DELETE and direct-path INSERT, and compare the two.

