Tuesday 18 December 2007

SQL & PL/QL : Deleting Table with Millions Rows

Here are the case:
You have to compare 2 tables, both having around million rows, and delete data based on single column.

SQL>DELETE FROM TABLE_A WHERE COLUMN_A NOT IN
(SELECT COLUMN_A FROM TABLE_B)

when you execute the above command, it takes a long time. So what is the best way to achieve it?
Here are some solutions:

1. If the rows percentage to be deleted is large then you can try the following step:
* Create a new table where COLUMN_A NOT IN
(SELECT COLUMN_A FROM TABLE_B)
* TRUNCATE table_A
* Insert in to table_A (select * from new table)
* If you have any constraints then may be it can be disabled.
2. If the query is properly indexed you might get better performance from a correlated subquery instead of NOT IN, something like

DELETE FROM TABLE_A
WHERE NOT EXISTS(
SELECT COLUMN_A
FROM TABLE_B
WHERE b.column_a = a.column_a
)