Sunday 16 September 2007

Analytic Features - Grouping Sets

Grouping Sets - Instead of a UNION ALL statement (that would require multiple table scans), define a grouping set - the new syntax will result in only a single pass over the base table.
Grouping Sets are specified in the GROUP BY clause
Syntax:

SELECT group_function(column1), column2, group_function(column3)...
FROM table_list
[WHERE conditions]
GROUP BY GROUPING SETS (group_by_list)

SELECT group_function(column1), column2, group_function(column3)...
FROM table_list
[WHERE conditions]
GROUP BY CUBE (group_by_list)

SELECT group_function(column1), column2, group_function(column3)...
FROM table_list
[WHERE conditions]
GROUP BY ROLLUP (group_by_list)

Examples:
Instead of this UNION query...


SELECT
manager_id, null hire_date, count(*)
FROM
employees
GROUP BY manager_id, 2
UNION ALL
SELECT
null, hire_date, count(*)
FROM
employees
GROUP BY 1, hire_date

The above rewritten as a Grouping Set...

SELECT
manager_id, hire_date, count(*)
FROM
employees
GROUP BY GROUPING SETS (manager_id, hire_date);

The GROUPING SET clause allows you to specify the EXACT groups.

CUBE
Where a large number of groupings are needed then the CUBE and ROLLUP statements extend this idea by calculating multiple groupings in a single statement.

e.g. GROUP BY CUBE (hire_date, manager_id, product) will produce 2^3 =8 groupings
1) hire_date, manager_id, product
2) hire_date, manager_id
3) hire_date, product
4) manager_id, product
5) hire_date
6) manager_id
7) product
8) Grand Total

GROUP BY CUBE always calculates ALL the combinations - which may be far more than needed.

ROLLUP
e.g. GROUP BY ROLLUP (hire_date, manager_id, product) will produce 4 groupings
1) hire_date, manager_id, product
2) hire_date, manager_id
3) hire_date,
4) Grand Total

GROUP BY ROLLUP calculates all combinations for the first column listed in the ROLLUP clause.

This can be further tuned by using parenthesis to remove some of the combinations

e.g. GROUP BY ROLLUP (hire_date, (manager_id, product)) will produce
1) hire_date, manager_id, product
2) hire_date
3) Grand Total

Grouping function
CUBE and ROLLUP will generate NULLs for each dimension at the subtotal levels.
The Grouping() function can be used to identify these rows, which can be very useful when performing additional calculations such as Ranking within a group.

The values returned by grouping() are:
0 for NULL data values
1 for NULL indicating a dimension subtotal

The results of Grouping() can be passed into a decode() e.g.
SELECT .. PARTITION BY GROUPING(column1) ..
SELECT .. PARTITION BY DECODE(GROUPING(column1), 1, ‘My SubTotal’, column1)) …

Combining (concatenating) Groupings
The CUBE and ROLLUP clauses can be combined as part of a standard GROUP BY clause
e.g. GROUP BY manager_id, ROLLUP (hire_date, product)

Notes
Grouping sets are typically 80 - 90% more efficient at producing sub-totals than equivalent SQL code.

ROLLUP/CUBE can be used with all aggregate functions (MAX, MIN, AVG, etc.)

A HAVING clause will apply to all the data returned.