Saturday, August 8, 2009

A Simple Way to Create Crosstab Queries in SQL Server - Part 1

This week's programming exercise is using SQL to generate crosstab-style datasets from a SQL Server database. Using the Northwind database as our sample, we'll say that we want to derive crosstab reports to show aggregates of orders by customer.

A clever use of the SQL Server CASE statement is helpful here. The CASE statement in a query allows a column to be computed based on multiple possible conditions. It is like a switch statement in C, C++, C#, Java, or a Select Case statement in VB. Each "case" or condition is laid out, with a corresponding value for the computed column when the case is matched. The first case matched wins if multiple cases can match, and an ELSE default may be supplied to provide a value when no case is matched.

When wrapped in an aggregate function, like sum() or avg(), a CASE statement becomes a convenient means to create crosstab-like output in a simple query. Looking at Northwind then, our first aggregate query computes the total freight charges for each customer, with the shipping companies broken out as columns.

select CustomerID as [Total Freight by Customer]
, sum(CASE WHEN ShipVia = 1 then Freight ELSE 0 END) as [Speedy Express]
, sum(CASE WHEN ShipVia = 2 then Freight ELSE 0 END) as [United Package]
, sum(CASE WHEN ShipVia = 3 then Freight ELSE 0 END) as [Federal Shipping]
from Orders
group by CustomerID
order by CustomerID


Each CASE statement in this example specifies a single condition and a default ELSE. Before applying the sum() function, the CASE statements compute three columns of detail fields to the normal Orders table. Two of the three computed fields will be zero; the column which matches the ShipVia condition will contain the value of the shipping Freight. When the sum() function is then applied, the aggregations are computed in the correct columns.

A simple change to the ELSE defaults from zero to null allows the query to be adjusted to compute averages:

select CustomerID as [Average Freight by Customer]
, avg(CASE WHEN ShipVia = 1 then Freight ELSE null END) as [Speedy Express]
, avg(CASE WHEN ShipVia = 2 then Freight ELSE null END) as [United Package]
, avg(CASE WHEN ShipVia = 3 then Freight ELSE null END) as [Federal Shipping]
from Orders
group by CustomerID
order by CustomerID


To compute counts, one might intuitively think of the count() aggregation function, but I find it simpler to follow this same pattern and use sum(). The CASE statements are modified so that when the appropriate condition is matched the computed value is simply a static value of one (rather than the Freight field in the examples). The default ELSE computes a zero. When the sum() aggregation is applied across all the matching ones, the result is a successful counting of the appropriate records for the shipper:

select CustomerID as [Orders Shipped by Customer]
, sum(CASE WHEN ShipVia = 1 then 1 ELSE 0 END) as [Speedy Express]
, sum(CASE WHEN ShipVia = 2 then 1 ELSE 0 END) as [United Package]
, sum(CASE WHEN ShipVia = 3 then 1 ELSE 0 END) as [Federal Shipping]
from Orders
group by CustomerID
order by CustomerID


Wrapping CASE statements within aggregate functions is a clean and surprisingly readable way of creating crosstabular output. Tune in tomorrow for part 2 where we will use dynamic SQL to generalize this query, allowing the values in the Shippers table to drive the CASEs rather than hardcoding them as we've done here.

0 comments:

Post a Comment

Submit a comment?