Saturday, August 8, 2009

A Simple Way to Create Crosstab Queries in SQL Server - Part 1

This week's programming exercise is using SQL to generate crosstab-style datasets from a SQL Server database. Using the Northwind database as our sample, we'll say that we want to derive crosstab reports to show aggregates of orders by customer.

A clever use of the SQL Server CASE statement is helpful here. The CASE statement in a query allows a column to be computed based on multiple possible conditions. It is like a switch statement in C, C++, C#, Java, or a Select Case statement in VB. Each "case" or condition is laid out, with a corresponding value for the computed column when the case is matched. The first case matched wins if multiple cases can match, and an ELSE default may be supplied to provide a value when no case is matched.

When wrapped in an aggregate function, like sum() or avg(), a CASE statement becomes a convenient means to create crosstab-like output in a simple query. Looking at Northwind then, our first aggregate query computes the total freight charges for each customer, with the shipping companies broken out as columns.

select CustomerID as [Total Freight by Customer]
, sum(CASE WHEN ShipVia = 1 then Freight ELSE 0 END) as [Speedy Express]
, sum(CASE WHEN ShipVia = 2 then Freight ELSE 0 END) as [United Package]
, sum(CASE WHEN ShipVia = 3 then Freight ELSE 0 END) as [Federal Shipping]
from Orders
group by CustomerID
order by CustomerID


Each CASE statement in this example specifies a single condition and a default ELSE. Before applying the sum() function, the CASE statements compute three columns of detail fields to the normal Orders table. Two of the three computed fields will be zero; the column which matches the ShipVia condition will contain the value of the shipping Freight. When the sum() function is then applied, the aggregations are computed in the correct columns.

A simple change to the ELSE defaults from zero to null allows the query to be adjusted to compute averages:

select CustomerID as [Average Freight by Customer]
, avg(CASE WHEN ShipVia = 1 then Freight ELSE null END) as [Speedy Express]
, avg(CASE WHEN ShipVia = 2 then Freight ELSE null END) as [United Package]
, avg(CASE WHEN ShipVia = 3 then Freight ELSE null END) as [Federal Shipping]
from Orders
group by CustomerID
order by CustomerID


To compute counts, one might intuitively think of the count() aggregation function, but I find it simpler to follow this same pattern and use sum(). The CASE statements are modified so that when the appropriate condition is matched the computed value is simply a static value of one (rather than the Freight field in the examples). The default ELSE computes a zero. When the sum() aggregation is applied across all the matching ones, the result is a successful counting of the appropriate records for the shipper:

select CustomerID as [Orders Shipped by Customer]
, sum(CASE WHEN ShipVia = 1 then 1 ELSE 0 END) as [Speedy Express]
, sum(CASE WHEN ShipVia = 2 then 1 ELSE 0 END) as [United Package]
, sum(CASE WHEN ShipVia = 3 then 1 ELSE 0 END) as [Federal Shipping]
from Orders
group by CustomerID
order by CustomerID


Wrapping CASE statements within aggregate functions is a clean and surprisingly readable way of creating crosstabular output. Tune in tomorrow for part 2 where we will use dynamic SQL to generalize this query, allowing the values in the Shippers table to drive the CASEs rather than hardcoding them as we've done here.

No comments:

Post a Comment

Submit a comment?