Digital Colony!

Outer Joins Against A Date Range

This article is written for SQL Server 2000, which supports user-defined functions and OUTER JOINS. All examples come from the Northwind database which is part of the default installation of SQL Server.

In SQL an OUTER JOIN is where you want to return all the elements in a table regardless of whether they exist in the second table. The most common case for using an OUTER JOIN is when you need a COUNT or SUM value even when the aggregate function returns a zero. Before we get to the heart of this article, let me use two queries to demonstrate the power of a basic OUTER JOIN.

Customers and Order COUNT

There are 2 ways to tackle this request. The first is with an INNER JOIN. It will only return Customers that have at least one order. The second way is with an OUTER JOIN. It will return all Customers, even those with no Orders.
-- A: INNER JOIN (returns 89 customers, all of which have at least 1 order) 
SELECT C.CompanyName, COUNT(R.OrderID) AS OrderCount 
FROM Customers C 
INNER JOIN Orders R ON C.customerID = R.customerID 
GROUP BY C.CompanyName 
ORDER BY COUNT(R.OrderID) DESC 

-- B: OUTER JOIN (returns all 91 customers, 2 of which have 0 orders) 
SELECT C.CompanyName, COUNT(R.OrderID) AS OrderCount 
FROM Customers C 
LEFT OUTER JOIN Orders R ON C.customerID = R.customerID 
GROUP BY C.CompanyName 
ORDER BY COUNT(R.OrderID) DESC 

Orders by Day of the Week for July 1996

The first OUTER JOIN was straight-forward. But what if the entity you need to perform an OUTER JOIN upon doesn't exist in the database? One such entity that comes to mind is date. Let's suppose the sales manager is trying to determine which day she is going to paint the office based upon the day of the week with historically the least amount of orders. The least amount could be zero, so we need to perform an OUTER JOIN. But how can you perform an OUTER JOIN against a range of dates? Three ways come to mind.

1. Create a TABLE of dates (or digits) using a SQL statement. This could be a temporary or permanent table.
2. Get verbose with your SQL and write long CASE statements.
3. Use the 2 user-defined functions I'm providing to you in this article.

The problem with #1 is now you're executing multiple statements to handle one query. You are creating a TABLE, perform the INSERT and ultimately dropping the TABLE. At this point you'll consider writing a stored procedure to accomplish the task. All for a single query. Too much work. The problem with #2 is your SQL can get very long and hard to read and the data is returned more like an Access PIVOT TABLE. The slick way to handle this task is using a user-defined function.
-- C: Using multiple CASE and SUM statement (long, ugly, and tabular) 
SELECT SUM(CASE DATEPART(dw,OrderDate) WHEN 0 THEN 1 ELSE 0 END) AS MonSales, 
SUM(CASE DATEPART(dw,OrderDate) WHEN 1 THEN 1 ELSE 0 END) AS TueSales, 
SUM(CASE DATEPART(dw,OrderDate) WHEN 2 THEN 1 ELSE 0 END) AS WedSales, 
SUM(CASE DATEPART(dw,OrderDate) WHEN 3 THEN 1 ELSE 0 END) AS ThuSales, 
SUM(CASE DATEPART(dw,OrderDate) WHEN 4 THEN 1 ELSE 0 END) AS FriSales, 
SUM(CASE DATEPART(dw,OrderDate) WHEN 5 THEN 1 ELSE 0 END) AS SatSales, 
SUM(CASE DATEPART(dw,OrderDate) WHEN 6 THEN 1 ELSE 0 END) AS SunSales 
FROM Orders 
WHERE OrderDate BETWEEN '7/1/1996' AND '8/1/1996' 

-- D: Performing an OUTER JOIN against a user-defined function (dbo.udfNumbers) 
SELECT DATENAME(dw,N.number) AS DayName, COUNT(R.orderID) AS OrderCount 
FROM dbo.udfNumbers(0,6) N 
LEFT OUTER JOIN Orders R ON N.number = DATEPART(dw,R.OrderDate) 
AND R.OrderDate BETWEEN '7/1/1996' AND '8/1/1996' 
GROUP BY DATENAME(dw,N.number), N.Number 
ORDER BY N.number

Using dbo.udfNumbers

The dbo.udfNumbers user-defined function returns a TABLE of numbers. It is available for download at the end of this article. The upper range limit is 999,999. If you need larger numbers, modifying the function should be fairly easy.
-- E: Usage dbo.udfNumbers(low range,high range) 
-- will return 6,7,8,9,10,11,12
SELECT number FROM dbo.udfNumbers(6,12)

Orders by Date for July 1996

I use dbo.udfNumbers for those quick queries where I just need a range of digits. For most date-based reporting I use the far more powerful dbo.udfDateTimes. This user-defined function uses the digit logic in dbo.udfNumbers along with SQL Server's date and time function to return ranges of datetime. Query C was based upon Day of the Week, so there were only 7 possible values. What if the sales manager asked for a day by day breakdown of orders for July? That would turn into one long and ugly SQL statement. The dbo.udfDateTimes function creates the date range need for that OUTER JOIN without the long and ugly SQL.
-- F: dbo.udfDateTimes day interval for July 1996 
SELECT dtime AS OrderDate, COUNT(R.orderID) AS OrderCount 
FROM dbo.udfDateTimes ('7/1/1996', '7/31/1996',1,'day') DT 
LEFT OUTER JOIN Orders R ON DT.dtime = R.OrderDate 
GROUP BY dtime

Using dbo.udfDateTimes

Like dbo.udfNumbers, the user-defined function dbo.udfDateTimes returns a TABLE. The first parameter is start date. The second is end date. The third parameter is interval. In most cases this will be set to 1, but if you had a need to see one of the dateparts at an alternate interval, you would modify this parameter. The last parameter is datepart. Those supported are year, quarter, month, week, day, hour and minute. The upper range is 9,999,999, which (like udfNumbers) can easily be modified should need a function that returns more datetimes.

Download

dbo.udfNumbers - function that returns a table of digits.
dbo.udfDateTimes - function that returns a table of datetimes.

Last Words

One last piece of advice is to get your datetime range correct first before attempting to OUTER JOIN with it. Only once you know the dbo.udfDateTimes is returning the datetimes you need for the query should you proceed with your OUTER JOIN. Although the Northwind database deals with orders, another great use for these functions is when looking for missing data. In those cases where you need to seek out the datetimes where no data is present, add a HAVING clause set to zero.
-- G: Days in July 1996 with no orders 
SELECT dtime AS OrderDate, COUNT(R.orderID) AS OrderCount 
FROM dbo.udfDateTimes ('7/1/1996', '7/31/1996',1,'day') DT 
LEFT OUTER JOIN Orders R ON DT.dtime = R.OrderDate 
GROUP BY dtime 
HAVING COUNT(R.orderID) = 0

Labels: , ,

AddThis Social Bookmark Button

12 Comments:

Anonymous said...

Very clever. Will use this code in a future project.

2/10/2007 5:50 PM

Reagan Boone said...

Great post. I never considered using a function to generate a date range. O'Reilly's Transact SQL Cookbook uses a pre-defined pivot table in their examples, but this makes the SQL so much more eloquent.

2/16/2007 12:37 AM

Mark said...

I'm a little confused by the results of the udfDateTimes function. By adding the timestamp in the function unless all datestamps in a table are 00:00:00.000 you will not return any results.

It could be that I'm not passing the variables correctly, but when I run this function against a table with hundreds of create dates for invoices I get 0 returns.

I used the function in this query:
SELECT dtime AS OrderDate, COUNT(Invoice.invno) AS InvoiceNo
FROM PUBs.dbo.udfDateTimes ('1/1/2007', '1/31/2007',1,'day') DT
LEFT OUTER JOIN Invoice Invoice ON DT.dtime = Invoice.EntDate
GROUP BY dtime
HAVING COUNT(Invoice.invno) = 0


I get these results:
2007-01-01 00:00:00.000 0
2007-01-02 00:00:00.000 0
2007-01-03 00:00:00.000 0
2007-01-04 00:00:00.000 0
2007-01-05 00:00:00.000 0
2007-01-06 00:00:00.000 0
2007-01-07 00:00:00.000 0
2007-01-08 00:00:00.000 0
2007-01-09 00:00:00.000 0
2007-01-10 00:00:00.000 0
2007-01-11 00:00:00.000 0
2007-01-12 00:00:00.000 0
2007-01-13 00:00:00.000 0
2007-01-14 00:00:00.000 0
2007-01-15 00:00:00.000 0
2007-01-16 00:00:00.000 0
2007-01-17 00:00:00.000 0
2007-01-18 00:00:00.000 0
2007-01-19 00:00:00.000 0
2007-01-20 00:00:00.000 0
2007-01-21 00:00:00.000 0
2007-01-22 00:00:00.000 0
2007-01-23 00:00:00.000 0
2007-01-24 00:00:00.000 0
2007-01-25 00:00:00.000 0
2007-01-26 00:00:00.000 0
2007-01-27 00:00:00.000 0
2007-01-28 00:00:00.000 0
2007-01-29 00:00:00.000 0
2007-01-30 00:00:00.000 0
2007-01-31 00:00:00.000 0

2/16/2007 9:00 AM

MAS said...

Mark --
If the invoice table has greater date detail than the JOIN won't make a match. In that cast you'll need to convert the date column of the Invoice table to match the UDF.

This is a rough idea.
DECLARE @DateOne as DATETIME
DECLARE @DateTwo as DATETIME

SET @DateOne = '1/1/2007'
SET @DateTwo = '1/1/2007 1:23:45'

IF @DateOne = @DateTwo
PRINT 'Match'
ELSE
PRINT 'Nope' -- Fires

IF @DateOne = CAST(CAST(@dateTwo AS VARCHAR(13)) AS DATETIME)
PRINT 'Match' -- Fires
ELSE
PRINT 'Nope'


So try casting your Invoice.EndDate to a MONTH/DAY/YEAR without the hours/minute/second.

2/16/2007 9:45 AM

Mark said...

Thanks, MAS.

Except that the your example should have been varchar(11) instead of varchar(13) your solution worked perfectly.

Here is the query I have that counts the # of invoices created on each date of a given month... This is great.

SELECT dt.dtime AS OrderDate, COUNT(invoice.invno) AS InvoiceNo
FROM PUBs.dbo.udfDateTimes ('1/1/2007', '1/31/2007',1,'day') DT
LEFT OUTER JOIN Invoice Invoice ON DT.dtime = CAST(CAST(Invoice.EntDate as varchar(11)) as DATETIME)
GROUP BY dtime
--HAVING COUNT(*) = 0


The result is this:
Date # of Invoices
2007-01-01 00:00:00.000 0
2007-01-02 00:00:00.000 10
2007-01-03 00:00:00.000 15
2007-01-04 00:00:00.000 4
2007-01-05 00:00:00.000 6
2007-01-06 00:00:00.000 0
2007-01-07 00:00:00.000 0
2007-01-08 00:00:00.000 8
2007-01-09 00:00:00.000 0
2007-01-10 00:00:00.000 1
2007-01-11 00:00:00.000 0
2007-01-12 00:00:00.000 4
2007-01-13 00:00:00.000 0
2007-01-14 00:00:00.000 0
2007-01-15 00:00:00.000 9
2007-01-16 00:00:00.000 0
2007-01-17 00:00:00.000 6
2007-01-18 00:00:00.000 9
2007-01-19 00:00:00.000 5
2007-01-20 00:00:00.000 0
2007-01-21 00:00:00.000 0
2007-01-22 00:00:00.000 1
2007-01-23 00:00:00.000 2
2007-01-24 00:00:00.000 4
2007-01-25 00:00:00.000 2
2007-01-26 00:00:00.000 3
2007-01-27 00:00:00.000 0
2007-01-28 00:00:00.000 0
2007-01-29 00:00:00.000 10
2007-01-30 00:00:00.000 0
2007-01-31 00:00:00.000 12

2/16/2007 10:30 AM

MAS said...

Thanks for the correction. I was doing that from memory.

2/16/2007 10:35 AM

Anonymous said...

This just may rock my world; I've been trying to figure out how to do a fairly complex aggregate query for a little bit now. A bit more advice may nudge me over the edge...

I have two tables in this query: terminals, and activities. For simplicity's sake, terminals consists of just an id column, while activities consists of an id column, a terminal_id column, a start_date column, and a checkouts column. (The real model is considerably more complex, but this is the nut of the problem I'm having.)

I want to get a result set with one row for each date (in a given range) and terminal. Simple enough:

SELECT dt.dtime, terminals.id
FROM udfDateTimes('20070201', '20070228', 1, 'day') AS dt, terminals

Now I'd like to add the activities data to this resultset:

SELECT dt.dtime, terminals.id, activities.checkouts
FROM udfDateTimes('20070201', '20070228', 1, 'day') AS dt, terminals
LEFT OUTER JOIN activities ON activities.terminal_id = terminals.id AND activities.start_date = dt.dtime

But I get:

The column prefix 'dt' does not match with a table name or alias name used in the query.

Any notion why?

3/14/2007 12:07 PM

Anonymous said...

And in response to my own query, a LEFT OUTER JOIN on a noop condition seems to do the job:

SELECT dtime, terminals.id, activities.checkouts
FROM terminals
LEFT OUTER JOIN udfDatetimes('20070201', '20070228', 1, 'day') ON 1=1
LEFT OUTER JOIN activities ON activities.terminal_id = terminals.id AND activities.start_date = dtime

3/14/2007 12:21 PM

MAS said...

Not sure I understand the terminals JOIN in your example. Is it supposed to be an OUTER JOIN?

I never use commas to separate tables in the FROM clause. Join each table with an INNER JOIN or OUTER JOIN and an ON.

3/14/2007 12:22 PM

MAS said...

Looks like our comments got crossed. Glad you figured it out.

3/14/2007 12:23 PM

Anonymous said...

Hi
Im trying to do bit of calculation of the date that goes into the function.


This is simplefied version. I'll actually be gettting date from a varible and the -3 from from a joined table.

Anyway this gives error:
Incorrect syntax near '('.

SELECT * from dbo.udfDateTimes(DATEADD(day, -3, '26/may/2007'),'26/may/2007',DEFAULT,DEFAULT ) AS udfDateTimesTable

5/12/2007 10:58 AM

MAS said...

I ran your query and it didn't throw an error for me, however it didn't return any rows.

Change the 4th parameter from DEFAULT to 'DAY' and you will get 4 rows back.

SELECT * from dbo.udfDateTimes(DATEADD(day, -3, '26/may/2007'),'26/may/2007',DEFAULT,'DAY' ) AS udfDateTimesTable

5/12/2007 2:18 PM

 

Post a Comment

 

Digital Colony Copyright © 1999-2008 XHTML   508
This site uses Blogger, which is not 100% XHTML compliant.
Try...Catch Disclaimer: For brevity many examples do not include error handling. That is your responsibility.