Wednesday, January 13, 2010

Common Table Expressions, Subqueries and Views

I am sure you are aware of different types of joins if you have used Sql server or any Database. Joins can be used on tables to get cumulated record set. If we have a requirement to get a sum or average or any type aggregate function, we use group by with combined record set. If you have come across a situation where you need to join the table but you have to get some additional records which might require some other table join or aggregation, the normal approach would be creating a function and return as table or create a view. If you are not clear what am I trying to say, I am going to take an example.

Lets consider a Employee and EmployeeLeaves tables

CREATE TABLE [dbo].[Employee](
[EmpID] [int] NOT NULL,
[EmpName] [varchar](50) NULL,
[Address] [varchar](50) NULL,
[Status] [char](1) NULL,

CREATE TABLE [dbo].[EmpLeaves](
[EmpID] [int] NULL,
[EmpLeave] [int] NULL,
[EmpLeaveType] [varchar](5) NULL,
[Date] [date] NULL

Now lets use a straight forward query and get all employees and their leave details. The below query does the same thing.

[Query 1]
select * from Employee, EmpLeaves where Employee.EmpID = EmpLeaves.EmpID

Lets say I need to get number of total leaves applied by each employee, I would simply write as

[Query 2]
select Employee.EmpID,EmpName,EmpLeaveType,SUM(Empleave) as NumberOfleave from Employee, EmpLeaves where Employee.EmpID = EmpLeaves.EmpID
group by Employee.EmpID,EmpName,EmpLeaveType
order by 1 ;

Ok. Now if I need sum with the above record set (Query 1), how do I get? One option to use subquery

[Query 3]
select *,(select SUM(Empleave) from EmpLeaves where EmpID = empl1.EmpID and EmpLeaveType = empl1.EmpLeaveType) as NumberOfleaves
from Employee, EmpLeaves empl1 where Employee.EmpID = empl1.EmpID
order by 1;

So till now we are all good and this is the simplest way we could achieve this. Now lets say I need one more column in my result set say Number of times the leave was applied. That is nothing but number of rows for each employee based on leave type.

[Query 4]
select *,(select SUM(Empleave) from EmpLeaves where EmpID = empl1.EmpID and EmpLeaveType = empl1.EmpLeaveType) as NumberOfleaves,
(select COUNT(Empleave) from EmpLeaves where EmpID = empl1.EmpID and EmpLeaveType = empl1.EmpLeaveType) as NumberOfTimes
from Employee, EmpLeaves empl1 where Employee.EmpID = empl1.EmpID
order by 1;

but here we are using a subquery which is using the same table but used twice. Can we fine tune this anyway? If you have used views, you could create a view and use that in your query

CREATE VIEW [dbo].[vWGetNumLeave]
SELECT COUNT(EmpLeaveType) AS NumberofTimes, SUM(EmpLeave) AS NumberofLeaves, EmpLeaveType, EmpID
FROM dbo.EmpLeaves
GROUP BY EmpID, EmpLeaveType


Now lets use this view in our query

[Query 5]
select Employee.*,EmpLeaves.*, empcount.NumberofLeaves ,empcount.NumberofTimes
from Employee, EmpLeaves, vwGetNumLeave empcount
where Employee.EmpID = EmpLeaves.EmpID and empcount.EmpId = Employee.EmpID and empcount.EmpLeaveType = EmpLeaves.EmpLeaveType
order by 1 ;

So now we get both NumberOfleaves and NumberOfTimes in our query without using any aggtreation in our query. All the aggregation part of logic has been moved to view. If we need any additional logic or any new column that needs to be added, we can add it on view or function and plug that in our query.

But now what if we are using this only in one place? Is it worth creating view? Consider you have several incidents where you need such aggreations but then you keep adding views? So here is the use of common table expressions newly introduced with sql 2005. I am sure if you understand the use of it, you will start using it in many places.

[Query 6]
with EmpCount AS (Select COUNT(EmpleaveType) as NumberofTimes, SUM(Empleave) as NumberofLeaves, Empleavetype, Empid from
EmpLeaves group by EmpID, EmpLeaveType)
select Employee.*, EmpLeaves.*,EmpCount.NumberofTimes,EmpCount.NumberofLeaves from Employee
inner join EmpLeaves on Employee.EmpID = EmpLeaves.EmpID
left outer join EmpCount on EmpCount.EmpID = Employee.EmpID and EmpCount.EmpLeaveType = EmpLeaves.EmpLeaveType
order by 1;

Common Table expression (CTE) can be thought of as a temporary result set within the execution scope. So now if we need to get any specific logic which is not referred anywhere, we have a option to create as CTE.

The basic syntax for CTE

WITH expression_name [ ( column_name [,...n] ) ]
( CTE_query_definition )

The statement to run the CTE is:
FROM expression_name;

This can also be used as recursive which I will try to explain sometime later. I have taken this example to explain how CTE is different than normal joins and subqueries. This example is only to give a basic idea, if you come across any better examples or implementations you can share with me :-)

No comments:

Post a Comment