LINQ (Language Integrated Query) to SQL is a great tool because it allows developers to concentrate on business problems instead of worrying about writing SQL. Unfortunately, generated code typically comes with a catch. My general rule of thumb is that "I never trust any tools that have a wizard or generate code". After all, every time you release code into production you are putting your reputation on the line. Therefore, don't you think it is important to know what your code is really doing under the covers?

Since I started getting involved with Database Administration about 3 years ago I have become extremely conscious of the SQL that my code generates. After all, most database performance problems stem from the fact that developers test on empty databases and everything seems to work fine until millions of records trickle into the system. Then ugly problems like missing indexes, functions in the where clause and poorly written queries bubble to the surface. The occurrence of these problems have seen to escalate even more since the release of LINQ. LINQ can create unique problems due the fact that it leverages your relationships between tables. For example, In the AdventureWorks database we have the following relationship between the Employee and EmployeePayHistory tables:

For sake of demonstration, we build an application that prints the pay rate history for an employee to a textbox. To start the process, the developer writes a LINQ query that looks like this:

var query = from emp in db.Employees
            where
               emp.EmployeeID == 4
            select
               emp;

Then with the results of this query a nested loop is used to print the pay rates to a textbox:

foreach( Employee employee in query ) {
    foreach (EmployeePayHistory payHistory in employee.EmployeePayHistories ) {
        textBox1.AppendText(payHistory.Rate + Environment.NewLine);
    }
}

How many SQL queries do you think this code generated? From reviewing the SQL trace I can tell you that 2 queries were generated. This is due to the fact that we are leveraging the database relationship between the Employee and EmployeePayHistory tables to get the employee pay rate(s) from within the inner loop. Although this is convenient for the developer we are now issuing 2 queries instead of 1:

exec sp_executesql N'SELECT [t0].[EmployeeID], [t0].[NationalIDNumber], [t0].[ContactID], [t0].[LoginID], [t0].[ManagerID], 
[t0].[Title], [t0].[BirthDate], [t0].[MaritalStatus], [t0].[Gender], [t0].[HireDate], [t0].[SalariedFlag], [t0].[VacationHours], 
[t0].[SickLeaveHours], [t0].[CurrentFlag], [t0].[rowguid], [t0].[ModifiedDate]
FROM [HumanResources].[Employee] AS [t0]
WHERE [t0].[EmployeeID] = @p0',N'@p0  int',@p0=4

exec sp_executesql N'SELECT [t0].[EmployeeID], [t0].[RateChangeDate], [t0].[Rate], [t0].[PayFrequency], [t0].[ModifiedDate]
FROM [HumanResources].[EmployeePayHistory] AS [t0]
WHERE [t0].[EmployeeID] = @p0',N'@p0  int',@p0=4

Now maybe at this point you are happy with your code and you deploy it to production. Who cares if I have 2 queries being executed instead of 1! I have a SQL Server instance with 16GB of RAM. It can handle it. However, a week later your boss stops over to your desk and asks you to modify the program so it can print the pay rate for 50 employees instead of 1. You now modify your linq query to look like this:

var query = from emp in db.Employees
            where
               emp.EmployeeID < 51
            select
               emp;

This LINQ statement now produces 100 queries instead of 50. That's double the workload! 10 minutes after deploying the change to production your DBA calls you and says that your application is killing his SQL server! Huh...How could this happen? This is because the query generated by LINQ are run against a large table and each individual query causes a full table scan of a 5 million row table! OK, now this is a serious problem! So how can we reduce those 100 queries down to a more reasonable number. This can be accomplished by doing a join in the LINQ query and returning the data in a single result:

var query = from emp in db.Employees
            join ph1 in db.EmployeePayHistories on emp.EmployeeID
               equals ph1.EmployeeID into ph2
            from payHistory in ph2.DefaultIfEmpty()
            select new
            {
               EmployeeID = emp.EmployeeID,
               Rate = payHistory.Rate
            };

So how many queries do you think this LINQ query produced? The answer is 1! The reason is due to the fact that LINQ uses delayed execution. In other words it will not execute any SQL until you need to iterate a result set or access a property. Therefore by using a nested loop with the results of the original LINQ query we caused LINQ to make a bunch of individual queries to get the pay rate history for each employee. By joining the results in a single record set we were able to get all the data upfront and only issue a single query.

I can't tell you how many times I have seen problems like this caused by developers using LINQ who really do not understand how it works under the covers. The bottom line is that you need to be aware of any tool that generates code for you. Take the time to figure out how it really works and for gods sake do a SQL trace before deploying your code!