In my past life as a software engineer I spent a great deal of time developing applications that interfaced with some sort of database. Quickly I realized that T-SQL was going to be a very important weapon in my arsenal. However, when I wrote complex queries as a developer I usually concentrated on making sure that the data returned was correct. I really did not spend much time thinking about performance and scalability.
A few years passed by and I was offered an opportunity to become a DBA. It wasn't long after I took the job that I starting seeing things in a different light. I was no longer happy with just getting the right data, now my focus shifted towards performance. In my experience, most developers seem to be satisfied with a query as soon as it passes syntax validation and a result is returned. I think developers are so used to cranking out code that they develop a mentality that if something compiles it is valid. One of things that has helped reverse this mindset at my company was to educate the developers on how to write proper SQL. As part of this endeavor I came up with some simple rules to remember when developing a query:
- Don't return more columns or rows of data to the client than absolutely necessary. This just increases disk I/O on the server and network traffic, both of which hurts performance. In SELECT statements, don't use SELECT * to return rows, always specify in your SELECT statement exactly which columns are needed to be returned for a particular query, and not a column more. In most cases, be sure to include a WHERE clause to reduce the number or rows sent to only those rows the clients needs to perform the task immediately at hand.
- Do not use functions in the where clause. Functions in the where clause will prevent SQL Server from using an index. This includes concatenating columns of a join, using ltrim or rtrim functions, cast or convert functions, substring functions and date manipulation functions. It is more efficient to do these types of operations in code.
- If you have two or more tables that are frequently joined together, then the columns used for the joins should have an appropriate index. If the columns used for the joins are not naturally compact, then considering adding surrogate keys to the tables that are compact in order to reduce the size of the keys, thus decreasing read I/O during the join process, and increasing overall performance.
- Try to avoid keywords like distinct and top N. These keywords generally will force a sort which will kill performance.
- Always read the execution plan. Be careful with any items marked as a Table Scan. If you have a table scan then you need to consider adding an index. As a rule of thumb I generally try to eliminate all table scans unless the table has a small number of rows. When reading the plan focus on the high cost percentage items first. In most cases you will find that the fix will be simple such as a missing index. SQL Server 2005 has some powerful Dynamic Management Views to identify missing indexes. Bart Duncan has an example on his blog.
The final bullet introduces the topic of reading the execution plan. I can not stress how important an execution plan is when you are performance tuning a query. In SQL Server you can view an execution plan by hitting ALT+L in a query window.
I realize that in order to properly interpret an execution plan you will need to endure a learning curve. Unfortunately most developers are pressed for time and need a quick fix. Therefore, I put together this short list which ranks the items seen in a typical execution plan from worst to best. The basic idea behind this list is to try to turn a table scan into something better such as an index seek. This could be accomplished by creating an index or modifying your query to properly use the index of maybe even updating your statistics.
- Table Scans (Worst)
- Clustered Index Scan
- Index Scan
- Index Seek (Best)
Although, this article is a high level introduction to query performance tuning I believe it does a good job on highlighting some of the most common performance problems. Most of the time poor performing queries are easily fixed by creating an index or modifying the SQL to take advantage of the indexes that are already there. I have personally experienced many situations where a query went from over 10 minutes down to 10 seconds just by adding a simple index.