jinal's blog

2:30 AM Edit This 0 Comments »

The real reason SELECT * queries are bad: index coverage
Are SELECT * queries bad? Sure, everyone know that. But, why?
It's returning too much data, right?
That's the common answer, but I don't think it's the right one. If you're working with a reasonably normalized database, the actual network traffic difference is pretty small.
Let's take a look at a sample. The following two queries select 326 rows from the TransactionHistoryArchive table in the AdventureWorks database (which has a total of 89K rows). The first uses a SELECT * query, the second selects a specific column:

SELECT * FROM Production.TransactionHistoryArchive
WHERE ReferenceOrderID < 100
SELECT ReferenceOrderLineID FROM Production.TransactionHistoryArchive
WHERE ReferenceOrderID < 100

In this case, the difference in network traffic is only 15K, roughly a 10% difference (180K vs. 165K). It's worth fixing, but not a huge difference.
SELECT * makes the Table / Index Scan Monster come
Often, the bigger problem with SELECT * is the effect it will have on the execution plan. While SQL Server primarily uses indexes to look up your data, if the index contains all the columns you’re requesting it doesn’t even need to look in the table. That concept is known as index coverage. In the above example, the first query results in a Clustered Index Scan, whereas the second query uses a much more efficient Index Seek. In this case, the Index seek is one hundred times more efficient than the Clustered Index Scan.

Unless you've indexed every single column in a table (which is almost never a good idea), a SELECT * query can't take advantage of index coverage, and you're likely to get (extremely inefficient) scan operations.
If you just query the rows you'll actually be using, it's more likely they'll be covered by indexes. And I think that's the biggest performance advantage of ignoring SELECT * queries.
The Stability Aspect
SELECT * queries are also bad from an application maintenance point of view as well, since it introduces another outside variable to your code. If a column is added to a table, the results returned to your application will change in structure. Well programmed applications should be referring to columns by name and shouldn't be affected, but well programmed applications should also minimize the ways in which they are vulnerable to external changes.
Shameless Plug: I go into this (and a lot other important performance tips) in more detail in a soon-to-be-released book for SitePoint.
Published Wednesday, July 18, 2007 11:56 PM by Jon Galloway
Filed under: SQL, SQL Server
Comments
# re: The real reason SELECT * queries are bad: index coverage
"If a row is added to a table..."
Should be: "If a column is added to a table..."
Thursday, July 19, 2007 4:55 AM by Goran
# re: The real reason SELECT * queries are bad: index coverage
Great post Jon...
The point about stability and ordinal position is a very real one, and I strongly agree with the practice of referencing columns by name*. You'll find this out the hard way if you use most database sync applications to migrate changes from one environment to another.
Even if you don't rely on ordinal position, it's a good idea to have your change scripts drop and recreate a table when columns are added to it, if only for schema consistency reasons.
Can't wait for the book!
* ...he says hypocritically, knowing full well that SubSonic relies on the assumption that the name field is in the second ordinal position
Thursday, July 19, 2007 7:14 AM by Eric Kemp
# The real reason SELECT * queries are bad: index coverage
You've been kicked (a good thing) - Trackback from DotNetKicks.com
Thursday, July 19, 2007 9:53 AM by DotNetKicks.com
# re: The real reason SELECT * queries are bad: index coverage
Nice to read the reason why this is pushed so heavily by DBAs.
I will disagree with your statement that SELECT * makes maintenance of an application more difficult. I use a hybrid OR/M (homegrown) that maps many related tables together and returns all columns so that the developer can decide what to do with the data. If someone is to add a column now, all I have to do is update the code to fill the extra column.
If I were selecting specific columns, i would still have to update the code, but additional any views and procs that return data to the app.
Thursday, July 19, 2007 11:08 AM by Jerry
# re: The real reason SELECT * queries are bad: index coverage
Another reason to avoid SELECT * is with views that access other views. You'll have to recompile each view in the correct dependency order or you'll get errors. You also run the risk of introducing problems where the new column matches the name of another column in the view and you can get ambiguous results.
Thursday, July 19, 2007 11:38 AM by Chris Miller
# SELECT * 的真相: 索引覆盖(index coverage)
SELECT *的效率很糟糕吗？当然，所有人都知道这一点，但是为什么呢？
Thursday, July 19, 2007 12:25 PM by Goodspeed
# re: The real reason SELECT * queries are bad: index coverage
good one. A good reason to not use Select *
Thursday, July 19, 2007 1:05 PM by Vikram
# 为什么使用Select * 查询不好
今天在asp.net博客上看到一篇好文，探究使用select *在效率和可维护性上的问题，地址如下：weblogs.asp.net/.../the-real-reason-select-queries-are-bad-index-coverage.aspx 大致意思如下：对于大多数人来说，使用select *的带来的问题直观上是返回了太多的数据，但是经过试验证实这只是小问题。使用select *真正的麻烦来自于索引。由于数据库实际是用索引来查询数据
Thursday, July 19, 2007 1:17 PM by ikeepSmile
# re: The real reason SELECT * queries are bad: index coverage
Good one. Definitely will keep this in mind.
Question, does it also apply with
select foo
from bar
where exists
( select * from tarfu where tarfu.idee = bar.idee)
or does the query optimizer recognize that it doesn't require any complex index scanning?
Thursday, July 19, 2007 2:29 PM by mcgurk
# re: The real reason SELECT * queries are bad: index coverage
@mcgurk:
SQL Server's Query Optimizer knows the difference in EXISTS -case as it tests only a boolean value, anyway.
So it doesn't matter within EXISTS whether you use SELECT * or something else.
Thursday, July 19, 2007 3:21 PM by Jemm
# re: The real reason SELECT * queries are bad: index coverage
I'm curious what other variations you tried. Did you reverse the queries in your batch? Did you add more than one column to your other select? What happens when you explicitly name all of the columns in the table? The reason I ask is that databases are actually really tough to benchmark and profile. It can be nearly impossible when multiple clients are connected as well. Your first query undoubtedly placed all of the pages in RAM that the second query needed so the fact that it ran so much faster doesn't mean much.
Thursday, July 19, 2007 7:34 PM by Sam Corder
# re: The real reason SELECT * queries are bad: index coverage
@Sam Corder
Great points.
1) Yes, Index Coverage will only prevent a scan if all the columns are covered. Adding all columns to a select query wouldn't help. However, if SELECT * you're pretty much guaranteed you won't take advantage of index coverage; if you SELECT only the columns you need you (or your DBA) has the opportunity to add selective indexes to frequently used queries.
You also have the opportunity of adding frequently used columns to indecies via SQL Server 2005's "index with included columns" feature: msdn2.microsoft.com/.../ms190806.aspx
2) I did test the other order, and I cleared cache between all runs. Try this query, you should see that the SELECT * queries show 50% load and the SELECT column query takes 0% load, regardless of order:
USE AdventureWorks
GO
DBCC FREESESSIONCACHE
DBCC FREEPROCCACHE
DBCC FREESYSTEMCACHE('ALL')
CHECKPOINT
DBCC DROPCLEANBUFFERS
GO
SELECT * FROM Production.TransactionHistoryArchive
WHERE ReferenceOrderID < 100
GO
DBCC FREESESSIONCACHE
DBCC FREEPROCCACHE
DBCC FREESYSTEMCACHE('ALL')
CHECKPOINT
DBCC DROPCLEANBUFFERS
GO
SELECT ReferenceOrderLineID FROM Production.TransactionHistoryArchive
WHERE ReferenceOrderID < 100
GO
DBCC FREESESSIONCACHE
DBCC FREEPROCCACHE
DBCC FREESYSTEMCACHE('ALL')
CHECKPOINT
DBCC DROPCLEANBUFFERS
GO
SELECT * FROM Production.TransactionHistoryArchive
WHERE ReferenceOrderID < 100
GO
DBCC FREESESSIONCACHE
DBCC FREEPROCCACHE
DBCC FREESYSTEMCACHE('ALL')
CHECKPOINT
DBCC DROPCLEANBUFFERS
GO
SELECT ReferenceOrderLineID FROM Production.TransactionHistoryArchive
WHERE ReferenceOrderID < 100
GO
Thursday, July 19, 2007 8:17 PM by Jon Galloway
# SELECT * 的真相: 索引覆盖(index coverage) 。
SELECT * 的真相: 索引覆盖(index coverage) 。
Friday, July 20, 2007 12:59 AM by 勤勤同学
# re: The real reason SELECT * queries are bad: index coverage
Great article Jon,
Does this still apply even when you want to get all columns from the row? Should you still write out each column name in the select query instead of *?
thanks
Friday, July 20, 2007 8:28 AM by Justin
# re: The real reason SELECT * queries are bad: index coverage
Indexes, and it is also very simple - you quiery ONLY what you need.
There's a reason, if you check top 5 internet dating sites, most of them are running on tens of servers (and one on at least a 100), and one - on 3. If you don't get big, it is easy to cover up sloppiness by throwing hardware at it, but that will also cost ya.
Reason #2 is also obvious - if you need just one value, get a value, not an entire row:
Select Value1 from Table1 Where Key=123
is way better than
Select * from Table1 Where Key=123
, and use appropriate tools to get it to an application (NOT a Recordset). Writing "Select * " justifies sloppy code in this case as well.
Friday, July 20, 2007 5:15 PM by SmiLie
# F??bio Pedrosa » Why SELECT * Queries are bad
Pingback from F??bio Pedrosa » Why SELECT * Queries are bad
Friday, July 20, 2007 6:51 PM by F??bio Pedrosa » Why SELECT * Queries are bad
# re: The real reason SELECT * queries are bad: index coverage
its really fantastic..thank you verymuch.
Thursday, July 26, 2007 10:42 AM by kusuma
# re: The real reason SELECT * queries are bad: index coverage
Does this is also true when relating to count(*)?
I mean, if I do something like:
SELECT count(*) FROM Table

jinal's blog

About Me

Categories

0 comments: