I have a SQL Server table with the usual layout:
intID (primary key), field1, field2, manyotherfields..., datetime TimeOperation
99% of my queries, whatever their kind, start with TimeOperation BETWEEN startTime AND endTime, then select * (or count(*)) where fieldA=xxx, and join with other smaller tables.
I use select * because I need more or less all the fields.
I obviously created an index on TimeOperation, but performance is not good enough, so I want to add some index key columns or included columns, and I’m a little confused.
I get the difference between the two, but I don’t get how much adding a column in each case affects speed and size.
I guess that the biggest improvement would be to create an index including ALL the columns, is that right? (But I can’t afford it in terms of space.)
And if I often use field1=xxx, for example, adding field1 to the index key columns (after TimeOperation) would give better performance, right?
Also, just to be sure how an index with included columns works: if I select rows with TimeOperation in a certain range, SQL Server seeks my TimeOperation index for the rows I’m interested in, and this is faster than scanning the whole table because in the index the TimeOperation values are in ascending order, is that right? But then I need all the rest of the data fields of those rows. How does SQL Server retrieve that data? I guess the index holds a sort of bookmark to those rows, right? But then it has to hit the table multiple times, so including all the columns in the index would save the time spent hitting the table, is that correct?
Thanks!
Mattia
We will need more information on your table and examples of your queries to address this fully, but:
- An index with TimeOperation as the first column should address the bulk of queries against TimeOperation.
- If most of your queries filter on TimeOperation, you might consider building your clustered index around it.
- If you have queries that filter only on field1 = x, then you need a separate index just for field1 (assuming that it is suitably selective), i.e. no TimeOperation in the index if it’s not in the WHERE clause of your query.
- If an index covers all the columns in the select statement, the lookup can be avoided. But since you are using SELECT *, covering indexes are unlikely to help.
Edit
Explanation: Selectivity and density are explained in detail here. For example, the index on TimeOperation will be used only if your queries against TimeOperation return a small number of rows (the rule of thumb is < 5%, but this doesn’t always hold), i.e. only if your query is selective enough for SQL to choose the index on TimeOperation.
The basic starting point would be a table clustered on its primary key, IntId, and the basic indexes would be a nonclustered index on TimeOperation and a separate nonclustered index on Field1.
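A minimal sketch of that starting point follows. The table name MyTable, the column types, and the index and constraint names are assumptions for illustration; only the column names come from the question.

```sql
-- Hypothetical table: the name MyTable and all column types are assumptions.
CREATE TABLE dbo.MyTable
(
    IntId         INT IDENTITY(1,1) NOT NULL,
    Field1        VARCHAR(50)       NULL,
    Field2        VARCHAR(50)       NULL,
    -- ... many other fields ...
    TimeOperation DATETIME          NOT NULL,
    CONSTRAINT PK_MyTable PRIMARY KEY CLUSTERED (IntId)
);

-- Supports range filters such as TimeOperation BETWEEN @start AND @end.
CREATE NONCLUSTERED INDEX IX_MyTable_TimeOperation
    ON dbo.MyTable (TimeOperation);

-- Supports queries that filter on Field1 alone.
CREATE NONCLUSTERED INDEX IX_MyTable_Field1
    ON dbo.MyTable (Field1);
```

Whether the optimizer actually picks either index still depends on how selective the filter is for your data.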
Clustering Consideration / Option
If most of your records are inserted in ‘serial’ ascending TimeOperation order, i.e. IntId and TimeOperation both increase in tandem, then I would leave the clustering on IntId (i.e. the table DDL is PRIMARY KEY CLUSTERED (IntId), which is the default anyway).
However, if there is NO correlation between IntId and TimeOperation, and IF most of your queries are of the form SELECT * FROM [MyTable] WHERE TimeOperation BETWEEN xx AND yy, then CREATE CLUSTERED INDEX CL_MyTable ON MyTable(TimeOperation) (and changing the PK to PRIMARY KEY NONCLUSTERED (IntId)) should improve this query. (Rationale: since contiguous times are kept together, fewer pages need to be read, and the bookmark lookup will be avoided.) Even better, if values of TimeOperation are guaranteed to be unique, then CREATE UNIQUE CLUSTERED INDEX CL_MyTable ON MyTable(TimeOperation) will improve density, as it will avoid the uniqueifier.
Note: for the rest of this answer, I’m assuming that your IntId and TimeOperation ARE strongly correlated and hence the clustering is by IntId.
Covering Indexes
As others have mentioned, your use of SELECT * is bad practice and, inter alia, means covering indexes won’t be of any use (the exception being COUNT(*)).
If your queries weren’t SELECT *, but instead selected only, say, TimeOperation and Field1, then you could either alter your index on TimeOperation to include Field1, OR add both columns to the index key (with the most common filter first, or the most selective first if both filters are always present).
Either will avoid the RID / key lookup. The second option (both columns in the key) will also address queries where BOTH TimeOperation and Field1 are filtered in a WHERE or HAVING clause.
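As a rough sketch of the two options, assuming a table named dbo.MyTable and a query that needs only TimeOperation and Field1 (the index names, variable names, and date values are hypothetical):

```sql
-- Hypothetical query that names only the columns it needs.
DECLARE @start DATETIME = '2012-01-01',
        @end   DATETIME = '2012-01-02';

SELECT TimeOperation, Field1
FROM dbo.MyTable
WHERE TimeOperation BETWEEN @start AND @end;

-- Option 1: Field1 as an included column.
-- It is stored only at the leaf level of the index and is not part of the key.
CREATE NONCLUSTERED INDEX IX_MyTable_TimeOperation_Inc
    ON dbo.MyTable (TimeOperation) INCLUDE (Field1);

-- Option 2: Field1 as a second key column.
-- It is stored at every level of the index and takes part in the ordering.
CREATE NONCLUSTERED INDEX IX_MyTable_TimeOperation_Field1
    ON dbo.MyTable (TimeOperation, Field1);
```

In practice you would create only one of the two; every extra column in either position makes the index larger and adds work on each insert and update.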
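Re: What’s the difference between an index on (TimeOperation, Field1) and separate indexes?
e.g. an index on (TimeOperation, Field1) will not be useful for a query that filters only on Field1, because Field1 is not the leading key column.
The index will only be useful for queries that filter on TimeOperation, either on TimeOperation alone OR on both TimeOperation and Field1.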
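To make the comparison concrete, here is a sketch of a composite index and the three query shapes discussed. The table name dbo.MyTable, the index name, the variables, and the literal 'xyz' are assumptions for illustration.

```sql
-- Hypothetical composite index with TimeOperation as the leading key column.
CREATE NONCLUSTERED INDEX IX_MyTable_Composite
    ON dbo.MyTable (TimeOperation, Field1);

DECLARE @start DATETIME = '2012-01-01',
        @end   DATETIME = '2012-01-02';

-- Cannot seek on this index: there is no predicate on the leading column.
-- At best the optimizer scans the index or the table.
SELECT TimeOperation, Field1
FROM dbo.MyTable
WHERE Field1 = 'xyz';

-- Can seek: the range predicate is on the leading column.
SELECT TimeOperation, Field1
FROM dbo.MyTable
WHERE TimeOperation BETWEEN @start AND @end;

-- Can seek on the TimeOperation range; Field1 is then checked
-- inside the index, without going back to the table.
SELECT TimeOperation, Field1
FROM dbo.MyTable
WHERE TimeOperation BETWEEN @start AND @end
  AND Field1 = 'xyz';
```

Comparing the actual execution plans of these queries is the most reliable way to confirm which ones seek on your data.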
Hope this helps?