Indexing is used to improve the performance of SQL queries, but I have always found it a little difficult to decide when I should use an index and when I should not. I want to clarify some of my doubts regarding non-clustered indexes:
1) What is the non-clustered index key? The book says each index row of a non-clustered index contains the non-clustered key value. Does that mean it is the column on which the index was created, i.e. if I create an index on empname varchar(50), the non-clustered key is empname?
2) Why is it preferable to create an index on a narrow column? Is it because comparing wider values takes the SQL Server engine more time, or because the page size is fixed, so a wider column means fewer index rows per page (node) and therefore more levels of intermediate nodes?
3) If a table contains multiple non-clustered index columns, is the non-clustered key the combination of all of these columns, or is some unique ID generated internally by SQL Server, with a locator that points to the actual data row? If possible, please explain with a real-world example and diagrams.
4) Why is it said that a column with non-repeating values is a good candidate for an index? Even if the column contains repeated values, surely the index still improves performance, since once the search reaches a given key value its locator immediately finds the actual row.
5) If the indexed column is not unique, how does it find the actual data row in the table?
Please recommend any book or tutorial that would help clear up my doubts.
First, I think we need to cover what an index actually is. In an RDBMS, indexes are usually implemented using a variant of the B-tree (the B+ tree is the most common). Put briefly: think of a binary search tree generalized so that each node holds many keys, which suits storage on disk. The result of looking up a key in the B-tree is usually the primary key of the table (in SQL Server, the row locator is the clustered index key, or a row identifier when the table has no clustered index). That means that if a lookup in the index completes and we need more data than is present in the index, we can do a seek in the table using the primary key.
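As a rough illustration of the lookup described above, here is a minimal sketch that models a non-clustered index as a sorted list of (key, primary key) entries. This is not how SQL Server stores anything internally; the table contents, the `lookup` helper and the use of a sorted list in place of a real B-tree are all assumptions made for the example. The column name empname is borrowed from the question.

```python
import bisect

# Stand-in for the table itself: rows addressed by primary key.
table = {
    1: {"empname": "Alice", "dept": "HR"},
    2: {"empname": "Bob", "dept": "IT"},
    3: {"empname": "Alice", "dept": "Ops"},
    4: {"empname": "Carol", "dept": "IT"},
}

# Stand-in for a non-clustered index on empname: sorted (key, primary key) entries.
# Duplicate keys are fine because every entry carries its own row locator.
index = sorted((row["empname"], pk) for pk, row in table.items())


def lookup(name):
    """Find all rows whose empname equals name via the index, then the table."""
    # Binary search for the first index entry with this key.
    start = bisect.bisect_left(index, (name,))
    rows = []
    for key, pk in index[start:]:
        if key != name:
            break  # entries are sorted, so no later entry can match
        rows.append(table[pk])  # second step: seek in the table by primary key
    return rows
```

Here `lookup("Alice")` walks two adjacent index entries and fetches two table rows, while `lookup("Bob")` fetches one. Each matching entry costs one extra seek into the table, which is the point made under 4) and 5).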
Please remember that when we think about performance for an RDBMS, we usually measure it in disk accesses (I am ignoring locking and other issues here) rather than CPU time.
A non-clustered index means that the way the data in the table is physically stored has no relation to the index key, whereas a clustered index specifies that the data in the table is sorted (clustered) by the index key. This is why there can be only one clustered index per table.
2) Back to our model of measuring performance: if the index key has a small width (fits into a small number of bytes), we can fit more keys into each block of disk data we retrieve, and so lookups in the B-tree are much faster when measured in disk I/O.
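To put rough numbers on the width argument, here is a back-of-the-envelope sketch. SQL Server pages are 8 KB, but the 12 bytes of per-entry overhead and the row count of ten million are assumptions chosen only for illustration, and the real usable space per page is somewhat smaller.

```python
PAGE_BYTES = 8192    # SQL Server page size (usable space is slightly less)
ENTRY_OVERHEAD = 12  # assumed bytes per index entry for locator and bookkeeping


def entries_per_page(key_bytes):
    """How many index entries of the given key width fit in one page."""
    return PAGE_BYTES // (key_bytes + ENTRY_OVERHEAD)


def tree_levels(row_count, key_bytes):
    """Number of B-tree levels needed to reach any of row_count keys."""
    fanout = entries_per_page(key_bytes)
    levels = 1
    capacity = fanout
    while capacity < row_count:
        capacity *= fanout  # each extra level multiplies reach by the fan-out
        levels += 1
    return levels


narrow = tree_levels(10_000_000, 4)   # e.g. a 4-byte integer key
wide = tree_levels(10_000_000, 50)    # e.g. a fully used varchar(50) key
```

Under these assumptions a 4-byte key fits 512 entries per page and needs 3 levels for ten million rows, while a 50-byte key fits 132 entries per page and needs 4 levels. Each level is roughly one more page read per lookup, and the wider index also occupies several times as many pages overall.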
3) I tried to explain this further up. Unfortunately I don’t have any graphs or drawings to illustrate it; hopefully someone else can come along and share some.
4) Suppose you run a query that selects something and something_else from sometable where akey = ‘some value’, on a table that has a non-clustered index on akey. If sometable has a lot of rows where akey equals ‘some value’, that means a lot of lookups, not only in the index but also in the actual table, to retrieve the values of something and something_else. If, on the other hand, the filter is likely to return only a few rows, far fewer disk accesses are needed.
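The situation in point 4 can be tried out with a small sketch. It uses Python's built-in sqlite3 module rather than SQL Server, purely so that it runs anywhere, so the planner behaviour is SQLite's and not SQL Server's. The table and column names come from the text above; the index name `idx_sometable_akey`, the extra `id` column and the generated data are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sometable ("
    " id INTEGER PRIMARY KEY,"
    " akey TEXT,"
    " something TEXT,"
    " something_else TEXT)"
)
# Secondary (non-clustered style) index covering only akey.
conn.execute("CREATE INDEX idx_sometable_akey ON sometable (akey)")

# 1000 rows: every even row shares one key value, every odd row has a unique one.
rows = [
    ("some value" if i % 2 == 0 else f"unique-{i}", f"s{i}", f"e{i}")
    for i in range(1000)
]
conn.executemany(
    "INSERT INTO sometable (akey, something, something_else) VALUES (?, ?, ?)",
    rows,
)

query = "SELECT something, something_else FROM sometable WHERE akey = ?"

# Ask SQLite how it would run the query; the last column holds the description.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, ("some value",)).fetchall()
plan_text = " ".join(str(step[-1]) for step in plan)

# Each hit is one index entry plus one fetch from the table for the other columns.
common_hits = len(conn.execute(query, ("some value",)).fetchall())
rare_hits = len(conn.execute(query, ("unique-1",)).fetchall())
```

The plan text names the index in both cases, but the repeated value yields 500 rows, each needing its own fetch from the table, while the unique value yields a single row. That difference in table fetches is why selective columns make better index candidates.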
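5) See the earlier explanation: each index entry stores the primary key of its row, which is then used to locate the actual data row.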
Hope this helps 🙂