I’m in the process of learning databases and SQL. From what I have read, adding an index to a table can improve lookup performance to around O(log n), or even constant time.
Considering the increased space usage, at what point does it make sense to add an index to a table?
For example, if I were using an employees table, how many records would the table need to have before you would add an index?
In this specific case, would a clustered index make sense?
Here are two examples that might help you think about this. These are not to be relied on as technically accurate (e.g. because of the effect of consecutive reads on the disk being more efficient than random seeks) but they are an illustration.
The first example is to think of a small table that is a couple of blocks in size. To find a particular row in the table, the database would read those two blocks and get the data you require.
If there were an index on that table, the index is likely to be smaller than the table. Maybe one block in size. If the optimiser chose to use this index then the database would read the one block index and then read the one block of the table containing the row you require.
As mentioned above, this is only an example and is meant to model reality rather than be accurate. In reality, Oracle will often do a full table scan of a table with an index even if the index would return as little as 5% of the rows (or is it less now with 11g?).
The second example involves making data modifications on the table. Whenever a row in a table is changed (INSERT, UPDATE, DELETE, MERGE), the indexes on that table have to be maintained as well (for an UPDATE, that means the indexes covering the changed columns). So indexes may make queries faster and updates slower. And indexes take up space. That’s the price you pay.
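To make the space cost concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in. It is not Oracle, so the absolute numbers will not carry over. The employees table, its columns and the index name are made up for illustration. It simply counts database pages before and after one index is created.

```python
import sqlite3


def page_counts(n_rows=10_000):
    """Return (pages_before, pages_after) around creating one index."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table, not taken from the question.
    conn.execute(
        "CREATE TABLE employees ("
        "id INTEGER PRIMARY KEY, last_name TEXT, salary INTEGER)"
    )
    conn.executemany(
        "INSERT INTO employees (last_name, salary) VALUES (?, ?)",
        ((f"name{i:06d}", 30_000 + i % 500) for i in range(n_rows)),
    )
    conn.commit()
    before = conn.execute("PRAGMA page_count").fetchone()[0]

    # The index stores its own copy of last_name plus a row reference,
    # so it needs extra pages on top of the table itself.
    conn.execute("CREATE INDEX idx_employees_last_name ON employees (last_name)")
    conn.commit()
    after = conn.execute("PRAGMA page_count").fetchone()[0]
    conn.close()
    return before, after


before, after = page_counts()
print(f"pages without index: {before}, with index: {after}")
```

The same extra structure is what has to be kept in step on every INSERT and DELETE, which is where the write overhead comes from.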
You ask how many records the table would need to have before you would add an index. I think you are looking at it the wrong way, because that is not something you should have to worry about. Add the index when the table has zero rows. The optimiser will work out the right thing to do. If it is quicker to use the index, it will use it. If it is quicker to avoid the index and do a full scan of the table, then it will do that.
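As a rough illustration of letting the planner decide, the sketch below creates the index while the table is still empty and then asks the database how it would run two queries. It uses SQLite through Python's sqlite3 module, which is an assumption on my part. Oracle's cost-based optimiser is far more sophisticated, and the table and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    "id INTEGER PRIMARY KEY, last_name TEXT, salary INTEGER)"
)
# Index added while the table has zero rows.
conn.execute("CREATE INDEX idx_employees_last_name ON employees (last_name)")
conn.executemany(
    "INSERT INTO employees (last_name, salary) VALUES (?, ?)",
    ((f"name{i:06d}", 30_000 + i % 500) for i in range(5_000)),
)
conn.commit()


def plan(sql, params=()):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row is the human-readable detail text.
    return " | ".join(row[3] for row in rows)


by_name = plan("SELECT * FROM employees WHERE last_name = ?", ("name000042",))
by_salary = plan("SELECT * FROM employees WHERE salary > ?", (30_100,))
print("by last_name:", by_name)
print("by salary:   ", by_salary)
```

The lookup on last_name should report the index, while the salary filter falls back to a scan because nothing indexes that column. This only shows that the access path is chosen per query without you doing anything. It does not reproduce Oracle's habit of ignoring a usable index when a full scan is estimated to be cheaper.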
I would generally index columns that are used for the primary key and foreign keys plus any columns that are used frequently for access.
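A sketch of what that might look like for a hypothetical employees/departments schema, again run through SQLite from Python so that it is self-contained. Note that Oracle creates an index for a primary key constraint automatically but does not do so for foreign keys, so the foreign key index is the one you have to remember. In SQLite, an INTEGER PRIMARY KEY is the table's own row key and needs no separate index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE departments (
        dept_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    );

    CREATE TABLE employees (
        emp_id    INTEGER PRIMARY KEY,
        dept_id   INTEGER NOT NULL REFERENCES departments (dept_id),
        last_name TEXT NOT NULL
    );

    -- Foreign key column: helps joins and parent-row deletes.
    CREATE INDEX idx_employees_dept_id ON employees (dept_id);

    -- Column assumed to be used frequently for lookups.
    CREATE INDEX idx_employees_last_name ON employees (last_name);
    """
)

# List the indexes that now exist on employees (column 1 is the name).
index_names = sorted(
    row[1] for row in conn.execute("PRAGMA index_list('employees')")
)
print(index_names)
```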
In general, I wouldn’t worry too much about the space used by an index unless the tables are very large (in which case it might be worth looking at bitmap indexes). It is a trade-off of space vs time, but an index is usually going to be smaller than the table that is being indexed.
Another option, if you are worried about space, is to compress the index. This shouldn’t have much impact on performance but will take less space. Note that this is different from table compression.
This is a long way of giving the Tom Kyte answer of “It depends”. The best thing you can probably do is benchmark your particular problem and go from there. You seem to be trying to do premature optimisation, which is never a good thing.
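If you want a starting point for that kind of benchmark, something along these lines would do. It is only a sketch: it uses SQLite via Python rather than your real database, the table, row count and query are invented, and timings will vary from machine to machine, so treat the numbers as indicative at best.

```python
import sqlite3
import time


def build(n_rows, with_index):
    """Create a throwaway in-memory table, optionally with an index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employees (id INTEGER PRIMARY KEY, last_name TEXT)"
    )
    if with_index:
        conn.execute(
            "CREATE INDEX idx_employees_last_name ON employees (last_name)"
        )
    conn.executemany(
        "INSERT INTO employees (last_name) VALUES (?)",
        ((f"name{i:06d}",) for i in range(n_rows)),
    )
    conn.commit()
    return conn


def time_lookups(conn, n_lookups):
    """Run n_lookups single-row queries; return (seconds, rows_found)."""
    found = 0
    start = time.perf_counter()
    for i in range(n_lookups):
        rows = conn.execute(
            "SELECT id FROM employees WHERE last_name = ?", (f"name{i:06d}",)
        ).fetchall()
        found += len(rows)
    return time.perf_counter() - start, found


results = {}
for with_index in (False, True):
    conn = build(5_000, with_index)
    results[with_index] = time_lookups(conn, 200)
    conn.close()
    elapsed, found = results[with_index]
    print(f"index={with_index}: {found} rows found in {elapsed:.4f}s")
```

Swap in your own table sizes and your real queries, and run it against the database you actually use before drawing any conclusions.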