Summary: I have a table populated via the following:
insert into the_table (...) select ... from some_other_table
Running the above query with no primary key on the_table is ~15x faster than running it with a primary key, and I don’t understand why.
The details: I think this is best explained through code examples.
I have a table:
create table the_table (
a int not null,
b smallint not null,
c tinyint not null
);
If I add a primary key, this insert query is terribly slow:
alter table the_table
add constraint PK_the_table primary key(a, b);
-- Inserting ~880,000 rows
insert into the_table (a,b,c)
select a,b,c from some_view;
Without the primary key, the same insert query is about 15x faster. However, after populating the_table without a primary key, I can add the primary key constraint and that only takes a few seconds. This one really makes no sense to me.
More info:
- The estimated execution plan shows 0% total query time spent on the clustered index insert
- SQL Server 2008 R2 Developer edition, 10.50.1600
Any ideas?
This is a good question, but a pretty crappy question too. Before you ask why an index slows down inserts, do you know what an index is?
If not, I suggest you read up on it. A clustered index is a B-tree (balanced tree), so every insert has to... wait for it... find its place in the tree and keep the tree balanced, splitting a page whenever the target page is full. Hence inserts into a clustered index are slower than inserts into a heap. If you don’t know what a heap is, then I suggest you stop using SQL Server until you understand the basics. Otherwise you are using a product without any idea of what you are doing, which is basically driving a truck on the highway, blindfolded, thinking you are riding a bike. Unexpected results...
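If you want to see the effect rather than take it on faith, one option is to compare the leaf level of the_table after loading it each way. This is a sketch, assuming the table is dbo.the_table in the current database. 'DETAILED' mode is used because 'LIMITED' returns NULL for avg_page_space_used_in_percent; note that it reads every page, so it can be slow on big tables.

```sql
-- Sketch: inspect the leaf level of dbo.the_table (assumed schema/name).
-- Run it once after the heap load and once after the clustered load.
SELECT
    i.name                            AS index_name,   -- NULL for a heap
    ps.index_type_desc,                                -- HEAP or CLUSTERED INDEX
    ps.page_count,                                     -- leaf pages used
    ps.avg_page_space_used_in_percent,                 -- page density
    ps.avg_fragmentation_in_percent                    -- out-of-order pages
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.the_table'), NULL, NULL, 'DETAILED') AS ps
LEFT JOIN sys.indexes AS i
       ON i.object_id = ps.object_id
      AND i.index_id  = ps.index_id
WHERE ps.index_level = 0;  -- 0 = leaf level (the data pages themselves)
```

A load that caused many page splits tends to show more pages, lower page density and higher fragmentation than a heap load or an index built afterwards.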
So when you create a clustered index after a table is populated, your ‘heap’ already holds all the rows, and SQL Server can basically optimise a few things: for example, it can sort the rows once and write the index pages in key order, instead of placing (and possibly splitting pages for) one row at a time. This process is much more complicated than this, and in some cases you will find that creating a clustered index after the fact is a lot slower than simply inserting into it. This all has to do with key types, number of columns, types of columns, etc. This is unfortunately not a topic that fits in an answer; it is more a whole course and a few books by itself. Looking at your table above, it is a VERY simple table with ~7 bytes of column data per row (int + smallint + tinyint = 4 + 2 + 1 bytes). In this instance creating the index after the insert will be faster, but chuck in a few varchar(250)s etc., and the ballgame changes.
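For a table this simple, the load-then-index pattern can be sketched as below. It assumes the_table starts empty and without PK_the_table, that some_view is the source from the question, and that (a, b) is unique in some_view; if it is not, the ALTER TABLE fails with a duplicate key error.

```sql
-- Sketch of "load into the heap first, add the clustered key afterwards".
-- Assumes the_table is empty and has no primary key yet.

-- 1. Load into the heap: no B-tree to maintain, rows are simply appended.
INSERT INTO the_table (a, b, c)
SELECT a, b, c
FROM some_view;

-- 2. Build the clustered primary key once, over the populated table.
--    The index build sorts the rows and writes the pages in key order.
ALTER TABLE the_table
    ADD CONSTRAINT PK_the_table PRIMARY KEY CLUSTERED (a, b);
```

CLUSTERED is spelled out here only for clarity; the question's primary key(a, b) is clustered by default when the table has no clustered index yet.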
If you didn’t know: a clustered index (if your table has one) IS your table.
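To check which of the two the_table currently is, a minimal sketch (again assuming dbo.the_table) is to look at its row in sys.indexes: index_id 0 means the data lives in a heap, index_id 1 means the data lives in the leaf level of the clustered index.

```sql
-- Sketch: every table has exactly one of these rows; it is the table's data.
SELECT
    OBJECT_NAME(i.object_id) AS table_name,
    i.index_id,                           -- 0 = heap, 1 = clustered index
    i.type_desc,                          -- HEAP or CLUSTERED
    i.name                                -- NULL for a heap
FROM sys.indexes AS i
WHERE i.object_id = OBJECT_ID(N'dbo.the_table')
  AND i.index_id IN (0, 1);
```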
Hope this helps.