I have a table with a large string key (varchar(1024)) that I was thinking of indexing on SQL Server (I want to be able to search on it quickly, but inserts are also important). In SQL Server 2008 I don’t get a warning for this, but SQL Server 2005 tells me that the key exceeds 900 bytes and that inserts/updates with the column over this size will be dropped (or something in that area).
What are my alternatives if I want to index this large column? I don’t know if it would be worth it even if I could.
An index with all the keys near 900 bytes would be very large and very deep (very few keys per page result in very tall B-Trees).
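To put rough numbers on that, here is a back-of-the-envelope sketch. The row count and the 16-byte narrow entry are assumptions picked for illustration, and the estimate ignores per-row overhead, fill factor and the difference between leaf and non-leaf pages:

```sql
-- Rough estimate only, not a measurement of any real index.
DECLARE @rows float, @usable_page_bytes float;
SET @rows = 1000000;            -- assumed number of rows
SET @usable_page_bytes = 8096;  -- 8 KB page minus the 96-byte page header

SELECT
    -- how many 900-byte keys fit on one page
    FLOOR(@usable_page_bytes / 900) AS keys_per_page_900_byte_key,
    -- approximate B-Tree depth = log(rows) in base (keys per page)
    CEILING(LOG(@rows) / LOG(FLOOR(@usable_page_bytes / 900))) AS approx_levels_900_byte_key,
    -- same estimate for an assumed 16-byte index entry, for comparison
    CEILING(LOG(@rows) / LOG(FLOOR(@usable_page_bytes / 16))) AS approx_levels_16_byte_entry;
```

Under those assumptions only about 8 keys fit on a page, and the estimate comes out at roughly 7 levels for the wide key versus roughly 3 for the narrow one.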
It depends on how you plan to query the values. An index is useful in several cases:
- point probes: WHERE column = 'ABC', or a join condition ON a.column = B.someothercolumn.
- range scans: WHERE column BETWEEN 'ABC' AND 'DEF'. There are other, less obvious examples, like a partial match: WHERE column LIKE 'ABC%'.
- ordering requirements: an ORDER BY column requirement, where the index avoids a stop-and-go sort. An index can also help certain hidden sort requirements, like a ROW_NUMBER() OVER (ORDER BY column).

So, what do you need the index for? What kind of queries would use it?
For range scans and for ordering requirements there is no other solution but to have the index, and you will have to weigh the cost of the index vs. the benefits.
For probes you can, potentially, use a hash to avoid indexing a very large column. Create a persisted computed column as column_checksum = CHECKSUM(column) and then index that column. Queries have to be rewritten to use WHERE column_checksum = CHECKSUM('ABC') AND column = 'ABC'. Careful consideration would have to be given to weighing the advantage of a narrow index (32-bit checksum) against the disadvantages of the collision double-check and the lack of range scan and ordering capabilities.
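A minimal sketch of the checksum approach. The table, column and index names (dbo.big_keys, key_value, key_checksum, ix_big_keys_checksum) are hypothetical and only stand in for your own schema:

```sql
-- Hypothetical table standing in for the one with the large key.
CREATE TABLE dbo.big_keys (
    id int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    key_value varchar(1024) NOT NULL
);
GO

-- Persisted computed column holding the 32-bit checksum of the large value.
ALTER TABLE dbo.big_keys
    ADD key_checksum AS CHECKSUM(key_value) PERSISTED;
GO

-- Narrow index on the checksum instead of on the 1024-byte column.
CREATE INDEX ix_big_keys_checksum ON dbo.big_keys (key_checksum);
GO

-- Point probe: seek on the checksum, then re-check the real value
-- because different strings can produce the same checksum.
SELECT id, key_value
FROM dbo.big_keys
WHERE key_checksum = CHECKSUM('ABC')
  AND key_value = 'ABC';
```

The second predicate is not optional: without it, a checksum collision would return the wrong row.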
I once had a similar problem and I used a hash column. The value was too large to index (>1K) and I also needed to convert the value into an ID to store (basically, a dictionary). Something along these lines:
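A sketch of such a dictionary, under stated assumptions: the table, index and procedure names (values_dictionary, cdx_values_dictionary_hash, usp_get_or_create_value_id) and the varchar(4000) size are hypothetical, chosen only for illustration:

```sql
-- Dictionary table: id is the surrogate key, values_hash is the checksum.
CREATE TABLE values_dictionary (
    id int NOT NULL IDENTITY(1,1),
    value varchar(4000) NOT NULL,
    values_hash AS CHECKSUM(value) PERSISTED,
    CONSTRAINT pk_values_dictionary PRIMARY KEY NONCLUSTERED (id)
);
GO

-- Cluster on the hash; id makes the clustered key unique.
CREATE UNIQUE CLUSTERED INDEX cdx_values_dictionary_hash
    ON values_dictionary (values_hash, id);
GO

-- Returns the id for @value, inserting the value first if it is not found.
-- No error handling, and concurrent callers can insert the same value twice.
CREATE PROCEDURE usp_get_or_create_value_id
    @value varchar(4000),
    @id int OUTPUT
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @hash int;
    SET @hash = CHECKSUM(@value);
    SET @id = NULL;

    -- Seek on the hash, then compare the full value to rule out collisions.
    SELECT @id = id
    FROM values_dictionary
    WHERE values_hash = @hash
      AND value = @value;

    IF @id IS NULL
    BEGIN
        INSERT INTO values_dictionary (value) VALUES (@value);
        SET @id = SCOPE_IDENTITY();
    END
END
GO
```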
In this case the dictionary table is organized as a clustered index on the values_hash column, which groups all the colliding hash values together. The id column is added to make the clustered index unique, avoiding the need for a hidden uniqueifier column. This structure makes the lookup for @value as efficient as possible, without a hugely inefficient index on value, and bypasses the 900-byte limitation. The primary key on id is non-clustered, which means that looking up the value from an id incurs the overhead of one extra probe in the clustered index.

Not sure if this answers your problem; you obviously know more about your actual scenarios than I do. Also, the code does not handle error conditions and can actually insert duplicate @value entries, which may or may not be correct.