I’m (experimentally) doing a project where I have to merge data from several data


Asked: June 16, 2026, 14:26:18 UTC


I’m (experimentally) doing a project where I have to merge data from several data sets into a single SQL Server 2012 database. Some data is duplicated across these sets, and I’m working on a way to detect and remove the duplicates. My current test hashes each data item and checks for duplicate hashes. This seems to work really well so far (if there are hash collisions, it isn’t the end of the world).

I’m storing this hash in the database as a ‘binary(32)’ and whenever I need to insert a new row (I’m actually using a MERGE), I look for the hash value and only insert if it isn’t found. I have an index on the hash column to aid this search.
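For anyone trying to reproduce the setup, here is a minimal sketch of the insert-if-absent check described above. It assumes SHA-256, since its 32-byte digest fits a binary(32) column; the question does not say which hash function is actually used. The names `row_hash` and `merge_rows` are hypothetical, and the in-memory set only stands in for the indexed hash column.

```python
import hashlib


def row_hash(fields):
    """Return a 32-byte SHA-256 digest of a row's fields.

    Each field is length-prefixed so that ("ab", "c") and ("a", "bc")
    do not produce the same digest.
    """
    h = hashlib.sha256()
    for field in fields:
        data = str(field).encode("utf-8")
        h.update(len(data).to_bytes(4, "big"))
        h.update(data)
    return h.digest()


def merge_rows(existing_hashes, rows):
    """Keep only rows whose hash is not already known (insert-if-absent)."""
    inserted = []
    for row in rows:
        digest = row_hash(row)
        if digest not in existing_hashes:
            existing_hashes.add(digest)
            inserted.append(row)
    return inserted


# Two overlapping data sets; the row for "Bob" appears in both.
seen = set()
batch_a = [("Alice", "1970-01-01"), ("Bob", "1980-02-02")]
batch_b = [("Bob", "1980-02-02"), ("Carol", "1990-03-03")]
first = merge_rows(seen, batch_a)
second = merge_rows(seen, batch_b)
```

Because the digests are effectively uniformly distributed, each new value can land anywhere in the key order of an index built on them.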

The problem I’m having is that the index is always extremely fragmented, and I’m sure this must be slowing things down unnecessarily. I assume this is due to the near-randomness of the binary data.

Are there any index options I could use to limit this fragmentation? At the moment I’m just using the defaults. Any clues would be appreciated.

Thanks in advance.


1 Answer

Editorial Team · Answer 1 · 2026-06-16T14:26:20+00:00


No definitive answer, unfortunately. I did find that rebuilding the index periodically during the insertion phase helped, but it came with additional overhead and wasn’t particularly worth it. I suspect that experimenting with the fill factor may also help, but I haven’t had time to investigate this fully.
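To give a rough feel for why the fill factor might matter with near-random keys, here is a toy simulation. It is only a sketch under simplifying assumptions: it models a single sorted level of fixed-capacity pages and is not how SQL Server's storage engine actually works. The function `count_page_splits` and all of its parameters are hypothetical.

```python
import bisect
import random


def count_page_splits(fill_percent, page_capacity=100,
                      existing_rows=10_000, new_rows=1_000, seed=0):
    """Count page splits caused by inserting random keys (toy model)."""
    rng = random.Random(seed)
    # Existing keys are packed into pages at the requested fill percentage,
    # roughly what an index rebuild with that fill factor would leave behind.
    keys = sorted(rng.random() for _ in range(existing_rows))
    per_page = max(1, page_capacity * fill_percent // 100)
    pages = [keys[i:i + per_page] for i in range(0, len(keys), per_page)]

    splits = 0
    for _ in range(new_rows):
        key = rng.random()
        # Locate the page whose key range covers the new key.
        first_keys = [page[0] for page in pages]
        idx = max(0, bisect.bisect_right(first_keys, key) - 1)
        if len(pages[idx]) >= page_capacity:
            # The page is full: split it in half, then pick the correct half.
            page = pages[idx]
            mid = len(page) // 2
            pages[idx:idx + 1] = [page[:mid], page[mid:]]
            splits += 1
            if key >= pages[idx + 1][0]:
                idx += 1
        bisect.insort(pages[idx], key)
    return splits


# Compare a few fill percentages using the same random keys.
for pct in (100, 90, 70):
    print(pct, count_page_splits(pct))
```

In this model the free space left on each page absorbs random inserts until it runs out, so a lower fill percentage delays splits rather than preventing them, and it costs more pages to store and read the same rows.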
