Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • Home
  • SEARCH
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 3345030
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 18, 20262026-05-18T01:08:18+00:00 2026-05-18T01:08:18+00:00

I have two processes that work with data in the same table. One process

  • 0

I have two processes that work with data in the same table.

One process inserts daily, one by one (pure ADO.NET), about 20000 records in the target table.

The second process calls ( periodically, every 15 minutes ) a stored procedure that

  1. Detects the duplicates in those 20000 records by looking at all the records 7 days back and marks them as such.
  2. Marks all records that are not duplicates with a ‘ToBeCopied’ flag.
  3. Select a number of columns from the records marked as ‘ToBeCopied’ and returns the set.

Sometimes these two processes overlap ( due to delays in data processing ) and I am suspecting that if the first process inserts new records when second process is somewhere between 1 and 2 then records will be marked ‘ToBeCopied’ without having gone through the duplicate sifting.

This means that now the store procedure is returning some duplicates.

This is my theory but In practice I have not been able to replicate it…

I am using LINQ to SQL to insert duplicates (40-50 or so a second) and while this is running I am manually calling the stored procedure and store its results.

It appears that when the stored procedure is running the inserting pauses … such that at the end no duplicates have made it to the final result set.

I am wondering if LINQ to SQL or SQL Server has a default mechanism that prevents concurrency and is pausing the inserting while the selecting or updating takes place.

What do you think?

EDIT 1:

The ‘duplicates’ are not identical rows. They are ‘equivalent’ given the business/logical entities these records represent. Each row has a unique primary key.

P.S. Selecting the result set takes place with NOLOCK. Trying to reproduce on SQL Server 2008. Problem is alleged to occur on SQL Server 2005.

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-18T01:08:18+00:00Added an answer on May 18, 2026 at 1:08 am

    What do I think?

    • Why do you have duplicates in the database? Data purity begins in the client at the app drawing board, which should have a data model that simply does not allow for duplicates.
    • Why do you have duplicates in the database? Check constraints should prevent this from happening if the client app misbehaves
    • If you have duplicates, the reader must be prepared to handle them.
    • You cannot detect duplicates in two stages (look then mark), it has to be one single atomic mark. In fact, you cannot do almost anything in a database in two stages ‘look and mark’. All ‘look for record then mark the records found’ processes fail under concurrency.
    • NOLOCK will give you inconsistent reads. Records will be missing or read twice. Use SNAPSHOT isolation.
    • Linq-To-SQL has no pixie dust to replace bad design.

    Update

    Consider this for instance:

    A staging table with a structure like:

     CREATE TABLE T1 (
      id INT NOT NULL IDENTITY(1,1) PRIMARY KEY, 
      date DATETIME NOT NULL DEFAULT GETDATE(), 
      data1 INT NULL, 
      data2 INT NULL, 
      data3 INT NULL);
    

    Process A is doing inserts at leisure into this table. It doe snot do any validation, it just dumps raw records in:

    INSERT INTO T1 (data1, data2, data3) VALUES (1,2,3);
    INSERT INTO T1 (data1, data2, data3) VALUES (2,1,4);
    INSERT INTO T1 (data1, data2, data3) VALUES (2,2,3);
    ...
    INSERT INTO T1 (data1, data2, data3) VALUES (1,2,3);
    INSERT INTO T1 (data1, data2, data3) VALUES (2,2,3);
    ...
    INSERT INTO T1 (data1, data2, data3) VALUES (2,1,4);
    ...
    

    Process B is tasked with extracting this staging table and moving cleaned up data into a table T2. It has to remove duplicates that, by business rules, mean records with same values in data1, data2 and data3. Within a set of duplicates, only the first record by date should be kept:

    set transaction isolation snapshot;
    declare @maxid int;
    
    begin transaction
    -- Snap the current max (ID) 
    --
    select @maxid = MAX(id) from T1;
    
    -- Extract the cleaned rows into T2 using ROW_NUMBER() to
    -- filter out duplicates
    --
    with cte as (
     SELECT date, data1, data2, datta3,
       ROW_NUMBER() OVER 
          (PARTITION BY data1, data2, data3 ORDER BY date) as rn
     FROM T1
     WHERE id <= @maxid)
    MERGE INTO T2 
    USING (
      SELECT date, data1, data2, data3
      FROM cte
      WHERE rn = 1
    ) s ON s.data1 = T2.data1
      AND s.data2 = T2.data2
      AND s.data3 = T2.data3
    WHEN NOT MATCHED BY TARGET
      THEN INSERT (date, data1, data2, data3)
      VALUES (s.date, s.data1, s.data2, s.data3);
    
    -- Delete the processed row up to @maxid
    --
    DELETE FROM T1 
      WHERE id <= @maxid;
    COMMIT;
    

    Assuming Process A only inserts, this procedure would safely process the staging table and extract the cleaned duplicates. Of course, this is just a skeleton, a true ETL process would have error handling via BEGIN TRY/BEGIN CATCH and transaction log size control via batching.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

No related questions found

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.