I am working on a document management system. Some documents were imported from another

Question

Asked: May 21, 2026 (2026-05-21T09:02:38+00:00)

I am working on a document management system. Some documents were imported from another system, and due to an error some of them were imported twice. I need to delete the duplicates. I have the document ID from the previous system, but I can't simply delete by that, because some documents are associated with multiple accounts and are supposed to be in there more than once, so I have to check against the account as well. The associated values are stored in different tables. I wrote the following script to find the doc IDs to delete, but it is incredibly slow (it has been running for four days on a table with fewer than 2 million records).

declare @docidtodelete int
declare @docid int
declare @sourcedocid varchar(12)
declare @taxid decimal(9,0)
declare @account bigint


select @docid = MIN(d.docid) from DOCS d
inner join CONTENTS c on d.DOCID = c.DOCID and c.FOLID=1
while @docid is not null
      begin
            --get the source document id for this document
            select @sourcedocid = val from VTAB0031 where IDXID=31 and DOCID=@docid

            -- see if there is another document with the same source document id
            select @docidtodelete = isnull(MAX(v.docid),0) from VTAB0031 v
            inner join CONTENTS c on v.DOCID = c.DOCID and c.FOLID=1
            where IDXID=31 and VAL = @sourcedocid

            if @docid<@docidtodelete -- we have a possible duplicate so lets check and see if it matches on account
                  begin
                        select @account = val from VTAB0002 where IDXID=2 and DOCID=@docid
                        select @docidtodelete = isnull(max(v.docid),0) from VTAB0002 v
                              where IDXID=2 and VAL = @account and v.DOCID=@docidtodelete
                        if @docid<@docidtodelete -- we still have a possible duplicate so lets check and see if it matches on taxid
                        begin
                              select @taxid = val from VTAB0006 where IDXID=6 and DOCID=@docid
                              select @docidtodelete = isnull(max(v.docid),0) from VTAB0006 v
                                    where IDXID=6 and VAL = @taxid and v.DOCID = @docidtodelete
                              if @docid<@docidtodelete -- we still have a match so delete
                              begin
                                    insert into deletedDuplicates values(@docidtodelete ,@docid)
                              end
                        end
                  end
            select @docid = MIN(d.docid) from DOCS d
                  inner join CONTENTS c on d.DOCID = c.DOCID and c.FOLID=1
                  where d.DOCID > @docid
      end


1 Answer

Editorial Team · Answer 1 · 2026-05-21T09:02:39+00:00

Set-based operations are almost always faster than procedural, row-by-row loops when working with an RDBMS.

Try this instead:

select  DocIdToDelete,
        DocIdToKeep
into    deletedDuplicates
from
(
    select  max(DocId) as DocIdToDelete,
            min(DocId) as DocIdToKeep,
            SourceDocId,
            Account,
            TaxId,
            Count(*) as NumberMatches
    from
    (
        select  d.docid as DocId,
                s.val as SourceDocId,
                a.val as Account,
                t.val as TaxId
        from    DOCS d
                inner join CONTENTS c on c.DOCID = d.DOCID
                inner join VTAB0031 s on s.DOCID = d.DOCID
                inner join VTAB0002 a on a.DOCID = d.DOCID
                inner join VTAB0006 t on t.DOCID = d.DOCID
        where   c.FOLID = 1
                and s.IDXID = 31
                and a.IDXID = 2
                and t.IDXID = 6
    ) Summary
    group by    SourceDocId,
                Account,
                TaxId
    having  count(*) > 1 -- a column alias cannot be referenced in HAVING
) Duplicates
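
The query above records only one pair per group (the highest and the lowest DocId), so a document that was imported three or more times would still leave an extra copy behind. A minimal sketch of a check for that case, reusing the same joins and filters and assuming nothing beyond the tables already shown (NumberCopies is just an illustrative alias):

```sql
-- Sketch: list groups that contain more than two copies of the same document.
-- If this returns no rows, one (max, min) pair per group covers every duplicate.
select  s.VAL as SourceDocId,
        a.VAL as Account,
        t.VAL as TaxId,
        count(*) as NumberCopies
from    DOCS d
        inner join CONTENTS c on c.DOCID = d.DOCID
        inner join VTAB0031 s on s.DOCID = d.DOCID
        inner join VTAB0002 a on a.DOCID = d.DOCID
        inner join VTAB0006 t on t.DOCID = d.DOCID
where   c.FOLID = 1
        and s.IDXID = 31
        and a.IDXID = 2
        and t.IDXID = 6
group by s.VAL,
        a.VAL,
        t.VAL
having  count(*) > 2
```

If it does return rows, a query that pairs every extra copy with the document to keep is needed instead.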

UPDATE

I made a new query that should get all duplicate records. It should also run more efficiently, since it can use the indexes.

create table UniqueDocuments
(
    DocId int not null,
    SourceDocId varchar(12) not null,
    Account bigint not null,
    TaxId decimal(9,0) not null,
    primary key clustered (SourceDocId, Account, TaxId)
)
go

insert into UniqueDocuments (DocId, SourceDocId, Account, TaxId)
select  min(d.docid) as DocId,
        s.val as SourceDocId,
        a.val as Account,
        t.val as TaxId
from    DOCS d
        inner join CONTENTS c on c.DOCID = d.DOCID
        inner join VTAB0031 s on s.DOCID = d.DOCID
        inner join VTAB0002 a on a.DOCID = d.DOCID
        inner join VTAB0006 t on t.DOCID = d.DOCID
where   c.FOLID = 1
        and s.IDXID = 31
        and a.IDXID = 2
        and t.IDXID = 6
group by s.val,
        a.val,
        t.val

insert into deletedDuplicates (DocIdToDelete, DocIdToKeep)
select  d.DocId as DocIdToDelete,
        ud.DocId as DocIdToKeep
from    DOCS d
        inner join CONTENTS c on c.DOCID = d.DOCID
        inner join VTAB0031 s on s.DOCID = d.DOCID
        inner join VTAB0002 a on a.DOCID = d.DOCID
        inner join VTAB0006 t on t.DOCID = d.DOCID
        inner join UniqueDocuments ud on ud.SourceDocId = s.val
                                         and ud.Account = a.val
                                         and ud.TaxId = t.val
where   c.FOLID = 1
        and s.IDXID = 31
        and a.IDXID = 2
        and t.IDXID = 6
        and d.DocId <> ud.DocId
