The Archive Base

Editorial Team
Asked: May 12, 2026

I have a table that will have 500,000+ records. Each record has a LineNumber


I have a table that will have 500,000+ records.
Each record has a LineNumber field which is not unique and not part of the primary key.
Each record has a CreatedOn field.

I need to update all 500,000+ records to identify repeat records.

A repeat record is a record that has the same LineNumber as an earlier record created within the seven days before its own CreatedOn.

[Diagram of example rows; image not available]

In the diagram above, row 4 is a repeat because it occurred only five days after row 1.
Row 6 is not a repeat, even though it occurs only four days after row 4: row 4 is itself a repeat, so row 6 can only be compared to row 1, which is nine days earlier.
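
To make the rule concrete, here is a small sketch with made-up dates. The variable names and the dates are hypothetical, chosen only to match the gaps described above (five days, then four days, nine days in total):

```sql
-- Hypothetical dates matching the gaps in the description:
-- row 4 is five days after row 1, row 6 is four days after row 4.
DECLARE @Row1 smalldatetime, @Row4 smalldatetime, @Row6 smalldatetime
SET @Row1 = '20090701'
SET @Row4 = '20090706'
SET @Row6 = '20090710'

SELECT DATEDIFF(day, @Row1, @Row4) AS Row4VsRow1   -- 5: within seven days, so row 4 is a repeat
     , DATEDIFF(day, @Row4, @Row6) AS Row6VsRow4   -- 4: ignored, because row 4 is itself a repeat
     , DATEDIFF(day, @Row1, @Row6) AS Row6VsRow1   -- 9: more than seven days, so row 6 is not a repeat
```

The point is that each row is compared to the most recent non-repeat, not to the most recent row, which is why the marking cannot be decided for one row in isolation.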

I do not know how to update the IsRepeat field without stepping through each record one by one via a cursor or something similar.

I do not believe cursors are the way to go, but I’m stuck for any other possible solution.

I have considered that Common Table Expressions may be of help, but I have no experience with them and have no idea where to start.

Basically, this same process needs to be done on the table every day, as the table is truncated and re-populated daily. Once the table is re-populated, I have to go through and re-mark each record as a repeat or not.

Some assistance would be most appreciated.

UPDATE

Here is a script to create a table and insert test data

USE [Test]
GO

/****** Object:  Table [dbo].[Job]    Script Date: 08/18/2009 07:55:25 ******/
IF  EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Job]') AND type in (N'U'))
DROP TABLE [dbo].[Job]
GO

USE [Test]
GO

/****** Object:  Table [dbo].[Job]    Script Date: 08/18/2009 07:55:25 ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Job]') AND type in (N'U'))
BEGIN
CREATE TABLE [dbo].[Job](
    [JobID] [int] IDENTITY(1,1) NOT NULL,
    [LineNumber] [nvarchar](20) NULL,
    [IsRepeat] [bit] NULL,
    [CreatedOn] [smalldatetime] NOT NULL,
 CONSTRAINT [PK_Job] PRIMARY KEY CLUSTERED 
(
    [JobID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]
END
GO


SET NOCOUNT ON

INSERT INTO dbo.Job VALUES ('1006',NULL,'2009-07-01 07:52:08')
INSERT INTO dbo.Job VALUES ('1019',NULL,'2009-07-01 08:30:01')
INSERT INTO dbo.Job VALUES ('1028',NULL,'2009-07-01 09:30:35')
INSERT INTO dbo.Job VALUES ('1005',NULL,'2009-07-01 10:51:10')
INSERT INTO dbo.Job VALUES ('1005',NULL,'2009-07-02 09:22:30')
INSERT INTO dbo.Job VALUES ('1027',NULL,'2009-07-02 10:27:28')
INSERT INTO dbo.Job VALUES (NULL,NULL,'2009-07-02 11:15:33')
INSERT INTO dbo.Job VALUES ('1029',NULL,'2009-07-02 13:01:13')
INSERT INTO dbo.Job VALUES ('1014',NULL,'2009-07-03 12:05:56')
INSERT INTO dbo.Job VALUES ('1029',NULL,'2009-07-03 13:57:34')
INSERT INTO dbo.Job VALUES ('1025',NULL,'2009-07-03 15:38:54')
INSERT INTO dbo.Job VALUES ('1006',NULL,'2009-07-04 16:32:20')
INSERT INTO dbo.Job VALUES ('1025',NULL,'2009-07-05 13:46:46')
INSERT INTO dbo.Job VALUES ('1029',NULL,'2009-07-05 15:08:35')
INSERT INTO dbo.Job VALUES ('1000',NULL,'2009-07-05 15:19:50')
INSERT INTO dbo.Job VALUES ('1011',NULL,'2009-07-05 16:37:19')
INSERT INTO dbo.Job VALUES ('1019',NULL,'2009-07-05 17:14:09')
INSERT INTO dbo.Job VALUES ('1009',NULL,'2009-07-05 20:55:08')
INSERT INTO dbo.Job VALUES (NULL,NULL,'2009-07-06 08:29:29')
INSERT INTO dbo.Job VALUES ('1002',NULL,'2009-07-07 11:22:38')
INSERT INTO dbo.Job VALUES ('1029',NULL,'2009-07-07 12:25:23')
INSERT INTO dbo.Job VALUES ('1023',NULL,'2009-07-08 09:32:07')
INSERT INTO dbo.Job VALUES ('1005',NULL,'2009-07-08 09:46:33')
INSERT INTO dbo.Job VALUES ('1016',NULL,'2009-07-08 10:09:08')
INSERT INTO dbo.Job VALUES ('1023',NULL,'2009-07-09 10:45:04')
INSERT INTO dbo.Job VALUES ('1027',NULL,'2009-07-09 11:31:23')
INSERT INTO dbo.Job VALUES ('1005',NULL,'2009-07-09 13:10:06')
INSERT INTO dbo.Job VALUES ('1006',NULL,'2009-07-09 15:04:06')
INSERT INTO dbo.Job VALUES ('1010',NULL,'2009-07-09 17:32:16')
INSERT INTO dbo.Job VALUES ('1012',NULL,'2009-07-09 19:51:28')
INSERT INTO dbo.Job VALUES ('1000',NULL,'2009-07-10 15:09:42')
INSERT INTO dbo.Job VALUES ('1025',NULL,'2009-07-10 16:15:31')
INSERT INTO dbo.Job VALUES ('1006',NULL,'2009-07-10 21:55:43')
INSERT INTO dbo.Job VALUES ('1005',NULL,'2009-07-11 08:49:03')
INSERT INTO dbo.Job VALUES ('1022',NULL,'2009-07-11 16:47:21')
INSERT INTO dbo.Job VALUES ('1026',NULL,'2009-07-11 18:23:16')
INSERT INTO dbo.Job VALUES ('1010',NULL,'2009-07-11 19:49:31')
INSERT INTO dbo.Job VALUES ('1029',NULL,'2009-07-12 11:57:26')
INSERT INTO dbo.Job VALUES ('1003',NULL,'2009-07-13 08:32:20')
INSERT INTO dbo.Job VALUES ('1005',NULL,'2009-07-13 09:31:32')
INSERT INTO dbo.Job VALUES ('1021',NULL,'2009-07-14 09:52:54')
INSERT INTO dbo.Job VALUES ('1021',NULL,'2009-07-14 11:22:31')
INSERT INTO dbo.Job VALUES ('1023',NULL,'2009-07-14 11:54:14')
INSERT INTO dbo.Job VALUES (NULL,NULL,'2009-07-14 15:17:08')
INSERT INTO dbo.Job VALUES ('1005',NULL,'2009-07-15 13:27:08')
INSERT INTO dbo.Job VALUES ('1010',NULL,'2009-07-15 14:10:56')
INSERT INTO dbo.Job VALUES ('1011',NULL,'2009-07-15 15:20:50')
INSERT INTO dbo.Job VALUES ('1028',NULL,'2009-07-15 15:39:18')
INSERT INTO dbo.Job VALUES ('1012',NULL,'2009-07-15 16:06:17')
INSERT INTO dbo.Job VALUES ('1017',NULL,'2009-07-16 11:52:08')

SET NOCOUNT OFF
GO
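
As a hand-worked check against this test data, the rows for LineNumber '1005' would be expected to come out as listed in the comments below. This is a sketch under two assumptions: the script is run against a freshly created table, so JobID values start at 1 in insert order, and a row counts as a non-repeat only when it is more than seven days after the last non-repeat.

```sql
-- Rows for one line number, in the order the rule walks them.
SELECT JobID, LineNumber, CreatedOn, IsRepeat
FROM dbo.Job
WHERE LineNumber = '1005'
ORDER BY CreatedOn, JobID

-- Expected IsRepeat once the table has been marked (worked by hand):
--   JobID  4  2009-07-01  0  first row for this line number
--   JobID  5  2009-07-02  1  one day after JobID 4
--   JobID 23  2009-07-08  1  still within seven days of JobID 4 (09:47 vs 10:51)
--   JobID 27  2009-07-09  0  more than seven days after JobID 4
--   JobID 34  2009-07-11  1  within seven days of JobID 27
--   JobID 40  2009-07-13  1  within seven days of JobID 27
--   JobID 45  2009-07-15  1  within seven days of JobID 27
```

JobID 23 is the interesting case: it is six days after JobID 5, but what matters is that it is still just under seven days after JobID 4, the last non-repeat.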
1 Answer

  1. Editorial Team
    Added an answer on May 12, 2026 at 8:19 am

    This ignores rows where LineNumber is null. How should IsRepeat be handled in that case?

    It works on the test data. Whether it will be efficient enough for production volumes is an open question.

    Where several rows share the same (LineNumber, CreatedOn) pair, one is chosen by an arbitrary rule: the one with the minimum JobId.

    Basic idea:

    1. Get all JobId pairs, by line number, that are more than seven days apart.
    2. For each pair, count the rows that fall more than seven days after the left side, up to and including the right side (CNT).
    3. If JobId X is not a repeat, the next non-repeat is the right side of the pair that has X on the left side and CNT = 1.
    4. Use a recursive CTE that starts with the first row for each LineNumber.
    5. The recursive element uses the pairs with counts to get the next row.
    6. Finally, update the table, setting IsRepeat to 0 for non-repeats and 1 for everything else.

    ; with AllPairsByLineNumberAtLeast7DaysApart (LineNumber
                , LeftJobId
                , RightJobId
                , BeginCreatedOn
                , EndCreatedOn) as
            (select l.LineNumber
                , l.JobId
                , r.JobId
                , dateadd(day, 7, l.CreatedOn)
                , r.CreatedOn
            from Job l
            inner join Job r
                on l.LineNumber = r.LineNumber
                and dateadd(day, 7, l.CreatedOn) < r.CreatedOn
                and l.JobId <> r.JobId)
        -- Count the rows that fall after BeginCreatedOn,
        -- up to and including EndCreatedOn.
        -- Where CreatedOn = EndCreatedOn, include only
        -- JobId <= RightJobId, to handle ties in CreatedOn.
        , AllPairsCount(LineNumber, LeftJobId, RightJobId, Cnt) as
            (select ap.LineNumber, ap.LeftJobId, ap.RightJobId, count(*)
            from AllPairsByLineNumberAtLeast7DaysApart ap
            inner join Job j
                on j.LineNumber = ap.LineNumber
                -- strict comparison, matching the pair definition above:
                -- a row exactly seven days after the left side is not counted
                and ap.BeginCreatedOn < j.CreatedOn
                and (j.CreatedOn < ap.EndCreatedOn
                    or (j.CreatedOn = ap.EndCreatedOn 
                        and j.JobId <= ap.RightJobId))
             group by ap.LineNumber, ap.LeftJobId, ap.RightJobId)
        , Step1 (LineNumber, JobId, CreatedOn, RN) as
            (select LineNumber, JobId, CreatedOn
                , row_number() over 
                    (partition by LineNumber order by CreatedOn, JobId)
            from Job)
        , Results (JobId, LineNumber, CreatedOn) as    
            -- Start with the first rows.
            (select JobId, LineNumber, CreatedOn
            from Step1
            where RN = 1
            and LineNumber is not null
            -- get the next row
            union all
            select j.JobId, j.LineNumber, j.CreatedOn
            from Results r
            inner join AllPairsCount apc on apc.LeftJobId = r.JobId
            inner join Job j
                on j.JobId = apc.RightJobId
                and apc.CNT = 1)
        update j
        set IsRepeat = case when R.JobId is not null then 0 else 1 end
        from Job j
        left outer join Results r
            on j.JobId = R.JobId
        where j.LineNumber is not null
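
    To check the result of an update like the one above, two sanity queries can be run afterwards. This is a sketch, assuming the same strict "more than seven days" boundary used in the query; both queries would be expected to return no rows if the marking is consistent with the rule.

    ```sql
    -- Check 1: no two non-repeats for the same LineNumber
    -- should fall within seven days of each other.
    select a.JobId as EarlierJobId, b.JobId as LaterJobId
    from Job a
    inner join Job b
        on a.LineNumber = b.LineNumber
        and a.JobId <> b.JobId
        and a.CreatedOn <= b.CreatedOn
        and b.CreatedOn <= dateadd(day, 7, a.CreatedOn)
    where a.IsRepeat = 0
    and b.IsRepeat = 0

    -- Check 2: every repeat should have a non-repeat with the same
    -- LineNumber in the seven days before it.
    select r.JobId
    from Job r
    where r.IsRepeat = 1
    and not exists
        (select *
        from Job n
        where n.LineNumber = r.LineNumber
        and n.IsRepeat = 0
        and n.CreatedOn <= r.CreatedOn
        and r.CreatedOn <= dateadd(day, 7, n.CreatedOn))
    ```

    Any rows returned by either query point at a LineNumber worth inspecting by hand.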
    

    EDIT:

    After I turned off the computer last night I realized I had made things more complicated than they needed to be. A more straightforward (and, on the test data, slightly more efficient) query:

    Basic idea:

    1. Generate PotentialSteps (FromJobId, ToJobId). These are the pairs where, if FromJobId
      is not a repeat, then ToJobId is also not a repeat (the first row with the same LineNumber
      more than seven days after FromJobId).
    2. Use a recursive CTE to start from the first JobId for each LineNumber and then step,
      using PotentialSteps, to each non-repeating JobId.

    ; with PotentialSteps (FromJobId, ToJobId) as
        (select FromJobId, ToJobId
        from (select f.JobId as FromJobId
                , t.JobId as ToJobId
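                -- partition by the "from" row, so that each FromJobId
                -- gets its own first row more than seven days later
                , row_number() over
                     (partition by f.JobId order by t.CreatedOn, t.JobId) as RN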
            from Job f
            inner join Job t
                on f.LineNumber = t.LineNumber
                and dateadd(day, 7, f.CreatedOn) < t.CreatedOn) t
            where RN = 1)
    , NonRepeats (JobId) as
        (select JobId
        from (select JobId
                , row_number() over
                    (partition by LineNumber order by CreatedOn, JobId) as RN
            from Job) Start
        where RN = 1
        union all
        select J.JobId
        from NonRepeats NR
        inner join PotentialSteps PS
            on NR.JobId = PS.FromJobId
        inner join Job J
            on PS.ToJobId = J.JobId)
    update J
    set IsRepeat = case when NR.JobId is not null then 0 else 1 end
    from Job J
    left outer join NonRepeats NR
    on J.JobId = NR.JobId
    where J.LineNumber is not null
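
    For production volumes, two adjustments may be worth trying. This is a sketch under assumptions, not something measured: the index name IX_Job_LineNumber_CreatedOn is hypothetical, and whether the index actually helps depends on the execution plan and the data.

    ```sql
    -- Hypothetical supporting index for the self-joins on LineNumber
    -- and CreatedOn. JobId is the clustered key, so it is carried in
    -- the index automatically.
    create nonclustered index IX_Job_LineNumber_CreatedOn
        on dbo.Job (LineNumber, CreatedOn)

    -- Recursive CTEs stop with an error after 100 recursion levels by
    -- default. Each step here moves more than seven days forward, so
    -- this only matters if one LineNumber spans roughly two years or
    -- more. To lift the limit, add the hint as the last line of the
    -- update statement:
    --
    --     where J.LineNumber is not null
    --     option (maxrecursion 0)
    ```

    Since the table is truncated and re-populated every day, the index could be created after the load and before the update, so that the inserts themselves are not slowed down.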
    
© 2021 The Archive Base. All Rights Reserved