I’m currently confronted with a strange behaviour in my database when I’m querying a

Asked: June 15, 2026 (2026-06-15T23:17:26+00:00)

I’m currently confronted with a strange behaviour in my database when querying the minimum ID for a specific date in a table containing about a hundred million rows. The query is quite simple:

SELECT MIN(Id) FROM Connection WITH(NOLOCK) WHERE DateConnection = '2012-06-26'

This query never ends; at least, I let it run for hours without getting a result. The DateConnection column is not indexed, nor is it included in any index, so I would understand if this query took quite a while. But I tried the following query, which runs in a few seconds:

SELECT Id FROM Connection WITH(NOLOCK) WHERE DateConnection = '2012-06-26'

It returns 300k rows.

My table is defined as follows:

CREATE TABLE [dbo].[Connection](  
    [Id] [bigint] IDENTITY(1,1) NOT NULL,  
    [DateConnection] [datetime] NOT NULL,  
    [TimeConnection] [time](7) NOT NULL,  
    [Hour]  AS (datepart(hour,[TimeConnection])) PERSISTED NOT NULL,  
    CONSTRAINT [PK_Connection] PRIMARY KEY CLUSTERED   
    (  
        [Hour] ASC,  
        [Id] ASC  
    )  
)

And it has the following index:

CREATE UNIQUE NONCLUSTERED INDEX [IX_Connection_Id] ON [dbo].[Connection]  
(  
    [Id] ASC  
)ON [PRIMARY]

One workaround I found for this strange behaviour is the following code, but it seems rather heavy for such a simple query.

-- Stage the matching Ids in a temporary table
create table #TempId
(
    [Id] bigint
)
go

insert into #TempId
select id from partitionned_connection with(nolock) where dateconnection = '2012-06-26'

-- Take the minimum from the staged rows
declare @displayId bigint
select @displayId = min(Id) from #TempId

print @displayId
go

drop table #TempId
go

Has anybody encountered this behaviour, and what is the cause of it? Is the minimum aggregate scanning the entire table? And if so, why does the simple select not do the same?

1 Answer

Editorial Team · Answer 1 · 2026-06-15T23:17:28+00:00

The root cause of the problem is the non-aligned nonclustered index, combined with the statistical limitation Martin Smith points out (see his answer to another question for details).

Your table is partitioned on [Hour] along these lines:

CREATE PARTITION FUNCTION PF (integer)
AS RANGE RIGHT
FOR VALUES (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23);

CREATE PARTITION SCHEME PS
AS PARTITION PF ALL TO ([PRIMARY]);

-- Partitioned
CREATE TABLE dbo.Connection
(
    Id              bigint IDENTITY(1,1) NOT NULL,
    DateConnection  datetime NOT NULL,
    TimeConnection  time(7) NOT NULL,
    [Hour]  AS (DATEPART(HOUR, TimeConnection)) PERSISTED NOT NULL,
    
    CONSTRAINT [PK_Connection]
    PRIMARY KEY CLUSTERED
    (  
        [Hour] ASC,  
        [Id] ASC  
    )
    ON PS ([Hour])
);

-- Not partitioned
CREATE UNIQUE NONCLUSTERED INDEX [IX_Connection_Id]
ON dbo.Connection
(  
    Id ASC
)ON [PRIMARY];

-- Pretend there are lots of rows
UPDATE STATISTICS dbo.Connection WITH ROWCOUNT = 200000000, PAGECOUNT = 4000000;
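One way to confirm which indexes are partitioned is to look at the data space each index is stored on. The following is only a minimal sketch against the repro objects above (dbo.Connection and the partition scheme PS); the column aliases are illustrative and not part of the original answer.

```sql
-- List each index on dbo.Connection with the data space it is stored on.
-- DataSpaceType is PARTITION_SCHEME for an index built on a partition scheme
-- and ROWS_FILEGROUP for an index placed directly on a filegroup.
SELECT
    IndexName     = i.name,
    IndexType     = i.type_desc,
    DataSpaceName = ds.name,
    DataSpaceType = ds.type_desc
FROM sys.indexes AS i
JOIN sys.data_spaces AS ds
    ON ds.data_space_id = i.data_space_id
WHERE
    i.[object_id] = OBJECT_ID(N'dbo.Connection');
```

With the script above, PK_Connection should be reported on the partition scheme PS and IX_Connection_Id on the PRIMARY filegroup, which is what makes the nonclustered index non-aligned.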

The query and execution plan are:

SELECT 
    MinID = MIN(c.Id)
FROM dbo.Connection AS c WITH (READUNCOMMITTED) 
WHERE
    c.DateConnection = '2012-06-26';

Selected plan

The optimizer takes advantage of the index (ordered on Id) to transform the MIN aggregate to a TOP (1), since the minimum value will by definition be the first value encountered in the ordered stream. (If the nonclustered index were also partitioned, the optimizer would not choose this strategy, since the required ordering would be lost.)
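To make the transformation concrete, the plan behaves roughly as if the query had been written by hand as a TOP (1) with an ORDER BY on Id. This is a sketch of the logical equivalent, not the optimizer's internal representation:

```sql
-- Walk the rows in Id order and stop at the first one that passes the filter.
-- This mirrors the MIN-to-TOP (1) strategy chosen for the original query.
SELECT TOP (1)
    MinID = c.Id
FROM dbo.Connection AS c WITH (READUNCOMMITTED)
WHERE
    c.DateConnection = '2012-06-26'
ORDER BY
    c.Id ASC;
```

One difference worth keeping in mind: when no row matches, the MIN form returns a single row containing NULL, whereas this TOP (1) form returns no rows.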

The slight complication is that we also need to apply the predicate in the WHERE clause, which requires a lookup to the base table to fetch the DateConnection value. The statistical limitation Martin mentions explains why the optimizer estimates it will only need to check 119 rows from the ordered index before finding one with a DateConnection value that will match the WHERE clause. The hidden correlation between DateConnection and Id values means this estimate is a very long way off.

In case you are interested, the Compute Scalar calculates which partition to perform the Key Lookup into. For each row from the nonclustered index, it computes an expression like [PtnId1000] = Scalar Operator(RangePartitionNew([dbo].[Connection].[Hour] as [c].[Hour],(1),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15),(16),(17),(18),(19),(20),(21),(22),(23))), and this is used as the leading key of the lookup seek. There is prefetching (read-ahead) on the nested loops join, but this needs to be an ordered prefetch to preserve the sorting required by the TOP (1) optimization.
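The partition mapping that the Compute Scalar evaluates can be explored directly with the $PARTITION function. A small sketch, assuming the partition function PF from the repro script above (the column aliases are illustrative):

```sql
-- PF is RANGE RIGHT with boundaries 1..23, so each boundary value
-- belongs to the partition on its right.
SELECT
    PartitionForHour0  = $PARTITION.PF(0),   -- below the first boundary: partition 1
    PartitionForHour1  = $PARTITION.PF(1),   -- partition 2
    PartitionForHour23 = $PARTITION.PF(23);  -- last boundary: partition 24
```

Each row read from the nonclustered index needs this kind of mapping before the lookup can seek into the correct partition of the clustered index.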

Solution

We can avoid the statistical limitation (without using query hints) by finding the minimum Id for each Hour value, and then taking the minimum of the per-hour minimums:

-- Global minimum
SELECT 
    MinID = MIN(PerHour.MinId)
FROM 
(
    -- Local minimums (for each distinct hour value)
    SELECT 
        MinID = MIN(c.Id)
    FROM dbo.Connection AS c WITH(READUNCOMMITTED) 
    WHERE
        c.DateConnection = '2012-06-26' 
    GROUP BY
        c.[Hour]
) AS PerHour;

The execution plan is:

Serial plan

If parallelism is enabled, you will see a plan more like the following, which uses a parallel index scan and multi-threaded stream aggregates to produce the result even faster:

Parallel plan
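For comparison, the hinted alternatives that the rewrite avoids might look like the following. These are sketches under stated assumptions rather than recommendations from the original answer: the first assumes that forcing the clustered index (index id 1) is acceptable, and the second assumes SQL Server 2016 SP1 or later, where the USE HINT syntax is available.

```sql
-- Option 1 (assumption): force the clustered index so that the
-- nonclustered index on Id is not used for the TOP (1) strategy.
SELECT
    MinID = MIN(c.Id)
FROM dbo.Connection AS c WITH (READUNCOMMITTED, INDEX(1))
WHERE
    c.DateConnection = '2012-06-26';

-- Option 2 (assumption; SQL Server 2016 SP1 or later):
-- ask the optimizer not to apply row-goal adjustments.
SELECT
    MinID = MIN(c.Id)
FROM dbo.Connection AS c WITH (READUNCOMMITTED)
WHERE
    c.DateConnection = '2012-06-26'
OPTION (USE HINT ('DISABLE_OPTIMIZER_ROWGOAL'));
```

Either form ties the query to a specific plan choice, which is why the per-hour rewrite may be the more robust option.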
