I have a table that holds phone calls, with the following fields: ID STARTTIME

Question

0

Editorial Team

Asked: May 27, 20262026-05-27T22:40:47+00:00 2026-05-27T22:40:47+00:00

I have a table that holds phone calls, with the following fields: ID STARTTIME

0

I have a table that holds phone calls, with the following fields:

ID
STARTTIME
ENDTIME
STATUS
CALL_FROM
CALL_TO

There are 2,9 million records loaded into a local PostgreSQL database. I added indexes on ID (unique index), starttime and endtime.

Searching on stackoverflow, I found some useful SQL and modified it to what I think logically should work. The problem is that the query runs for many hours and never returns:

SELECT T1.sid, count(*) as CountSimultaneous
FROM calls_nov T1, calls_nov T2
WHERE
     T1.StartTime between T2.StartTime and T2.EndTime
     and T1.StartTime between '2011-11-02' and '2011-11-03'
GROUP BY
     T1.sid
ORDER BY CountSimultaneous DESC;

Can someone please either suggest a way to fix the query/index so that it actually works or suggest another way to calculate concurrent calls?

EDIT:

Explain plan:

Sort  (cost=11796758237.81..11796758679.47 rows=176663 width=35)
  Sort Key: (count(*))
  ->  GroupAggregate  (cost=0.00..11796738007.56 rows=176663 width=35)
        ->  Nested Loop  (cost=0.00..11511290152.45 rows=57089217697 width=35)

Table creation script:

CREATE TABLE calls_nov (
  sid varchar,
  starttime timestamp, 
  endtime timestamp, 
  call_to varchar, 
  call_from varchar, 
  status varchar);

Index creation:

CREATE UNIQUE INDEX sid_unique_index on calls_nov (sid);

CREATE INDEX starttime_index on calls_nov (starttime);

CREATE INDEX endtime_index on calls_nov (endtime);

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-05-27T22:40:47+00:00

1.) Your query did not catch all overlaps – this was fixed by the other answers, already.

2.) The data type of your columns starttime and endtime is timestamp. So your WHERE clause is slightly wrong, too:

BETWEEN '2011-11-02' AND '2011-11-03'

This would include ‘2011-11-03 00:00’. The upper border has to be excluded.

3.) Removed the mixed case syntax without double-quotes. Unquoted identifiers are cast to lower case automatically. To put it simple: Best don’t use mixed case identifiers at all in PostgreSQL.

4.) Transformed the query to use explicit JOIN which is always preferable. Actually, I made it a LEFT [OUTER] JOIN, because I want to count calls that overlap with no other calls, too.

5.) Simplified the syntax a bit to arrive at this base query:

SELECT t1.sid, count(*) AS ct
FROM   calls_nov t1
LEFT   JOIN calls_nov t2 ON t1.starttime <= t2.endtime
                        AND t1.endtime >= t2.starttime
WHERE  t1.starttime >= '2011-11-02 0:0'::timestamp
AND    t1.starttime <  '2011-11-03 0:0'::timestamp
GROUP  BY 1
ORDER  BY 2 DESC;

This query is extremely slow for a big table, because every row starting on ‘2011-11-02’ has to be compared to every row in the whole table, which leads to (almost) O(n²) cost.

Faster

We can drastically cut down the cost by pre-selecting possible candidates. Only select columns and rows you need. I do this with two CTE.

Select calls starting on the day in question. -> CTE x
Calculate the latest end of those calls. (subquery in CTE y)
Select only calls that overlap with the total range of CTE x. -> CTE y
The final query is much faster than querying the huge underlying table.

WITH x AS (
    SELECT sid, starttime, endtime
    FROM   calls_nov
    WHERE  starttime >= '2011-11-02 0:0'
    AND    starttime <  '2011-11-03 0:0'
    ), y AS (
    SELECT starttime, endtime
    FROM   calls_nov
    WHERE  endtime >= '2011-11-02 0:0'
    AND    starttime <= (SELECT max(endtime) As max_endtime FROM x)
    )
SELECT x.sid, count(*) AS count_overlaps
FROM   x
LEFT   JOIN y ON x.starttime <= y.endtime
             AND x.endtime >= y.starttime
GROUP  BY 1
ORDER  BY 2 DESC;

Faster yet

I have a real life table of 350.000 rows with overlapping start / end timestamps similar to yours. I used that for a quick benchmark. PostgreSQL 8.4, scarce resources because it is a test DB. Indexes on start and end. (Index on ID column is irrelevant here.) Tested with EXPLAIN ANALYZE, best of 5.

Total runtime: 476994.774 ms

CTE variant:
Total runtime: 4199.788 ms — that’s > factor 100.

After adding a multicolumn index of the form:

CREATE INDEX start_end_index on calls_nov (starttime, endtime);

Total runtime: 4159.367 ms

Ultimate Speed

If that is not enough, there is a way to speed it up yet another order of magnitude. Instead of the CTEs above, materialize the temp tables and – this is the crucial point – create an index on the second one. Could look like this:

Execute as one transaction:

CREATE TEMP TABLE x ON COMMIT DROP AS   
    SELECT sid, starttime, endtime
    FROM   calls_nov
    WHERE  starttime >= '2011-11-02 0:0'
    AND    starttime <  '2011-11-03 0:0';

CREATE TEMP TABLE y ON COMMIT DROP AS
    SELECT starttime, endtime
    FROM   calls_nov
    WHERE  endtime >= '2011-11-02 0:0'
    AND    starttime <= (SELECT max(endtime) FROM x);

CREATE INDEX y_idx ON y (starttime, endtime); -- this is where the magic happens

SELECT x.sid, count(*) AS ct
FROM   x
LEFT   JOIN y ON x.starttime <= y.endtime
             AND x.endtime >= y.starttime
GROUP  BY 1
ORDER  BY 2 DESC;

Read about temporary tables in the manual.

Ultimate solution

Create a plpgsql function that encapsulates the magic.
Diagnose the typical size of your temp tables. Create them standalone and measure:
```
  SELECT pg_size_pretty(pg_total_relation_size('tmp_tbl'));
```
If they are bigger than your setting for temp_buffers then temporarily set them high enough in your function to hold both your temporary tables in RAM. It is a major speedup if you don’t have to swap to disc. (Must be first use of temp tables in session to have effect.)

CREATE OR REPLACE FUNCTION f_call_overlaps(date)
  RETURNS TABLE (sid varchar, ct integer) AS
$BODY$
DECLARE
    _from timestamp := $1::timestamp;
    _to   timestamp := ($1 +1)::timestamp;
BEGIN

SET temp_buffers = 64MB'; -- example value; more RAM for temp tables;

CREATE TEMP TABLE x ON COMMIT DROP AS   
    SELECT c.sid, starttime, endtime  -- avoid naming conflict with OUT param
    FROM   calls_nov c
    WHERE  starttime >= _from
    AND    starttime <  _to;

CREATE TEMP TABLE y ON COMMIT DROP AS
    SELECT starttime, endtime
    FROM   calls_nov
    WHERE  endtime >= _from
    AND    starttime <= (SELECT max(endtime) FROM x);

CREATE INDEX y_idx ON y (starttime, endtime);

RETURN QUERY
SELECT x.sid, count(*)::int -- AS ct
FROM   x
LEFT   JOIN y ON x.starttime <= y.endtime AND x.endtime >= y.starttime
GROUP  BY 1
ORDER  BY 2 DESC;

END;
$BODY$   LANGUAGE plpgsql;

Call:

SELECT * FROM f_call_overlaps('2011-11-02') -- just name your date

Total runtime: 138.169 ms — that’s factor 3000

What else can you do to speed it up?

General performance optimization.

CLUSTER calls_nov USING starttime_index; -- this also vacuums the table fully

ANALYZE calls_nov;

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I have a table that holds phone calls, with the following fields: ID STARTTIME

Leave an answerCancel reply

1 Answer

Faster

Faster yet

Ultimate Speed

Ultimate solution

What else can you do to speed it up?

Leave an answer
Cancel reply