Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • Home
  • SEARCH
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 8935635
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: June 15, 20262026-06-15T10:02:07+00:00 2026-06-15T10:02:07+00:00

I have 2 tables in PostgreSQL 9.1 – flight_2012_09_12 containing approx 500,000 rows and

  • 0

I have 2 tables in PostgreSQL 9.1 – flight_2012_09_12 containing approx 500,000 rows and position_2012_09_12 containing about 5.5 million rows. I’m running a simple join query and it’s taking a long time to complete and despite the fact the tables aren’t small I’m convinced there are some major gains to be made in the execution.

The query is:

SELECT f.departure, f.arrival, 
       p.callsign, p.flightkey, p.time, p.lat, p.lon, p.altitude_ft, p.speed 
FROM position_2012_09_12 AS p 
JOIN flight_2012_09_12 AS f 
     ON p.flightkey = f.flightkey 
WHERE p.lon < 0 
      AND p.time BETWEEN '2012-9-12 0:0:0' AND '2012-9-12 23:0:0'

The output of explain analyze is:

Hash Join  (cost=239891.03..470396.82 rows=4790498 width=51) (actual time=29203.830..45777.193 rows=4403717 loops=1)
Hash Cond: (f.flightkey = p.flightkey)
->  Seq Scan on flight_2012_09_12 f  (cost=0.00..1934.31 rows=70631 width=12) (actual time=0.014..220.494 rows=70631 loops=1)
->  Hash  (cost=158415.97..158415.97 rows=3916885 width=43) (actual time=29201.012..29201.012 rows=3950815 loops=1)
     Buckets: 2048  Batches: 512 (originally 256)  Memory Usage: 1025kB
     ->  Seq Scan on position_2012_09_12 p  (cost=0.00..158415.97 rows=3916885 width=43) (actual time=0.006..14630.058 rows=3950815 loops=1)
           Filter: ((lon < 0::double precision) AND ("time" >= '2012-09-12 00:00:00'::timestamp without time zone) AND ("time" <= '2012-09-12 23:00:00'::timestamp without time zone))
Total runtime: 58522.767 ms

I think the problem lies with the sequential scan on the position table but I can’t figure out why it’s there. The table structures with indexes are below:

               Table "public.flight_2012_09_12"
   Column       |            Type             | Modifiers 
--------------------+-----------------------------+-----------
callsign           | character varying(8)        | 
flightkey          | integer                     | 
source             | character varying(16)       | 
departure          | character varying(4)        | 
arrival            | character varying(4)        | 
original_etd       | timestamp without time zone | 
original_eta       | timestamp without time zone | 
enroute            | boolean                     | 
etd                | timestamp without time zone | 
eta                | timestamp without time zone | 
equipment          | character varying(6)        | 
diverted           | timestamp without time zone | 
time               | timestamp without time zone | 
lat                | double precision            | 
lon                | double precision            | 
altitude           | character varying(7)        | 
altitude_ft        | integer                     | 
speed              | character varying(4)        | 
asdi_acid          | character varying(4)        | 
enroute_eta        | timestamp without time zone | 
enroute_eta_source | character varying(1)        | 
Indexes:
"flight_2012_09_12_flightkey_idx" btree (flightkey)
"idx_2012_09_12_altitude_ft" btree (altitude_ft)
"idx_2012_09_12_arrival" btree (arrival)
"idx_2012_09_12_callsign" btree (callsign)
"idx_2012_09_12_departure" btree (departure)
"idx_2012_09_12_diverted" btree (diverted)
"idx_2012_09_12_enroute_eta" btree (enroute_eta)
"idx_2012_09_12_equipment" btree (equipment)
"idx_2012_09_12_etd" btree (etd)
"idx_2012_09_12_lat" btree (lat)
"idx_2012_09_12_lon" btree (lon)
"idx_2012_09_12_original_eta" btree (original_eta)
"idx_2012_09_12_original_etd" btree (original_etd)
"idx_2012_09_12_speed" btree (speed)
"idx_2012_09_12_time" btree ("time")

          Table "public.position_2012_09_12"
Column    |            Type             | Modifiers 
-------------+-----------------------------+-----------
 callsign    | character varying(8)        | 
 flightkey   | integer                     | 
 time        | timestamp without time zone | 
 lat         | double precision            | 
 lon         | double precision            | 
 altitude    | character varying(7)        | 
 altitude_ft | integer                     | 
 course      | integer                     | 
 speed       | character varying(4)        | 
 trackerkey  | integer                     | 
 the_geom    | geometry                    | 
Indexes:
"index_2012_09_12_altitude_ft" btree (altitude_ft)
"index_2012_09_12_callsign" btree (callsign)
"index_2012_09_12_course" btree (course)
"index_2012_09_12_flightkey" btree (flightkey)
"index_2012_09_12_speed" btree (speed)
"index_2012_09_12_time" btree ("time")
"position_2012_09_12_flightkey_idx" btree (flightkey)
"test_index" btree (lon)
"test_index_lat" btree (lat)

I can’t think of any other way to rewrite the query and so I’m stumped at this point. If the current setup is as good as it gets so be it but it seems to me that it should be much faster than it currently is. Any help would be much appreciated.

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-06-15T10:02:08+00:00Added an answer on June 15, 2026 at 10:02 am

    The reason you are getting a sequential scan is that Postgres believes that it will read less disk pages that way than using indexes. It is probably right. Consider, if you use a non-covering index, you need to read all the matching index pages. it essentially outputs a list of row identifiers. The DB engine then needs to read each of the matching data pages.

    Your position table uses 71 bytes per row, plus whatever a geom type takes (I’ll assume 16 bytes for illustration), making 87 bytes. A Postgres page is 8192 bytes. So you have approximately 90 rows per pages.

    Your query matches 3950815 out of 5563070 rows, or about 70% of the total. Assuming the data is randomly distributed, with regard to your where filters, there is a pretty much a 30% ^ 90 chance of finding a data page with no matching row. This is essentially nothing. So regardless of how good your indexes are, you’re still going to have to read all the data pages. If you’re going to have to read all the pages anyway, a table scan is usually a good approach.

    The one get out here, is that I said non-covering index. If you are prepared to create indexes that can answer queries in of themselves, you can avoid looking up the data pages at all, so you are back in the game. I’d suggest the following are worth looking at:

    flight_2012_09_12 (flightkey, departure, arrival)
    position_2012_09_12 (filghtkey, time, lon, ...)
    position_2012_09_12 (lon, time, flightkey, ...)
    position_2012_09_12 (time, long, flightkey, ...)
    

    The dots here represent the rest of the columns you are selecting. You’ll only need one of the indexes on position, but it’s hard to tell which will prove the best. The first approach may permit a merge join on presorted data, with the cost of reading the whole second index to do the filtering. The second and third will allow data to be prefiltered, but require a hash join. Give how much of the cost appears to be in the hash join, the merge join might well be a good option.

    As your query requires 52 of the 87 bytes per row, and indexes have overheads, you may not end up with the index taking much, if any, less space then the table itself.

    Another approach is to attack the “randomly distributed” side of it, by looking at clustering.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

Hey, I have 2 tables in PostgreSql: 1 - documents: id, title 2 -
That title is presumably awfully worded. I have some PostgreSQL tables. There is a
Background In a PostgreSQL 9.0 database, there are various tables that have many-to-many relationships.
I have the following tables in a PostgreSQL: Categories | Locations | Checkins |
I have a dump of several Postgresql Tables in a selfcontained CSV file which
I have several statements which access very large Postgresql tables i.e. with: SELECT a.id
I'm trying to use Doctrine with a poorly designed PostgreSQL database. Some tables have
I have mapped my tables in postgresql 9.0.1 to the spring application with hibernate
Suppose we have a PostgreSQL database with two tables A, B. table A columns:
Suppose you have two tables in PostgreSQL. Table A has field x , which

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.