I am puzzled by the following behavior: querying a table with a
where condition on a non-indexed column is very fast when I add
“limit 1”, even though the column has no index. The following is an example:
--1 create a test table with 20000000 rows
francs=> create table test_limit (id int4,name varchar(32));
CREATE TABLE
francs=> insert into test_limit select generate_series(1,20000000),generate_series(1,20000000) || 'a';
INSERT 0 20000000
francs=> \d test_limit;
        Table "francs.test_limit"
 Column |         Type          | Modifiers
--------+-----------------------+-----------
 id     | integer               |
 name   | character varying(32) |
--2 query the table
francs=> explain analyze select * from test_limit where id=1;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------
Seq Scan on test_limit (cost=0.00..358111.05 rows=1 width=13) (actual time=0.028..3162.477 rows=1 loops=1)
  Filter: (id = 1)
Total runtime: 3162.531 ms
(3 rows)
Notice that it takes about 3162 ms, which is slow, as I expected.
--3 query the table with a “limit 1” clause
francs=> explain analyze select * from test_limit where id=1 limit 1;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..358111.05 rows=1 width=13) (actual time=0.019..0.019 rows=1 loops=1)
  -> Seq Scan on test_limit (cost=0.00..358111.05 rows=1 width=13) (actual time=0.017..0.017 rows=1 loops=1)
       Filter: (id = 1)
Total runtime: 0.047 ms
(4 rows)
Notice that it takes only about 0.047 ms. It is very fast, even though the column id has no index. Can anybody explain this?
Thanks a lot!
--4 additional tests
francs=> explain analyze select * from test_limit where id=2 limit 1;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..358111.05 rows=1 width=13) (actual time=0.023..0.023 rows=1 loops=1)
  -> Seq Scan on test_limit (cost=0.00..358111.05 rows=1 width=13) (actual time=0.022..0.022 rows=1 loops=1)
       Filter: (id = 2)
Total runtime: 0.066 ms
(4 rows)
francs=> explain analyze select * from test_limit where id=3 limit 1;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..358111.05 rows=1 width=13) (actual time=0.022..0.022 rows=1 loops=1)
  -> Seq Scan on test_limit (cost=0.00..358111.05 rows=1 width=13) (actual time=0.021..0.021 rows=1 loops=1)
       Filter: (id = 3)
Total runtime: 0.060 ms
(4 rows)
francs=> explain analyze select * from test_limit where id=101 limit 1;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..358111.05 rows=1 width=13) (actual time=0.035..0.036 rows=1 loops=1)
  -> Seq Scan on test_limit (cost=0.00..358111.05 rows=1 width=13) (actual time=0.033..0.033 rows=1 loops=1)
       Filter: (id = 101)
Total runtime: 0.075 ms
(4 rows)
francs=> explain analyze select * from test_limit where id=1001 limit 1;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..358111.05 rows=1 width=13) (actual time=0.192..0.192 rows=1 loops=1)
  -> Seq Scan on test_limit (cost=0.00..358111.05 rows=1 width=13) (actual time=0.190..0.190 rows=1 loops=1)
       Filter: (id = 1001)
Total runtime: 0.231 ms
(4 rows)
From the additional tests, we can see that these queries are also very fast.
--5 final test
francs=> explain analyze select * from test_limit where id=9999999 limit 1;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..358111.05 rows=1 width=13) (actual time=1379.153..1379.154 rows=1 loops=1)
  -> Seq Scan on test_limit (cost=0.00..358111.05 rows=1 width=13) (actual time=1379.151..1379.151 rows=1 loops=1)
       Filter: (id = 9999999)
Total runtime: 1379.206 ms
(4 rows)
In the test above I used a later id, 9999999, and the query is slow now. I understand now, thanks!
Maybe the row with “id=1” is very early in the table, so when the scan reads the table sequentially it hits that row very quickly, and since you said “limit 1” it can stop right after the first result.
Alternatively, there could be some caching involved too.
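To make the early-stop explanation concrete, here is a small Python model of a sequential scan with a filter and an optional limit. It is only a sketch, not how PostgreSQL's executor is implemented: the names seq_scan and run_query are made up for illustration, the table is scaled down to 200,000 rows, and it assumes the rows sit in the table in insertion order (likely for a freshly loaded table, but not guaranteed). Caching is not modeled at all.

```python
from itertools import islice


def seq_scan(table, stats):
    """Yield rows in physical order, counting every row that is read."""
    for row in table:
        stats["rows_read"] += 1
        yield row


def run_query(table, wanted_id, limit=None):
    """Model: SELECT * FROM table WHERE id = wanted_id [LIMIT limit].

    Returns the matching rows and the number of rows the scan had to read.
    """
    stats = {"rows_read": 0}
    # The filter is lazy: a row is only read when a result is requested.
    matches = (row for row in seq_scan(table, stats) if row[0] == wanted_id)
    if limit is not None:
        # Stop pulling rows from the scan once `limit` matches were returned.
        matches = islice(matches, limit)
    return list(matches), stats["rows_read"]


# Scaled-down stand-in for test_limit (hypothetical data, id ascending).
table = [(i, f"{i}a") for i in range(1, 200_001)]

_, read_no_limit = run_query(table, 1)             # no LIMIT: reads everything
_, read_early = run_query(table, 1, limit=1)       # match is the first row
_, read_late = run_query(table, 99_999, limit=1)   # match is about halfway in

print(read_no_limit, read_early, read_late)  # 200000 1 99999
```

Without the limit, the scan cannot know that there is only one matching row, so it has to read the whole table. With the limit, the work grows with the position of the first match, which fits the timings in the question: id=9999999 is about halfway through the table and took about 1379 ms, a bit less than half of the 3162 ms full scan.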