I’m using PostgreSQL 8.4 with a table such as this:
create table log (
    id bigint primary key,
    first_sn bigint not null,
    last_sn bigint not null
);
Here first_sn and last_sn represent a range of serial numbers, and the table holds more than 1 million rows. What sort of index and query should I use to find all rows whose serial number range contains any element of a list of serial numbers?
For example, for a list [5348491, 1230505, 5882233] I’m currently doing:
select 5348491, *
from log
where 5348491 between first_sn and last_sn
union
select 1230505, *
from log
where 1230505 between first_sn and last_sn
union
select 5882233, *
from log
where 5882233 between first_sn and last_sn;
But that’s a bit slow.
Edit: A query like that will take around 600ms and I would like to be able to search with a list of >10k serial numbers.
Since someone requested it, here are the real table, the query, and its explain analyze output. I hesitated because all the column names are in Spanish: ‘id’ in the previous example corresponds to ‘movimiento_id’ here, ‘first_sn’ to ‘serial_inicial’, and ‘last_sn’ to ‘serial_final’. ‘tipo_movimiento’ is the type of event and is just a way to filter the result set further:
CREATE TABLE movimiento
(
  movimiento_id bigserial NOT NULL,
  serial_inicial bigint NOT NULL,
  serial_final bigint NOT NULL,
  serial_chip bigint,
  numero_telefono text,
  fecha_movimiento timestamp without time zone DEFAULT now(),
  producto_id integer NOT NULL,
  usuario_id integer NOT NULL,
  factura_proveedor text,
  fecha_ingreso date,
  fecha_venta date,
  vendedor_id integer,
  cliente_id integer,
  tipo_movimiento text NOT NULL,
  costo numeric(12,4),
  precio numeric(10,2),
  descuento double precision,
  bodega_id integer NOT NULL DEFAULT 1,
  fecha_activo timestamp without time zone,
  factura text,
  envio text,
  documento text,
  bodega_id_origen integer,
  fecha date,
  traslado_id integer,
  detalle_factura_id bigint,
  es_venta boolean DEFAULT false,
  CONSTRAINT movimiento_pkey PRIMARY KEY (movimiento_id),
  CONSTRAINT movimiento_bodega_id_fkey FOREIGN KEY (bodega_id)
      REFERENCES bodega (bodega_id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT movimiento_bodega_id_origen_fkey FOREIGN KEY (bodega_id_origen)
      REFERENCES bodega (bodega_id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT movimiento_cliente_id_fkey FOREIGN KEY (cliente_id)
      REFERENCES cliente (cliente_id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT movimiento_producto_id_fkey FOREIGN KEY (producto_id)
      REFERENCES producto (producto_id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT movimiento_usuario_id_fkey FOREIGN KEY (usuario_id)
      REFERENCES usuario (usuario_id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT movimiento_vendedor_id_fkey FOREIGN KEY (vendedor_id)
      REFERENCES vendedor (vendedor_id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT movimiento_check CHECK (serial_final >= serial_inicial),
  CONSTRAINT movimiento_costo_check CHECK (costo >= 0::numeric),
  CONSTRAINT movimiento_descuento_check CHECK (descuento >= 0::double precision),
  CONSTRAINT movimiento_precio_check CHECK (precio >= 0::numeric),
  CONSTRAINT movimiento_tipo_movimiento_check CHECK (tipo_movimiento = ANY (ARRAY['Ingresado'::text, 'Vendido'::text, 'Entregado'::text, 'Regresado'::text, 'Eliminado'::text, 'Devuelto'::text, 'Inconforme'::text, 'Trasladado'::text, 'Consignado'::text, 'Devolucion Consignado'::text, 'Activado'::text, 'Devolucion Claro'::text, 'Asignado'::text, 'Fusion-Sale'::text, 'Fusion'::text, 'Separacion-Sale'::text, 'Separacion'::text]))
)
WITH (
  OIDS=TRUE
);
Here’s the query:
explain analyze select 869461009867643, *
from movimiento
where (869461009867643 between serial_inicial and serial_final)
and tipo_movimiento = 'Ingresado'
union all
select 12121001477546, *
from movimiento
where 12121001477546 between serial_inicial and serial_final
and tipo_movimiento = 'Ingresado'
union all
select 354689040208615, *
from movimiento
where 354689040208615 between serial_inicial and serial_final
and tipo_movimiento = 'Ingresado';
And the explain analyze:
Append (cost=7542.94..185580.33 rows=232322 width=165) (actual time=93.222..571.928 rows=4 loops=1)
  -> Bitmap Heap Scan on movimiento (cost=7542.94..61089.00 rows=90645 width=165) (actual time=93.220..206.248 rows=1 loops=1)
        Recheck Cond: (tipo_movimiento = 'Ingresado'::text)
        Filter: ((869461009867643::bigint >= serial_inicial) AND (869461009867643::bigint <= serial_final))
        -> Bitmap Index Scan on tipo_movimiento_index (cost=0.00..7520.28 rows=375432 width=0) (actual time=66.445..66.445 rows=372409 loops=1)
              Index Cond: (tipo_movimiento = 'Ingresado'::text)
  -> Bitmap Heap Scan on movimiento (cost=7534.24..61080.30 rows=55815 width=165) (actual time=84.364..179.571 rows=2 loops=1)
        Recheck Cond: (tipo_movimiento = 'Ingresado'::text)
        Filter: ((12121001477546::bigint >= serial_inicial) AND (12121001477546::bigint <= serial_final))
        -> Bitmap Index Scan on tipo_movimiento_index (cost=0.00..7520.28 rows=375432 width=0) (actual time=60.282..60.282 rows=372409 loops=1)
              Index Cond: (tipo_movimiento = 'Ingresado'::text)
  -> Bitmap Heap Scan on movimiento (cost=7541.75..61087.81 rows=85862 width=165) (actual time=173.876..186.082 rows=1 loops=1)
        Recheck Cond: (tipo_movimiento = 'Ingresado'::text)
        Filter: ((354689040208615::bigint >= serial_inicial) AND (354689040208615::bigint <= serial_final))
        -> Bitmap Index Scan on tipo_movimiento_index (cost=0.00..7520.28 rows=375432 width=0) (actual time=60.294..60.294 rows=372409 loops=1)
              Index Cond: (tipo_movimiento = 'Ingresado'::text)
Total runtime: 572.138 ms
Here’s the explain analyze with a_horse_with_no_name’s example:
Nested Loop (cost=7614.18..98703.44 rows=125144 width=173) (actual time=629.373..2919.334 rows=4 loops=1)
  Join Filter: ((lista.serie >= movimiento.serial_inicial) AND (lista.serie <= movimiento.serial_final))
  CTE lista
    -> Values Scan on "*VALUES*" (cost=0.00..0.04 rows=3 width=8) (actual time=0.012..0.033 rows=3 loops=1)
  -> Bitmap Heap Scan on movimiento (cost=7614.14..59283.04 rows=375432 width=165) (actual time=110.909..460.563 rows=372409 loops=1)
        Recheck Cond: (tipo_movimiento = 'Ingresado'::text)
        -> Bitmap Index Scan on tipo_movimiento_index (cost=0.00..7520.28 rows=375432 width=0) (actual time=107.182..107.182 rows=372409 loops=1)
              Index Cond: (tipo_movimiento = 'Ingresado'::text)
  -> CTE Scan on lista (cost=0.00..0.06 rows=3 width=8) (actual time=0.001..0.003 rows=3 loops=372409)
Total runtime: 2919.514 ms
Combining a_horse_with_no_name’s and Craig Ringer’s suggestions, a search for three serial numbers ran in under 350 ms, and a search for 10k serial numbers took a little over 3 s:
-- Run all of this inside one transaction: "on commit drop" removes lista at commit.
create temporary table lista (
    serie bigint
) on commit drop;
create index lista_index on lista using btree (serie);
insert into lista (select distinct serial_inicial from movimiento limit 10000);
analyze lista;
select serie, movimiento.*
from movimiento join lista on serie between serial_inicial and serial_final
where tipo_movimiento = 'Ingresado';
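In real use the list would come from the client rather than from movimiento itself. A minimal sketch of loading it, assuming the temporary table lista above already exists in the current transaction; the three values are simply the ones from the earlier query:

```sql
-- Load the client-supplied serial numbers with one multi-row insert.
insert into lista (serie) values
    (869461009867643),
    (12121001477546),
    (354689040208615);

-- Refresh statistics so the planner knows how small the list is.
analyze lista;
```

For a list of around 10k values, sending them with COPY instead of many insert statements is usually the faster way to fill the table.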
If you don’t need to know which of the supplied values matched, you can use a simple OR:
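A minimal sketch of that OR form, written against the question’s log table with its three example serial numbers (the exact query is an assumption):

```sql
-- One OR branch per serial number; a row is returned once even if
-- several of the listed values fall inside its range.
select *
from log
where 5348491 between first_sn and last_sn
   or 1230505 between first_sn and last_sn
   or 5882233 between first_sn and last_sn;
```

Unlike the union version, the result has no column saying which serial number matched.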
Another option would be this:
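A sketch of what this looks like, inferred from the "CTE lista" and "Values Scan" nodes in the plan shown in the question; the exact text is an assumption, and it uses the question’s log table and example values:

```sql
-- The list of serial numbers becomes an inline table via a CTE
-- (WITH queries are available from PostgreSQL 8.4 on).
with lista (serie) as (
    values (5348491::bigint), (1230505), (5882233)
)
select lista.serie, log.*
from log
join lista on lista.serie between log.first_sn and log.last_sn;
```

This form keeps the matching serial number in the output, as the original union query did.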
However, I don’t think either of those solutions would actually scale to 10k values to compare against.
(I assume you do have an index on both sn columns)
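If no such index exists yet, one possible reading of "an index on both sn columns" is a single composite b-tree. The index name below is hypothetical, and whether the planner actually uses the index depends on how the ranges are distributed:

```sql
-- Hypothetical index name; one composite b-tree over both range columns.
create index log_sn_idx on log (first_sn, last_sn);

-- Update statistics so the planner can consider the new index.
analyze log;
```

A b-tree can only narrow the search efficiently on the leading column (first_sn <= value), so checking the plan with explain analyze before and after is the only reliable way to tell whether it helps.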