I’m playing around with Philadelphia transit data and I have an sqlite database storing the gtfs data. I have this query looking for departure times at a particular stop:
SELECT "stop_time".departure_time FROM "stop_time"
INNER JOIN "trip" ON "trip".trip_id = "stop_time".trip_id
WHERE
(trip.route_id = '10726' )
-- AND (trip.service_id = '1')
AND (stop_time.stop_id = '220')
AND (time( stop_time.departure_time ) > time('08:30:45'))
AND (time( stop_time.departure_time ) < time('09:30:45'));
The clause to match service_id to 1 is currently commented out. If I run the query as it is now, without matching service_id, it takes 2 seconds. If I uncomment the service_id clause, it’ll take 30. I’m clueless as to why since I’m already looking into the trip table for the route_id.
Any thoughts?
This generally happens if an index is defined on (route_id, stop_id and departure_time) that is used when there is no filter on service_id. Once you include that in the
WHEREclause, since it is not present in the index, it requires aTABLE SCANand hence the shoot up in the execution time. If you includeservice_idalso in the index definition, thenTABLE SCANwould be replaced withINDEX SCAN.