I’m trying to speed this query up as much as possible. It’s not terribly slow, but I need it be as fast as can be.
SELECT name
FROM (
SELECT cities.name || ', ' || regions.name || ', ' || countries.code AS name
FROM cities
INNER JOIN regions ON regions.id = cities.region_id
INNER JOIN countries ON countries.id = regions.country_id
) AS t1
GROUP BY name
HAVING LOWER(name) ILIKE 'asheville%'
ORDER BY name ASC
LIMIT 10;
These indexes exist:
UNIQUE INDEX index_cities_on_name_and_region_id ON cities USING btree (name, region_id)
UNIQUE INDEX index_countries_on_code ON countries USING btree (code)
UNIQUE INDEX index_countries_on_name ON countries USING btree (name)
UNIQUE INDEX index_regions_on_code_and_country_id ON regions USING btree (code, country_id)
cities table contains 248016 records.
countries table contains 252 records.
regions table contains 4005 records.
Here’s the explain output of the query: http://explain.depesz.com/s/fWe
Any help would be greatly appreciated. Basically I’m just looking for suggestions or maybe pointing out something I may have missed.
In your subquery, you should return both the
nameyou are already returning, andcities.name as cname. Then, you should do yourilikeoncnameinstead of onname. The issue is that right now there is no way that PostgreSQL could really be expected to infer that as'ashville%'doesn’t have any commas in it, it can just look at the city name in the subquery, so it is really having to (and is, based on your explain) iterate and build every single possible string in order to do that final filter. If you returncities.nameback to the upper query it will dramatically improve the performance, as right now it seriously can’t use any of those indexes you have.Really, you should go all the way here, and just remove the string concatenation inside of the query and return what you really intended:
select cities.name as city, regions.name as region, countries.code as country, and modify sort toorder by t1.city, t1.region, t1.country.Additionally, are you really asking for cities that have
'ashville%', or is this just an indirect way of looking for cities that are'ashville', but you have to deal with the comma delineation internally? Then, outside, uselower(t1.city) = 'ashville'(note the=:lower(x) ilike 'lower'is pointlessly slow).Also, you need to fix these indexes: what you really want is
create index whatever on cities((lower(name))), as that’s what you are actually searching by, notname: there is no way that you can ever get these indexes to be used if you are searching for something that is unrelated to what you have in the index.(You might look at the
order by namelater and be concerned that it won’t be accelerated anymore, but that’s ok: the goal here is to rapidly filter from the tons of possible locations down to the tiny set of ones that you are going to operate on; what’s left over can be rapidly sorted in memory, as you are probably dealing with 10-20 results.)Due to this, as
regions.idandcountries.idare probablyprimary keys, the other indexes can be deleted if they are only there for this query.Finally, flatten the query to a single level, remove the
group by, and replace it with adistinct. The issue is that we want to make certain that we don’t force PostgreSQL to generate the full set before attempting the filter: we want to make certain it has enough knowledge of the goal to be able to use the city index to rapidly scan directly to the cities that can match, and then get around to filling in the region and country information.(PostgreSQL is generally very very good at doing this, even through a sub-query, but as we have a
group byclause through ahaving, I can see situations where it will no longer be able to infer.)(edit) Actually, wait: you have a unique index on
cities (name, region_id), so you don’t even need thedistinct… all it is doing is making the query pointlessly more complex. I just went ahead and removed it from the query: the result will be the same, as you cannot possibly end up with a result where you have the same city in the same region/country being returned twice.(edit) If, by the way, you actually are meaning to do a prefix match, then you will want to change the
= 'asheville'back to alike 'ashevill%'(note thelike: noi), and change the index to be specified as follows: