Here is the query:
SELECT name, SUM( `count` ) AS Total
FROM `identdb`
WHERE MBRCONTAINS( GEOMFROMTEXT( 'LineString(34.4 -119.9, 34.5 -119.8)' ) , latlng )
AND MOD( DAYOFYEAR( CURDATE( ) ) - DAYOFYEAR( `date` ) +365, 365 ) <=14
OR MOD( DAYOFYEAR( `date` ) - DAYOFYEAR( CURDATE( ) ) +365, 365 ) <=14
AND MBRCONTAINS( GEOMFROMTEXT( 'LineString(34.4 -119.9, 34.5 -119.8)' ) , latlng )
GROUP BY `name`
It essentially finds rows whose day of year is within plus or minus 14 days of today’s and whose latlng spatial column falls inside the rectangle.
Here is what my database looks like:
#  Column  Type         Collation
1  name    varchar(66)  utf8_general_ci
2  count   tinyint(3)
3  date    date
4  latlng  geometry
5  lat1    varchar(15)  latin1_swedish_ci
6  long1   varchar(15)  latin1_swedish_ci
Keyname   Type     Unique  Packed  Column       Cardinality  Collation  Null  Comment
PRIMARY   BTREE    Yes     No      name         0            A
                                   count        0            A
                                   date         0            A
                                   lat1         0            A
                                   long1        6976936      A
sp_index  SPATIAL  No      No      latlng (32)  0            A
There are 7 million records and the query is taking about 7 seconds. I have no clue how to speed this up, thanks in advance!
EXPLAIN:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE identdb ALL sp_index NULL NULL NULL 6976936 Using where; Using temporary; Using filesort
UPDATED explanation of query:
I believe MBRCONTAINS creates a rectangle that I can use to check whether the latlng spatial point is inside it or not. The date part finds rows whose day of year is within plus or minus 14 days of today’s. It uses modular arithmetic so that it won’t break around New Year’s. I had to put the MBRCONTAINS part in twice because of the OR.
What I need from the query is to find all names that have a day of the year within plus or minus 14 days of today and are within the given lat/long pairs, and then total the counts for each.
I’m dumb at this stuff so please correct me if I’m doing something dumb. Thanks guys!
Rewrite it so that your calculations happen once per query rather than once per row, by expressing your predicates such that the column is not part of the calculation.
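For example, the day-of-year test in your WHERE clause wraps date in DAYOFYEAR and MOD, which requires 7 million calculations on date. It can be expressed as a comparison of the bare date column against values computed once from CURDATE(), which requires only 1 calculation and further would allow an index on the date column to be used.
That change alone will speed up your query.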
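As a minimal sketch of that idea (not necessarily the exact expression this answer originally showed), the date test could be written as a range on the bare column. It assumes that only rows close to today's actual date are wanted; unlike the DAYOFYEAR/MOD version, it does not match the same calendar days in other years.

```sql
-- Sketch only: assumes rows from the current +/- 14 day window are enough.
-- The bounds are computed once from CURDATE(); `date` itself is left untouched,
-- so an index on `date` becomes usable for this predicate.
SELECT name, SUM(`count`) AS Total
FROM `identdb`
WHERE MBRCONTAINS(GEOMFROMTEXT('LineString(34.4 -119.9, 34.5 -119.8)'), latlng)
  AND `date` BETWEEN CURDATE() - INTERVAL 14 DAY
                 AND CURDATE() + INTERVAL 14 DAY
GROUP BY `name`;
```

If rows from every year have to match, one option is to store the day of year in its own indexed column when rows are written and compare that column against values computed once from CURDATE().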
If you don’t have an index on date, add one and your query will fly.
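One way to add such an index is sketched here; the index name idx_date is made up for illustration. The existing PRIMARY key does list date, but only as its third column, so it cannot serve a range search on date alone.

```sql
-- idx_date is a hypothetical name; any unused index name works.
-- On a table with about 7 million rows this statement may take a while.
ALTER TABLE `identdb` ADD INDEX idx_date (`date`);

-- Re-run EXPLAIN on a query that compares the bare column to see
-- whether the optimizer now lists and picks the new index.
EXPLAIN
SELECT name, SUM(`count`) AS Total
FROM `identdb`
WHERE `date` BETWEEN CURDATE() - INTERVAL 14 DAY
                 AND CURDATE() + INTERVAL 14 DAY
GROUP BY `name`;
```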
I don’t know what MBRCONTAINS does, but try to refactor it too so that the column value is not part of the calculation.
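For what it's worth, the original query already passes latlng to MBRCONTAINS as a bare column and the rectangle is a constant, so that predicate appears to be in the shape asked for here (EXPLAIN lists sp_index under possible_keys). What can be tidied is the duplication. Because AND binds tighter than OR, the original WHERE clause reads as (A AND B) OR (C AND A), which is logically the same as A AND (B OR C). A sketch of the same query with the spatial test written once:

```sql
-- Same result as the original query; only the WHERE clause is factored.
-- A = spatial test, B and C = the two day-of-year tests.
SELECT name, SUM(`count`) AS Total
FROM `identdb`
WHERE MBRCONTAINS(GEOMFROMTEXT('LineString(34.4 -119.9, 34.5 -119.8)'), latlng)
  AND (
        MOD(DAYOFYEAR(CURDATE()) - DAYOFYEAR(`date`) + 365, 365) <= 14
     OR MOD(DAYOFYEAR(`date`) - DAYOFYEAR(CURDATE()) + 365, 365) <= 14
      )
GROUP BY `name`;
```

This does not by itself remove the per-row date calculation; it only makes the spatial condition a single top-level AND term.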