I need some options.
I have a table laid out as follows with about 78,000,000 rows…
- id INT (Primary Key)
- loc VARCHAR (Indexed)
- date VARCHAR (Indexed)
- time VARCHAR
- ip VARCHAR
- lookup VARCHAR
Here is an example of a query I have.
SELECT lookup, date, time, count(lookup) as count FROM dnstable
WHERE STR_TO_DATE(`date`, '%d-%b-%Y') >= '$date1' AND STR_TO_DATE(`date`, '%d-%b-%Y') <= '$date2' AND
time >= '$hour1%' AND time <= '$hour2%' AND
`loc` LIKE '%$prov%' AND
lookup REGEXP 'ca|com|org|net' AND
lookup NOT LIKE '%.arpa' AND
lookup NOT LIKE '%domain.ca' AND
ip NOT LIKE '192.168.2.1' AND
ip NOT LIKE '192.168.2.2' AND
ip NOT LIKE '192.168.2.3'
GROUP BY lookup
ORDER BY count DESC
LIMIT 100
I have my MySQL server configured like a few high-usage examples I found. The hardware is good: 4 cores, 8 GB of RAM.
This query takes about 180 seconds… Does anyone have some tips on making this more efficient?
There are a lot of things wrong here. A LOT of things. I would look to the other answers for query options (you use a lot of LIKEs, NOT LIKEs, and functions, and most of them either hit unkeyed columns or prevent MySQL from using the keys you do have). If I were in your case, I’d redesign my entire database. It looks as though you’re using this to store DNS entries: host names to IP addresses.
You may not have the option to redesign your database – maybe it’s a customer database or something that you don’t have control over. Maybe they have a lot of applications which depend on the current database design. However, if you can refactor your database, I would strongly suggest it.
Here’s a basic rundown of what I’d do:
Store the TLDs (top-level domains) in a separate column as an ENUM. Index it, so it’s easily searchable, instead of trying to regex .com, .arpa, etc. The set of TLDs you care about is limited, and it doesn’t change often, so this is a great candidate for an ENUM.
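A rough sketch of what that could look like. The column name tld, the index name, and the 'other' catch-all are my own placeholders, and the ENUM members are just the TLDs mentioned in the question. It also assumes lookup has no trailing dot. Note that the original REGEXP 'ca|com|org|net' matches those letters anywhere in the name, so this is stricter than the old filter.

```sql
-- Hypothetical column and index names; adjust the ENUM list to the TLDs you actually see.
ALTER TABLE dnstable
  ADD COLUMN tld ENUM('ca', 'com', 'org', 'net', 'arpa', 'other') NOT NULL DEFAULT 'other',
  ADD INDEX idx_tld (tld);

-- Backfill: take everything after the last dot; anything unlisted becomes 'other'.
UPDATE dnstable
SET tld = IF(SUBSTRING_INDEX(LOWER(lookup), '.', -1) IN ('ca', 'com', 'org', 'net', 'arpa'),
             SUBSTRING_INDEX(LOWER(lookup), '.', -1),
             'other');

-- The REGEXP and the '%.arpa' exclusion then collapse into one indexable condition.
SELECT COUNT(*) FROM dnstable WHERE tld IN ('ca', 'com', 'org', 'net');
```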
Store the domain without the TLD in a regular column and a reversed column. You could index both columns, but depending on your searches, you may only need to index the reversed column. Basically, having a reversed column allows you to search for all hosts in one domain (ex. google) without a leading-wildcard LIKE that has to scan every row. MySQL can do a key lookup on the prefix “elgoog” in the reversed column. Because DNS is a hierarchy, this fits perfectly.
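Something along these lines, as a sketch only. domain_rev and the index name are made-up names, the prefix length of 64 is an arbitrary choice to keep the index small, and it assumes lookup holds plain names like www.google.com.

```sql
-- Hypothetical column: the name minus its TLD, reversed.
ALTER TABLE dnstable
  ADD COLUMN domain_rev VARCHAR(255) NULL,
  ADD INDEX idx_domain_rev (domain_rev(64));

-- 'www.google.com' -> strip '.com' -> 'www.google' -> reverse -> 'elgoog.www'
UPDATE dnstable
SET domain_rev = REVERSE(
      SUBSTRING(lookup, 1,
                CHAR_LENGTH(lookup) - CHAR_LENGTH(SUBSTRING_INDEX(lookup, '.', -1)) - 1));

-- All hosts under google: the pattern has no leading wildcard, so the index is usable.
SELECT lookup
FROM dnstable
WHERE domain_rev = 'elgoog' OR domain_rev LIKE 'elgoog.%';
```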
Change the date and time columns from VARCHAR to DATE and TIME, respectively. This one’s an obvious change. No more STR_TO_DATE on every row, which is what stops MySQL from using the index on date. Absolutely no point in doing that.
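One way to get there without touching the old columns right away is to add typed columns next to them and backfill. The names query_date and query_time are invented for this example, the '%H:%i:%s' time format is a guess (the question doesn’t show what the time strings look like), and the date literals are placeholders. Rows that fail to parse would need to be looked at by hand.

```sql
-- Hypothetical typed columns alongside the existing VARCHAR ones.
ALTER TABLE dnstable
  ADD COLUMN query_date DATE NULL,
  ADD COLUMN query_time TIME NULL,
  ADD INDEX idx_query_date (query_date);

-- One-time conversion; '%d-%b-%Y' is the format used in the original query.
UPDATE dnstable
SET query_date = STR_TO_DATE(`date`, '%d-%b-%Y'),
    query_time = STR_TO_DATE(`time`, '%H:%i:%s');  -- assumed time format

-- The range check now compares native values and can use idx_query_date.
SELECT COUNT(*)
FROM dnstable
WHERE query_date BETWEEN '2012-01-01' AND '2012-01-31'
  AND query_time >= '08:00:00' AND query_time < '17:00:00';
```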
Store the IP addresses differently. There’s no reason to use a VARCHAR here; it’s inefficient and doesn’t make sense. Instead, use four separate columns, one per octet (this is safe because all IPv4 addresses have four octets, no more, no less), as unsigned TINYINT values. This will give you 0-255, the range you need. (Each IP octet is actually 8 bits, anyway.) This should make searches much faster, especially if you key the columns.
ex: select * from dnstable where octet1 != 10; (this would filter out all 10.0.0.0/8 private IP space)
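As a sketch, with octet1 through octet4 and the index name being placeholder names of mine. The WHERE clause on the UPDATE is there because I’m only assuming, not certain, that every ip value is a dotted-quad IPv4 string.

```sql
-- Hypothetical octet columns plus one composite index over all four.
ALTER TABLE dnstable
  ADD COLUMN octet1 TINYINT UNSIGNED NULL,
  ADD COLUMN octet2 TINYINT UNSIGNED NULL,
  ADD COLUMN octet3 TINYINT UNSIGNED NULL,
  ADD COLUMN octet4 TINYINT UNSIGNED NULL,
  ADD INDEX idx_ip (octet1, octet2, octet3, octet4);

-- Split '192.168.2.1' into 192, 168, 2, 1; rows that are not dotted quads are skipped.
UPDATE dnstable
SET octet1 = CAST(SUBSTRING_INDEX(ip, '.', 1) AS UNSIGNED),
    octet2 = CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(ip, '.', 2), '.', -1) AS UNSIGNED),
    octet3 = CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(ip, '.', 3), '.', -1) AS UNSIGNED),
    octet4 = CAST(SUBSTRING_INDEX(ip, '.', -1) AS UNSIGNED)
WHERE ip REGEXP '^[0-9]+[.][0-9]+[.][0-9]+[.][0-9]+$';

-- The three NOT LIKE checks from the question become plain integer comparisons.
SELECT COUNT(*)
FROM dnstable
WHERE NOT (octet1 = 192 AND octet2 = 168 AND octet3 = 2 AND octet4 IN (1, 2, 3));
```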
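The basic problem here is that your database design is flawed. Your query filters on columns that either aren’t indexed (time, ip, lookup) or are indexed but used in a way that defeats the index (STR_TO_DATE on date, a leading-wildcard LIKE on loc), so MySQL ends up reading all 78,000,000 rows.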
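To give an idea of where this leads, here is roughly what the query from the question could turn into on a redesigned table. Every new column name here (query_date, query_time, tld, domain_rev, octet1 to octet4) is hypothetical, the literals stand in for the $date1, $hour1 and $prov variables, and loc = 'ON' assumes the province can be matched exactly, which the question doesn’t confirm. I also dropped date and time from the SELECT list, since they aren’t in the GROUP BY and their values per group are not well defined.

```sql
-- Sketch against an assumed redesigned schema; not a drop-in replacement.
SELECT lookup, COUNT(*) AS hits
FROM dnstable
WHERE query_date BETWEEN '2012-01-01' AND '2012-01-31'
  AND query_time >= '08:00:00' AND query_time < '17:00:00'
  AND loc = 'ON'
  AND tld IN ('ca', 'com', 'org', 'net')                 -- also excludes .arpa
  AND NOT (tld = 'ca' AND domain_rev LIKE 'niamod%')     -- was: NOT LIKE '%domain.ca'
  AND NOT (octet1 = 192 AND octet2 = 168 AND octet3 = 2 AND octet4 IN (1, 2, 3))
GROUP BY lookup
ORDER BY hits DESC
LIMIT 100;
```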
If you’re stuck with the current design, I’m not sure if I can really help you. I’m sorry.