Hey guys. This is a follow-on from this question:
After getting the right data and making some tweaks based on requests from business, I’ve now got this mini-beast on my hands. This query should return the total number of new jobseeker registrations and the number of newly uploaded CVs:
SELECT COUNT(j.jobseeker_id) as new_registrations,
(
    SELECT
        COUNT(c.cv_id)
    FROM
        tb_cv as c, tb_jobseeker, tb_industry
    WHERE
        UNIX_TIMESTAMP(c.created_at) >= '1241125200'
    AND
        UNIX_TIMESTAMP(c.created_at) <= '1243717200'
    AND
        tb_jobseeker.industry_id = tb_industry.industry_id
)
AS uploaded_cvs
FROM
    tb_jobseeker as j, tb_industry as i
WHERE
    j.created_at BETWEEN '2009-05-01' AND '2009-05-31'
AND
    i.industry_id = j.industry_id
GROUP BY i.description, MONTH(j.created_at)
Notes:
– The two values compared against the UNIX_TIMESTAMP() calls are passed in as parameters from the report module in our backend.
Every time I run it, MySQL chokes and lingers silently into the ether of the Interweb.
Help is appreciated.
Update: Hey guys. Thanks a lot for all the thoughtful and helpful comments. I’m only 2 weeks into my role here, so I’m still learning the schema. So, this query is somewhere between a thumbsuck and an educated guess. Will start to answer all your questions now.
tb_cv is not connected to the other tables in the sub-query. I guess this is the root cause for the slow query. It causes generation of a Cartesian product, yielding a lot more rows than you probably need.
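To make that concrete, here is a rough sketch of the difference. The join column `c.jobseeker_id` in the second query is my assumption, since the question doesn't show the schema of tb_cv, and I'm also assuming `jobseeker_id` and `industry_id` are the primary keys of their tables:

```sql
-- Same shape as the sub-query in the question: nothing ties tb_cv to the
-- other two tables, so every CV in the date range is paired with every
-- (jobseeker, industry) match. The count comes out as roughly
-- (CVs in range) x (jobseekers that have an industry).
SELECT COUNT(c.cv_id)
FROM tb_cv AS c, tb_jobseeker AS js, tb_industry AS i
WHERE c.created_at BETWEEN '2009-05-01' AND '2009-05-31'
  AND js.industry_id = i.industry_id;

-- With explicit join conditions each CV matches at most one jobseeker row.
-- ASSUMPTION: tb_cv.jobseeker_id references tb_jobseeker.jobseeker_id.
SELECT COUNT(c.cv_id)
FROM tb_cv AS c
INNER JOIN tb_jobseeker AS js ON js.jobseeker_id = c.jobseeker_id
INNER JOIN tb_industry  AS i  ON i.industry_id  = js.industry_id
WHERE c.created_at BETWEEN '2009-05-01' AND '2009-05-31';
```

Running the first form with `COUNT(*)` on a small date range should show quickly how fast the row count blows up.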
Other than that I’d say you need indexes on tb_jobseeker.created_at, tb_cv.created_at and tb_industry.industry_id, and you might want to get rid of the UNIX_TIMESTAMP() calls in the sub-query since they prevent use of an index. Use BETWEEN and the actual field values instead.
Here is my attempt at understanding your query and writing a better version. I guess you want to get the count of new jobseeker registrations and new uploaded CVs per month per industry:
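A minimal sketch of what such a rewrite could look like, under a few assumptions that are not confirmed by the question: tb_cv has a `jobseeker_id` column pointing at tb_jobseeker, `created_at` is a DATETIME in both tables, and the index names and the `month_no` alias are made up for illustration.

```sql
-- Indexes on the columns used for filtering and joining (names are illustrative).
CREATE INDEX idx_jobseeker_created_at ON tb_jobseeker (created_at);
CREATE INDEX idx_cv_created_at        ON tb_cv (created_at);
-- If industry_id is already the primary key of tb_industry, skip this one.
CREATE INDEX idx_industry_id          ON tb_industry (industry_id);

-- New registrations and uploaded CVs per industry per month.
-- The bare column in each WHERE clause (no UNIX_TIMESTAMP() wrapper) lets
-- MySQL use the created_at indexes.
-- Note: on a DATETIME column, '2009-05-31' means 2009-05-31 00:00:00,
-- which matches the upper bound used in the original query.
SELECT
    i.description               AS industry,
    r.month_no,
    r.new_registrations,
    COALESCE(c.uploaded_cvs, 0) AS uploaded_cvs
FROM tb_industry AS i
INNER JOIN (
    -- registrations per industry and month
    SELECT industry_id,
           MONTH(created_at) AS month_no,
           COUNT(*)          AS new_registrations
    FROM tb_jobseeker
    WHERE created_at BETWEEN '2009-05-01' AND '2009-05-31'
    GROUP BY industry_id, MONTH(created_at)
) AS r ON r.industry_id = i.industry_id
LEFT JOIN (
    -- CVs per industry and month, attributed via the owning jobseeker
    -- ASSUMPTION: tb_cv.jobseeker_id exists and links to tb_jobseeker
    SELECT js.industry_id,
           MONTH(cv.created_at) AS month_no,
           COUNT(*)             AS uploaded_cvs
    FROM tb_cv AS cv
    INNER JOIN tb_jobseeker AS js ON js.jobseeker_id = cv.jobseeker_id
    WHERE cv.created_at BETWEEN '2009-05-01' AND '2009-05-31'
    GROUP BY js.industry_id, MONTH(cv.created_at)
) AS c ON c.industry_id = r.industry_id
      AND c.month_no    = r.month_no
ORDER BY i.description, r.month_no;
```

Counting each table in its own derived table keeps the two counts from multiplying each other. One limitation of this sketch: an industry with CV uploads but no new registrations in the period will not appear in the result.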
A few things I noticed while writing the query:
COUNT instead.)