I have a reasonably complex MySQL query being run on another developer’s database. I am trying to copy over his data to our new database structure, so I’m running this query to get a load of the data over to copy. The main table has around 45,000 rows.
As you can see from the query below, there are a lot of fields from several different tables. My problem is that the field Ref.refno (as ref_id) is being pulled through, in some cases, two or three times. This is because in the table LandlordOnlineRef (LLRef) there are sometimes multiple rows with the same reference number – in this case, because the row should have been edited, but instead was duplicated…
Here’s what I’ve tried doing:
- SELECT DISTINCT(Ref.refno) [...] – this makes no difference to the output at all, although I would’ve assumed it would stop selecting duplicate refno IDs. Is this a MySQL bug, or me?
- I also tried adding GROUP BY ref_id to the end of my query. The query normally takes a few milliseconds to run, but when I add GROUP BY to the end, it seems to run infinitely – I waited several minutes but nothing was happening. I thought it might be struggling because I’m using LIMIT 1000, so I also tried LIMIT 10, but I still get the same effect.
Here’s the problem query – thanks!
SELECT
-- progress
Ref.refno AS ref_id,
Ref.tenantid AS tenant_id,
Ref.productid AS product_id,
Ref.guarantorid AS guarantor_id,
Ref.agentid AS agent_id,
Ref.companyid AS company_id,
Ref.status AS status,
Ref.startdate AS ref_start_date,
Ref.enddate AS ref_end_date,
-- ReferenceDetails
RefDetails.creditscore AS credit_score,
-- LandlordOnlineRef
LLRef.propaddress AS prev_ll_address,
LLRef.rent AS prev_ll_rent,
LLRef.startdate AS prev_ll_start_date,
LLRef.enddate AS prev_ll_end_date,
LLRef.arrears AS prev_ll_arrears,
LLRef.arrearsreason AS prev_ll_arrears_reason,
LLRef.propertycondition AS prev_ll_property_condition,
LLRef.conditionreason AS prev_ll_condition_reason,
LLRef.consideragain AS prev_ll_consider_again,
LLRef.completedby AS prev_ll_completed_by,
LLRef.contactno AS prev_ll_contact_no,
LLRef.landlordagent AS prev_ll_or_agent,
-- EmpDetails
EmpRef.cempname AS emp_name,
EmpRef.cempadd1 AS emp_address_1,
EmpRef.cempadd2 AS emp_address_2,
EmpRef.cemptown AS emp_address_town,
EmpRef.cempcounty AS emp_address_county,
EmpRef.cemppostcode AS emp_address_postcode,
EmpRef.ctelephone AS emp_telephone,
EmpRef.cemail AS emp_email,
EmpRef.ccontact AS emp_contact,
EmpRef.cgross AS emp_income,
EmpRef.cyears AS emp_years,
EmpRef.cmonths AS emp_months,
EmpRef.cposition AS emp_position,
-- EmpLlodReference
ELRef.lod_ref_status AS prev_ll_status,
ELRef.lod_ref_email AS prev_ll_email,
ELRef.lod_ref_tele AS prev_ll_telephone,
ELRef.emp_ref_status AS emp_status,
ELRef.emp_ref_tele AS emp_telephone,
ELRef.emp_ref_email AS emp_email
FROM ReferenceDetails AS RefDetails
LEFT JOIN progress AS Ref ON Ref.refno
LEFT JOIN LandlordOnlineRef AS LLRef ON LLRef.refno = Ref.refno
LEFT JOIN EmpLlodReference AS ELRef ON ELRef.refno = Ref.refno
LEFT JOIN EmpDetails AS EmpRef ON EmpRef.tenantid = Ref.tenantid
-- For testing purposes to speed things up, limit it to 1000 rows
LIMIT 1000
LEFT JOIN progress AS Ref ON Ref.refno is going to basically turn that into a cartesian join. You’re not doing an explicit comparison, you’re saying “join all records where there’s a non-null value” (strictly, a non-null and non-zero one).
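To see the effect on a small scale, here is a minimal sketch using two throwaway tables. The names demo_details and demo_progress are made up for illustration and are not part of the schema in the question; the sketch also assumes refno is numeric.

```sql
-- Hypothetical toy tables, only to illustrate the join condition.
CREATE TEMPORARY TABLE demo_details (refno INT);
CREATE TEMPORARY TABLE demo_progress (refno INT);
INSERT INTO demo_details VALUES (1), (2), (3);
INSERT INTO demo_progress VALUES (1), (2), (3);

-- ON p.refno is treated as a boolean: it is true for every non-zero,
-- non-NULL refno, so each of the 3 left rows pairs with all 3 right rows.
SELECT COUNT(*)
FROM demo_details AS d
LEFT JOIN demo_progress AS p ON p.refno;            -- 9 rows

-- With an explicit comparison, each left row matches only its own refno.
SELECT COUNT(*)
FROM demo_details AS d
LEFT JOIN demo_progress AS p ON p.refno = d.refno;  -- 3 rows
```

With around 45,000 rows in the main table, the same multiplication would likely explain why adding GROUP BY appears to hang: the grouping has to work through the whole multiplied result before LIMIT can cut it down.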
Shouldn’t the ON clause compare Ref.refno against the matching reference-number column in ReferenceDetails?
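A sketch of what the corrected join might look like. The question does not show the columns of ReferenceDetails, so RefDetails.refno is an assumption here; substitute whichever column actually holds the reference number.

```sql
SELECT
    Ref.refno AS ref_id,
    RefDetails.creditscore AS credit_score
    -- ... remaining columns as in the original query
FROM ReferenceDetails AS RefDetails
-- Assumption: ReferenceDetails has a refno column matching progress.refno.
LEFT JOIN progress AS Ref ON Ref.refno = RefDetails.refno
LEFT JOIN LandlordOnlineRef AS LLRef ON LLRef.refno = Ref.refno
LEFT JOIN EmpLlodReference AS ELRef ON ELRef.refno = Ref.refno
LEFT JOIN EmpDetails AS EmpRef ON EmpRef.tenantid = Ref.tenantid
-- For testing purposes, limit it to 1000 rows
LIMIT 1000;
```

This only addresses the row explosion. The duplicated LandlordOnlineRef rows would still produce repeated ref_id values, because DISTINCT compares whole result rows rather than just the column written in the parentheses.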