I have two tables:
PERSON with columns person_id and total
DATA with columns data_a, data_b, data_c, and data_person_id
Each “person” can have zero or more entries in DATA – your standard one-to-many relationship. PERSON has a total column that is the sum of values in DATA. There are currently some discrepancies between total and the actual entries in DATA where DATA is correct but total is wrong.
This is the query I’m using to find the discrepancies:
SELECT
    person_id
FROM PERSON JOIN (
    SELECT
        data_person_id,
        SUM( data_a + data_b + data_c ) as data_total
    FROM
        DATA
    GROUP BY
        data_person_id
) x ON data_person_id = person_id
WHERE
    total != data_total
I plan on doing this through Hibernate as a query where the backend will be Postgres 9.x.
The incorrect query that I’m trying to understand/fix is:
UPDATE
    ONLY PERSON
SET
    total = data_info.calc_total
FROM (
    SELECT
        SUM( data_a + data_b + data_c ) as calc_total
    FROM
        DATA
    WHERE
        DATA.data_person_id = person_id
    GROUP BY
        DATA.data_person_id
) as data_info
WHERE
    PERSON.person_id IN (
        SELECT
            data_person_id
        FROM PERSON JOIN (
            SELECT
                data_person_id,
                SUM( data_a + data_b + data_c ) as data_total
            FROM
                DATA
            GROUP BY
                data_person_id
        ) x ON person_id = data_person_id
        WHERE
            total != data_total
    )
Right now, it won’t run because of WHERE DATA.data_person_id = person_id. But if I take that out, the wrong values get used.
The following seems to work but I’m confused as to why:
UPDATE
    ONLY PERSON
SET
    total = data_info.calc_total
FROM
    PERSON P JOIN (
        SELECT
            data_person_id,
            SUM( data_a + data_b + data_c ) as calc_total
        FROM
            DATA
        WHERE
            DATA.data_person_id = person_id
        GROUP BY
            DATA.data_person_id
    ) as data_info ON P.person_id = data_person_id
WHERE
    PERSON.person_id IN (
        SELECT
            data_person_id
        FROM PERSON JOIN (
            SELECT
                data_person_id,
                SUM( data_a + data_b + data_c ) as data_total
            FROM
                DATA
            GROUP BY
                data_person_id
        ) x ON person_id = data_person_id
        WHERE
            total != data_total
    )
I believe my problem lies in my misunderstanding of the doc (I’m guessing the part about the self-join).
Also, any suggestions for improving this query are appreciated!
It seems your queries are way too complex. The task should be as simple as:
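A minimal sketch of such an UPDATE, assuming the table and column names from the question; the aliases p and d are only illustrative:

```sql
-- Sketch only: table and column names are taken from the question,
-- the aliases p and d are illustrative.
UPDATE person p
SET    total = d.calc_total
FROM  (
   -- one row per person: the sum over all of that person's DATA rows
   SELECT data_person_id,
          SUM(data_a + data_b + data_c) AS calc_total
   FROM   data
   GROUP  BY data_person_id
   ) d
WHERE  d.data_person_id = p.person_id          -- tie the subquery to the row being updated
AND    d.calc_total IS DISTINCT FROM p.total;  -- skip rows that are already correct
```

The correlation between the aggregated rows and the target table lives in the outer WHERE clause rather than inside the subquery, so PERSON does not have to be repeated in the FROM list. Persons without any rows in DATA have no match in d and are left untouched by this statement.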
First aggregate calc_total from the data table, grouped by data_person_id.
Then use this subquery in the FROM clause of the UPDATE.
I use IS DISTINCT FROM to make sure NULL values are covered, while only rows that would change are actually updated.
If all involved columns are defined NOT NULL, you can use <> instead.
-> sqlfiddle demo.
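To illustrate the NULL handling, here is a small standalone query (no tables needed, column aliases are made up for the example) comparing the plain inequality operator with IS DISTINCT FROM:

```sql
-- Plain inequality yields NULL when either side is NULL, so a WHERE clause
-- built on it would silently skip such rows. IS DISTINCT FROM always
-- returns true or false.
SELECT 1 <> NULL                  AS plain_inequality,    -- NULL
       1 IS DISTINCT FROM NULL    AS distinct_from_null,  -- true
       NULL IS DISTINCT FROM NULL AS both_null,           -- false
       1 IS DISTINCT FROM 1       AS same_value;          -- false
```

In the UPDATE this means a person whose total is NULL still gets corrected, while a person whose total already matches is not rewritten.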