POSTGRES-
I want to update the Employees.zipcode_mod column in the Employees table for the ‘invalid zipcodes’: values of Employees.zipcode that do NOT EXIST in Ref_Zips.zip5.
The update rule is to find all invalid zipcodes that are 3 chars or longer, match their first three digits against the first three digits of Tmp_Agg_Zips.zip, and update Employees.zipcode_mod with the matching Tmp_Agg_Zips.zip that has the highest Tmp_Agg_Zips.emp_cnt. If there is a tie between multiple Tmp_Agg_Zips.zip values, then take the ‘highest’ zip value.
Update
If the invalid zipcode is 3 chars or longer but its first three digits do not match the first three digits of any Tmp_Agg_Zips.zip, OR the invalid zipcode is shorter than 3 chars or null, then just update Employees.zipcode_mod with the Tmp_Agg_Zips.zip that has the maximum Tmp_Agg_Zips.emp_cnt, irrespective of the first three digits. Example: 88888 and null are updated to 10012 in the example below.
This is for Postgres 8.4.
Employees
Gender | zipcode | zipcode_mod
M | 99574 |
F | 99574 |
F | 10012 |
F | 10012 |
F | 10012 |
F | 19001 |
M | 100 | 10012
M | 190 | 19001
M | 19 | 10012
F | null | 10012
F | 88888 | 10012
F | 8888 | 10012
Tmp_Agg_Zips
zip | emp_cnt
99574 | 2
10012 | 3
19001 | 1
Ref_Zips
zip5
99574
10012
19001
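To reproduce the example, the three tables could be set up roughly as follows. This is only a sketch: the column types (text for all zip columns, integer for emp_cnt) and the primary keys are assumptions, since the question does not state them. zipcode_mod starts out empty because the values shown in the Employees table above are the desired result.

```sql
-- Assumed types: zip codes stored as text, counts as integer.
CREATE TABLE ref_zips     (zip5 text PRIMARY KEY);
CREATE TABLE tmp_agg_zips (zip  text PRIMARY KEY, emp_cnt integer NOT NULL);
CREATE TABLE employees    (gender text, zipcode text, zipcode_mod text);

INSERT INTO ref_zips (zip5) VALUES ('99574'), ('10012'), ('19001');

INSERT INTO tmp_agg_zips (zip, emp_cnt) VALUES
  ('99574', 2), ('10012', 3), ('19001', 1);

-- zipcode_mod is left NULL here; the UPDATE is supposed to fill it.
INSERT INTO employees (gender, zipcode) VALUES
  ('M', '99574'), ('F', '99574'),
  ('F', '10012'), ('F', '10012'), ('F', '10012'),
  ('F', '19001'),
  ('M', '100'), ('M', '190'), ('M', '19'),
  ('F', NULL), ('F', '88888'), ('F', '8888');
```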
For updated question
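A minimal sketch of a query for the updated rules, written to run on 8.4. It assumes that all zip columns are text (or varchar) and uses `substr()` rather than `left()`, which only exists from 9.1 on. The aliases `e`, `t`, `r` and `d` are just illustrative names.

```sql
UPDATE employees e
SET    zipcode_mod = COALESCE(
          -- best candidate sharing the first three characters, if any
          (SELECT t.zip
           FROM   tmp_agg_zips t
           WHERE  length(e.zipcode) >= 3
           AND    substr(t.zip, 1, 3) = substr(e.zipcode, 1, 3)
           ORDER  BY t.emp_cnt DESC, t.zip DESC   -- ties: highest zip
           LIMIT  1),
          -- otherwise fall back to the overall default
          d.zip)
FROM  (
   -- default: zip with the highest emp_cnt overall, computed once
   SELECT zip
   FROM   tmp_agg_zips
   ORDER  BY emp_cnt DESC, zip DESC
   LIMIT  1
   ) d
WHERE  NOT EXISTS (SELECT 1 FROM ref_zips r WHERE r.zip5 = e.zipcode);
```

The `NOT EXISTS` test is also true for a NULL zipcode, so those rows are updated as well. With the sample data from the question this should fill zipcode_mod as shown in the Employees table. If tmp_agg_zips is empty, the subquery `d` returns no row and nothing is updated.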
I added a `COALESCE()` clause to catch the cases where no matching alternative is found, and put the computation of the default value into a subquery for multiple use.

For original question
This query works with older versions of PostgreSQL:
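A sketch of such a query, using only a correlated subquery in the `SET` clause. It assumes text columns for the zip codes, and it breaks ties on emp_cnt by taking the lowest zip; change the sort to `t.zip DESC` to take the highest instead.

```sql
UPDATE employees e
SET    zipcode_mod = (
          SELECT t.zip
          FROM   tmp_agg_zips t
          WHERE  substr(t.zip, 1, 3) = substr(e.zipcode, 1, 3)
          ORDER  BY t.emp_cnt DESC, t.zip   -- ties: lowest zip
          LIMIT  1)
WHERE  length(e.zipcode) >= 3               -- also excludes NULL
AND    NOT EXISTS (SELECT 1 FROM ref_zips r WHERE r.zip5 = e.zipcode);
```

Note that in this form an invalid zipcode of 3 or more characters without any matching candidate gets zipcode_mod set to NULL, because the subquery returns no row.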
In PostgreSQL 9.1, a CTE should perform better:
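A sketch of the CTE variant, under the same assumptions (text columns, lowest zip wins a tie). The CTE name `best` and the column alias `zip3` are made up for illustration. It picks one zip per three-character prefix with `DISTINCT ON` and then joins that small result to the rows to be updated.

```sql
WITH best AS (
   -- one row per 3-character prefix: highest emp_cnt, then lowest zip
   SELECT DISTINCT ON (substr(zip, 1, 3))
          substr(zip, 1, 3) AS zip3, zip
   FROM   tmp_agg_zips
   ORDER  BY substr(zip, 1, 3), emp_cnt DESC, zip
   )
UPDATE employees e
SET    zipcode_mod = b.zip
FROM   best b
WHERE  length(e.zipcode) >= 3
AND    b.zip3 = substr(e.zipcode, 1, 3)
AND    NOT EXISTS (SELECT 1 FROM ref_zips r WHERE r.zip5 = e.zipcode);
```

Unlike the correlated-subquery form, rows without a matching prefix are not touched at all here, since the join to `best` finds nothing for them.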
If there are multiple rows in `tmp_agg_zips` with the same (highest) `emp_cnt`, I pick the “lowest” `zip`. You did not specify how to break these ties.

BTW, different column names for zip codes are not helpful for me. Table-qualifying the column names does a better job.