Following up on my previous post, the other challenge we now face is finding the best match for an address made up of the fields [ADDR_LINE_1, ADDR_LINE_2, CITY, STATE, ZIP].
We want to return all the records in the database that are a possible match for an incoming address record (from the file). The scenario is as follows.
These two records are in the database:
ADDR_LINE_1, ADDR_LINE_2 , CITY , STATE, ZIP
001 Chestnut Avenue, Apt 100 , Indiana , IN , 9999
Apt 100 , 001 Chestnut Ave., Indianapolis, IN , 9999
The incoming record is:
ADDR_LINE_1, ADDR_LINE_2, CITY , STATE, ZIP
1 Chestnut Avenue, Apt 100 , Indiana , IN , 9999
I want to detect the incoming record as an existing record and list both of the above records as possible matches.
Note: ADDR_LINE_1 and ADDR_LINE_2 are interchanged in the second database record, but it should still be listed as a possible match.
Can anyone please provide suggestions as to how I can go about it?
Depending on your Oracle version, you may be able to use the UTL_MATCH package to generate a similarity score and then experiment to find a threshold that seems reasonable for your data. For example, the Jaro-Winkler algorithm gives a 96% similarity between the strings ‘001 Chestnut Avenue’ and ‘1 Chestnut Avenue’.
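As a rough sketch of that comparison, the query below calls UTL_MATCH.JARO_WINKLER_SIMILARITY, which returns an integer score from 0 (no similarity) to 100 (identical), on the two strings from the question. It assumes your Oracle version ships the UTL_MATCH package.

```sql
-- Score the stored ADDR_LINE_1 against the incoming one.
-- The result is an integer between 0 and 100.
SELECT UTL_MATCH.JARO_WINKLER_SIMILARITY('001 Chestnut Avenue',
                                         '1 Chestnut Avenue') AS jw_score
  FROM dual;
```

UTL_MATCH.EDIT_DISTANCE_SIMILARITY takes the same two arguments if you would rather compare scores from an edit-distance measure.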
You would likely also need to do some work to decide how to weight the various fields. Presumably, for example, you would require a higher threshold to match on the city, which is likely to be relatively standardized, than on the second line of the address.
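One way to combine per-field thresholds with the swapped address lines from the question might look like the sketch below. The table name ADDRESSES, the bind variables, the exact match on STATE and ZIP, and the cutoffs of 90 and 80 are all assumptions made for illustration, so tune them against your own data.

```sql
-- Hypothetical table ADDRESSES and bind variables for the incoming record.
-- The cutoffs (90 for city, 80 for street) are placeholders, not recommendations.
SELECT *
  FROM (SELECT a.*,
               -- Compare the two address lines as one string, in either order,
               -- so records with ADDR_LINE_1 and ADDR_LINE_2 swapped still score well.
               GREATEST(
                 UTL_MATCH.JARO_WINKLER_SIMILARITY(
                   UPPER(a.addr_line_1 || ' ' || a.addr_line_2),
                   UPPER(:addr_line_1 || ' ' || :addr_line_2)),
                 UTL_MATCH.JARO_WINKLER_SIMILARITY(
                   UPPER(a.addr_line_2 || ' ' || a.addr_line_1),
                   UPPER(:addr_line_1 || ' ' || :addr_line_2))) AS street_score,
               UTL_MATCH.JARO_WINKLER_SIMILARITY(
                 UPPER(a.city), UPPER(:city)) AS city_score
          FROM addresses a
         -- Assumes STATE and ZIP are clean enough to require an exact match.
         WHERE a.state = :state
           AND a.zip   = :zip)
 WHERE city_score   >= 90   -- stricter: city names tend to be standardized
   AND street_score >= 80   -- looser: street lines vary more in practice
 ORDER BY street_score DESC, city_score DESC;
```

Filtering on STATE and ZIP first keeps the number of fuzzy comparisons small. If those columns are also unreliable, they could be scored the same way instead.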