Let’s say I have an Oracle database table of 40MM 9-digit ZIP codes which includes the 4-digit ZIP code for each. I have a list of 800 5-digit ZIP codes and I need to find all the associated 9-digit ZIP codes. There are 40K 5-digit ZIP codes altogether. Assume we have indexed the 5-digit ZIP code field with a B*Tree (conventional) index. With database performance and scalability in mind, is the best method to:
-
Query for all the records in a single statement? Something like this run once:
SELECT * FROM ZIPCODE WHERE ZIP5 IN (:1, :2,… :800)
-
Query once for each of the 5-digit ZIP codes? Something like this run 800 times:
SELECT * FROM ZIPCODE WHERE ZIP5 = :1
-
Some batch size in between the two?
What do you think, and why? Some [optional] follow-up thoughts if you’re in a thinking mood:
- What tests can we run to validate your thinking?
- Does the best method change as the volumes change (e.g., if we have 100 9-digit ZIP codes mapped to 10 5-digit ZIP codes, looking for 3 of those 10)?
- Is the best thing for the database different than the best thing for the app servers which are absorbing the query results?
Let the database server decide how to handle it. Even if it internally does a query 800 times, it will still be faster. It has to parse the query only once, and has to send results only once.
Thus, use
SELECT * FROM ZIPCODE WHERE ZIP5 IN (:1, :2,... :800)