I need to check, against a database with millions of records, which codes out of a set of about 1500 have a corresponding record. For example, I have 1500 IDs in a CSV file. I want to know which of those IDs exist in the database, and are therefore correct, and which ones don’t.
Is there a better way of doing this than “… WHERE id IN (1, 2, 3, ..., 1500);”?
The DB/language in question is Oracle PL/SQL.
Thanks in advance for any help.
Build an external table on your CSV file. These are really neat things which allow us to query the contents of an OS file in SQL.
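As a rough sketch of what such an external table might look like, assuming the file is called ids.csv, holds one numeric ID per line, and sits in a server directory exposed through a directory object. The names csv_dir and ids_ext and the path are placeholders, not anything taken from the question:

```sql
-- Hypothetical directory object pointing at the folder that holds the CSV.
-- Creating it requires the CREATE ANY DIRECTORY privilege.
CREATE OR REPLACE DIRECTORY csv_dir AS '/path/to/csv/files';

-- Hypothetical external table with one NUMBER column per line of the file.
CREATE TABLE ids_ext (
    id NUMBER
)
ORGANIZATION EXTERNAL (
    TYPE ORACLE_LOADER
    DEFAULT DIRECTORY csv_dir
    ACCESS PARAMETERS (
        RECORDS DELIMITED BY NEWLINE
        FIELDS TERMINATED BY ','
        MISSING FIELD VALUES ARE NULL
    )
    LOCATION ('ids.csv')
)
REJECT LIMIT UNLIMITED;
```

No data is loaded at this point. The file is read each time the table is queried, so replacing ids.csv on disk is enough to refresh the contents.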
Then it’s a simple matter of issuing a query:
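For instance, assuming an external table called ids_ext with a single id column and a target table called your_table in which id is unique (all hypothetical names), something along these lines reports each ID from the file as found or missing:

```sql
-- One row per ID in the file, flagged by whether a matching record exists.
-- Assumes your_table.id is unique (e.g. a primary key); otherwise an ID
-- with several matching records would be listed several times.
SELECT e.id,
       CASE WHEN t.id IS NULL THEN 'MISSING' ELSE 'FOUND' END AS status
FROM   ids_ext e
       LEFT JOIN your_table t
              ON t.id = e.id
ORDER  BY e.id;
```

With an index on your_table.id this should amount to roughly 1500 index probes rather than a scan of the big table, though the optimizer makes the final call.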
Performance is a matter of context. In this case it depends on how often the data in the CSV changes and how often we need to query it. If the file is produced once a day and we only need to check the values after it has been delivered then an external table is the most efficient solution. But if this data set is a permanent repository which needs to be queried often then the overhead of writing to a heap table is obviously justified.
To me, a CSV file consisting of a bunch of IDs and nothing else sounds like transient data and so fits the use case for external tables. But the OP may have additional requirements which they haven’t mentioned.
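If the IDs do need to stick around and be queried often, copying them into an ordinary heap table is a one-off step. A sketch, assuming an external table called ids_ext with an id column; ids_stage is a made-up name:

```sql
-- Materialise the file contents once, so later queries do not re-read the file.
CREATE TABLE ids_stage AS
SELECT id
FROM   ids_ext;

-- Optional: the primary key gives an index for lookups.
-- This step fails if the file contains duplicate or empty IDs.
ALTER TABLE ids_stage
    ADD CONSTRAINT ids_stage_pk PRIMARY KEY (id);
```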
Here is an alternative approach which doesn’t require creating any permanent database objects. Consequently it is less elegant, and probably will perform worse.
It reads the CSV file laboriously using UTL_FILE and populates a collection based on SYSTEM.NUMBER_TBL_TYPE, a pre-defined collection (nested table of NUMBER) which should be available in your Oracle database.
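A sketch of how that might go, assuming one numeric ID per line, a directory object named CSV_DIR, a file named ids.csv and a target table your_table with a unique id column. All of those names are placeholders:

```sql
DECLARE
    -- Collection type as named above. If it is not present in your database,
    -- SYS.ODCINUMBERLIST (a documented VARRAY of NUMBER) can stand in for it.
    ids   SYSTEM.NUMBER_TBL_TYPE := SYSTEM.NUMBER_TBL_TYPE();
    fh    UTL_FILE.FILE_TYPE;
    line  VARCHAR2(32767);
BEGIN
    -- CSV_DIR is a hypothetical directory object; ids.csv a hypothetical file.
    fh := UTL_FILE.FOPEN('CSV_DIR', 'ids.csv', 'r');

    BEGIN
        LOOP
            UTL_FILE.GET_LINE(fh, line);           -- raises NO_DATA_FOUND at end of file
            line := TRIM(REPLACE(line, CHR(13)));  -- drop a Windows carriage return, if any
            IF line IS NOT NULL THEN
                ids.EXTEND;
                ids(ids.COUNT) := TO_NUMBER(line);
            END IF;
        END LOOP;
    EXCEPTION
        WHEN NO_DATA_FOUND THEN
            NULL;                                  -- end of file reached
    END;

    UTL_FILE.FCLOSE(fh);

    -- Query the collection as if it were a table and flag each ID.
    FOR r IN (SELECT c.column_value AS id,
                     CASE WHEN t.id IS NULL THEN 'MISSING' ELSE 'FOUND' END AS status
              FROM   TABLE(ids) c
                     LEFT JOIN your_table t
                            ON t.id = c.column_value)
    LOOP
        DBMS_OUTPUT.PUT_LINE(r.id || ': ' || r.status);
    END LOOP;
END;
/
```

Run it with SET SERVEROUTPUT ON in SQL*Plus or SQL Developer to see the results.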
Note: I have not tested this code. The principle is sound but the details may need debugging 😉