I’m faced with the task of having to look in a database with millions

Question

Asked: June 8, 2026

I’m faced with the task of checking a database with millions of records to find out which codes, out of a set of about 1500, have a corresponding record. For example, I have 1500 IDs in a CSV file. I want to know which of those IDs exist in the database, and are therefore correct, and which ones don’t.

Is there a better way of doing this than “… WHERE id IN (1, 2, 3, ..., 1500);”?
The DB/language in question is Oracle PL/SQL.

Thanks in advance for any help.

1 Answer

Editorial Team · Answer 1 · 2026-06-08T18:17:17+00:00

Build an external table on your CSV file. External tables are very neat: they allow us to query the contents of an OS file in SQL.
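As a rough sketch of what that setup might look like: the directory path, the single NUMBER column and the file name below are assumptions for illustration, not details from the question. It assumes the file holds one ID per line and that you have the privileges to create a directory object (or a DBA has created one and granted you read access).

```sql
-- Directory object pointing at the folder on the database server that holds the CSV.
-- The path is a placeholder; creating it requires the CREATE ANY DIRECTORY privilege.
create or replace directory your_data_directory as '/path/to/csv/files';

-- External table: no data is loaded, the file is read each time the table is queried.
create table your_external_tab (
    id number
)
organization external (
    type oracle_loader
    default directory your_data_directory
    access parameters (
        records delimited by newline
        fields terminated by ','
        missing field values are null
        ( id )
    )
    location ('your_data.csv')
)
reject limit unlimited;
```

When a new file is delivered under the same name, the same table can be queried again without any reload step.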

Then it’s a simple matter of issuing a query:

select csv.id
       , case when tgt.id is null then 'invalid' else 'valid' end as valid_id
from your_external_tab csv
       left join target_table tgt on (csv.id = tgt.id);

“CSV table is hardly ideal from a performance point of view”

Performance is a matter of context. In this case it depends on how often the data in the CSV changes and how often we need to query it. If the file is produced once a day and we only need to check the values after it has been delivered then an external table is the most efficient solution. But if this data set is a permanent repository which needs to be queried often then the overhead of writing to a heap table is obviously justified.
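For the second case, one possible way to copy the file contents into a heap table is sketched below. The table name your_ids_heap is hypothetical, and the sketch assumes an external table called your_external_tab with a numeric id column already exists.

```sql
-- One-off copy of the file contents into an ordinary heap table (hypothetical name).
create table your_ids_heap as
select id
from   your_external_tab;

-- Later refreshes, whenever a new file is delivered:
truncate table your_ids_heap;

insert into your_ids_heap (id)
select id
from   your_external_tab;

commit;

-- The validity check then reads the heap table instead of the file.
select ids.id
       , case when tgt.id is null then 'invalid' else 'valid' end as valid_id
from your_ids_heap ids
       left join target_table tgt on (ids.id = tgt.id);
```

With only about 1500 rows on the driving side, the cost of the check is likely to be dominated by the lookups into the large table, so an index on target_table.id probably matters more than where the IDs are stored.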

To me, a CSV file consisting of a bunch of IDs and nothing else sounds like transient data, and so fits the use case for external tables. But the OP may have additional requirements which they haven’t mentioned.

Here is an alternative approach which doesn’t require creating any permanent database objects. Consequently it is less elegant, and probably will perform worse.

It reads the CSV file laboriously using UTL_FILE and populates a collection based on SYSTEM.NUMBER_TBL_TYPE, a pre-defined collection (nested table of NUMBER) which should be available in your Oracle database.

declare
    -- a nested table must be initialised before extend() can be called on it
    ids  system.number_tbl_type := system.number_tbl_type();
    fh   utl_file.file_type;
    line varchar2(4000);
begin
    -- the first argument is the name of an Oracle directory object
    fh := utl_file.fopen('YOUR_DATA_DIRECTORY', 'your_data.csv', 'r');
    begin
        loop
            -- get_line raises NO_DATA_FOUND at end of file, which ends the loop
            utl_file.get_line(fh, line);
            ids.extend();
            -- strip trailing spaces and any carriage return before converting
            ids(ids.count) := to_number(rtrim(line, chr(13) || ' '));
        end loop;
    exception
        when no_data_found then
            utl_file.fclose(fh);
        when others then
            -- close the file on any other error too, then re-raise
            if utl_file.is_open(fh) then
                utl_file.fclose(fh);
            end if;
            raise;
    end;
    for id_recs in ( select csv.column_value
                          , case when tgt.id is null then 'invalid' else 'valid' end as valid_id
                     from table(ids) csv
                          left join target_table tgt on (csv.column_value = tgt.id)
    ) loop
        dbms_output.put_line('ID ' || id_recs.column_value || ' is ' || id_recs.valid_id);
    end loop;
end;
/

Note: I have not tested this code. The principle is sound but the details may need debugging 😉
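If SYSTEM.NUMBER_TBL_TYPE turns out not to be present or accessible in your database, the anonymous block will not compile. One way to check, and to find a substitute, is to look in the data dictionary for collection types whose elements are NUMBER. This is only a sketch and assumes you can query ALL_COLL_TYPES; SYS.ODCINUMBERLIST, a varray of NUMBER, is one type that is commonly available.

```sql
-- List the collection types visible to the current user whose element type is NUMBER.
select owner, type_name, coll_type
from   all_coll_types
where  elem_type_name = 'NUMBER'
order  by owner, type_name;
```

Any type returned here that is a schema-level nested table or varray can be used with the TABLE() operator in the same way as in the block above.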
