I have a requirement to upload a file to a database.
The file will have approximately 100K records daily, and once per month 8 to 10 million records.
There are also some field-level validations to perform.
The validations are things like: are all fields present, does a number field contain a valid number, does a date field contain a valid date, is a number within the specified range, does a string match the expected format, etc.
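To make the kind of checks concrete, here is a minimal sketch in Python (the app itself is C#, so treat this as language-neutral pseudocode that happens to run). The record layout (id, amount, order_date, code), the allowed range, the date format, and the string pattern are all assumptions made up for illustration.

```python
import re
from datetime import datetime

# Hypothetical record layout: id, amount, order_date, code.
EXPECTED_FIELDS = 4
CODE_PATTERN = re.compile(r"[A-Z]{3}-\d{4}")  # assumed string format, e.g. "ABC-1234"


def validate_record(fields):
    """Return a list of error messages; an empty list means the record is valid."""
    # Are all fields present (right count, none blank)?
    if len(fields) != EXPECTED_FIELDS or any(f.strip() == "" for f in fields):
        return ["missing field(s)"]

    errors = []
    record_id, amount, order_date, code = (f.strip() for f in fields)

    # Does the number field contain a valid (non-negative integer) number?
    if not (record_id.isascii() and record_id.isdigit()):
        errors.append("id is not a valid number")

    # Is the number valid and within the specified range?
    try:
        value = float(amount)
        if not 0 <= value <= 1_000_000:  # assumed allowed range
            errors.append("amount out of range")
    except ValueError:
        errors.append("amount is not a valid number")

    # Does the date field contain a valid date?
    try:
        datetime.strptime(order_date, "%Y-%m-%d")  # assumed date format
    except ValueError:
        errors.append("order_date is not a valid date")

    # Does the string match the expected format?
    if not CODE_PATTERN.fullmatch(code):
        errors.append("code does not match expected format")

    return errors
```

Returning all errors for a record, rather than stopping at the first, makes it easier to report every problem on a line in one pass.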
There are 3 ways.
1: Upload to temp and then validate
- Create a temp table (all string columns), have extra error column
- upload all entries to temp table
- run validation, populate error column if needed
- move valid entries to correct table
Cons: entries have to be written twice to the db, even the correct ones.
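The steps of option 1 could be sketched roughly as follows. This uses Python's built-in sqlite3 purely as a stand-in for Oracle, and the table names (staging, orders), the columns, and the digits-only checks are hypothetical. In Oracle the pattern checks would more likely be written with REGEXP_LIKE than with SQLite's GLOB.

```python
import sqlite3


def load_via_staging(conn, rows):
    """Load (id, quantity) string pairs through an all-text staging table.

    Returns the rejected staging rows together with their error message.
    """
    cur = conn.cursor()
    # Temp table: every column is text, plus an extra error column.
    cur.execute("CREATE TABLE IF NOT EXISTS staging (id TEXT, quantity TEXT, error TEXT)")
    cur.execute("CREATE TABLE IF NOT EXISTS orders (id INTEGER, quantity INTEGER)")

    # Upload all entries to the temp table, valid or not.
    cur.executemany("INSERT INTO staging (id, quantity) VALUES (?, ?)", rows)

    # Run validation and populate the error column where a check fails.
    cur.execute(
        "UPDATE staging SET error = 'bad id' "
        "WHERE error IS NULL AND (id = '' OR id GLOB '*[^0-9]*')"
    )
    cur.execute(
        "UPDATE staging SET error = 'bad quantity' "
        "WHERE error IS NULL AND (quantity = '' OR quantity GLOB '*[^0-9]*')"
    )

    # Move only the valid entries to the real table.
    cur.execute(
        "INSERT INTO orders (id, quantity) "
        "SELECT CAST(id AS INTEGER), CAST(quantity AS INTEGER) "
        "FROM staging WHERE error IS NULL"
    )
    conn.commit()

    return cur.execute(
        "SELECT id, quantity, error FROM staging "
        "WHERE error IS NOT NULL ORDER BY rowid"
    ).fetchall()
```

The validation runs as set-based statements inside the database, which is the main reason this variant can stay fast despite the double write.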
2: Upload to db directly
- upload all entries directly to table
- check which entries are not uploaded
Cons: the file would need to be read again after the upload to find the rejected lines, so this is as good as a double read.
3: Validate and then Upload
- read each line, run all validations on all columns
- if valid then write to db
Cons: reading and validating the file line by line is bound to be slower than a bulk upload to the db.
I am writing the app in C# and ASP.NET; the DB is Oracle.
Which of the 3 ways is best?
As @aF says, option 2, with the following addition:
Add a table that you can dump ‘invalid’ rows into. Then run a statement that copies every row failing validation into that table,
then dump ‘validated’ rows into your actual table, excluding ‘invalid’ rows.
The INSERT will fail if any (other) ‘invalid’ data is encountered, but the failure should be discoverable. The ‘invalid’ table can then be worked through, cleaned up, and its rows re-inserted.
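A minimal sketch of this two-statement pattern, under stated assumptions: it uses Python's sqlite3 as a stand-in for Oracle, and the tables (uploaded, invalid_rows, orders), the line_no key, and the digits-only checks are hypothetical names and rules, not the statements from the original answer. On Oracle the WHERE clause would more likely use REGEXP_LIKE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE uploaded     (line_no INTEGER, id TEXT, quantity TEXT);
    CREATE TABLE invalid_rows (line_no INTEGER, id TEXT, quantity TEXT);
    CREATE TABLE orders       (id INTEGER NOT NULL, quantity INTEGER NOT NULL);
""")

# Sample raw upload: line 2 has a non-numeric id, line 3 an empty quantity.
conn.executemany(
    "INSERT INTO uploaded VALUES (?, ?, ?)",
    [(1, "10", "5"), (2, "abc", "7"), (3, "12", "")],
)

# Statement 1: dump every row that fails a validation rule into invalid_rows.
conn.execute("""
    INSERT INTO invalid_rows (line_no, id, quantity)
    SELECT line_no, id, quantity
    FROM uploaded
    WHERE id = '' OR id GLOB '*[^0-9]*'
       OR quantity = '' OR quantity GLOB '*[^0-9]*'
""")

# Statement 2: insert the remaining rows into the real table,
# excluding anything already recorded as invalid.
conn.execute("""
    INSERT INTO orders (id, quantity)
    SELECT CAST(u.id AS INTEGER), CAST(u.quantity AS INTEGER)
    FROM uploaded u
    WHERE NOT EXISTS (
        SELECT 1 FROM invalid_rows i WHERE i.line_no = u.line_no
    )
""")
conn.commit()

loaded = conn.execute("SELECT id, quantity FROM orders ORDER BY id").fetchall()
rejected = conn.execute("SELECT line_no FROM invalid_rows ORDER BY line_no").fetchall()
```

Keeping a line number (or some other key) on each uploaded row is what lets the second statement exclude the invalid rows and lets the rejected rows be traced back to the file.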