We are designing a fairly large brownfield application and have run into a bit of an issue.
We have a fairly large amount of information in a DB2 database from a legacy application that is still loading data. We also have information in an Oracle database that we control.
We have to do a ‘JOIN’ type of operation on the tables. Right now, I was thinking of pulling the information out of the DB2 table into a List<> and then iterating those into a SQL statement on the Oracle database such as:
select * from accounts where accountnum in (...)
Is there any easier way to interact between the databases, or at least, what is the best practice for this sort of action?
I’ve done this two ways.
With two Sybase databases on different boxes, I set up stored procedures and called them like functions to send data back and forth. This additionally allowed the sprocs to audit/log, to convince the customer no data was being lost in the process.
For a one-way Oracle-to-Sybase transfer, I used a view to marshal the data, plus each vendor’s C library called from a C++ program that gave the C APIs a common interface.
On a MySQL and DB2 setup where, like your situation, the DB2 side was “legacy but live”, I employed a setup similar to what you’re describing: pulling the data out into a (Java) client program.
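As a rough illustration of the client-side approach, here is a minimal sketch of turning a list of keys pulled from DB2 into batched, parameterized IN queries for Oracle. The class and method names are hypothetical; the table and column come from the query in the question. The batch size assumes the traditional Oracle cap of 1000 expressions per IN list (ORA-01795), which is worth checking against your version.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: splits keys into batches and builds a placeholder query per batch.
final class InClauseBatcher {
    // Assumption: Oracle rejects IN lists with more than 1000 expressions (ORA-01795).
    static final int MAX_IN_LIST = 1000;

    // Splits the key list into consecutive chunks of at most `size` elements.
    static <T> List<List<T>> chunk(List<T> keys, int size) {
        if (size < 1) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < keys.size(); i += size) {
            chunks.add(keys.subList(i, Math.min(i + size, keys.size())));
        }
        return chunks;
    }

    // Builds the query text with one "?" placeholder per key, so values are
    // bound through a PreparedStatement instead of concatenated into the SQL.
    static String buildQuery(int keyCount) {
        if (keyCount < 1) {
            throw new IllegalArgumentException("need at least one key");
        }
        String placeholders = String.join(", ", Collections.nCopies(keyCount, "?"));
        return "select * from accounts where accountnum in (" + placeholders + ")";
    }
}
```

Each chunk would then be bound to its own PreparedStatement (one setter call per placeholder) and the result sets concatenated in the client.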
If the join is always one-to-one, and each box’s result set has the same key, you can pull them both with the same ordering and trivially connect them in the client. Even if they’re one-to-many, stitching them together is just a single forward pass over both of your lists.
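The one-to-many case might look something like the sketch below. The row classes are hypothetical stand-ins for the two result sets, and it assumes both queries used ORDER BY on the same numeric key, with the key unique on the "one" side. For character keys, check that both databases sort the same way before relying on this.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a merge join over two lists already sorted by accountNum.
final class SortedMergeJoin {
    static final class Account {
        final long accountNum;
        final String owner;
        Account(long accountNum, String owner) { this.accountNum = accountNum; this.owner = owner; }
    }

    static final class Txn {
        final long accountNum;
        final String detail;
        Txn(long accountNum, String detail) { this.accountNum = accountNum; this.detail = detail; }
    }

    static final class Joined {
        final Account account;
        final Txn txn;
        Joined(Account account, Txn txn) { this.account = account; this.txn = txn; }
    }

    // Inner join: accounts has unique keys; txns may repeat a key. Both ascending.
    static List<Joined> join(List<Account> accounts, List<Txn> txns) {
        List<Joined> out = new ArrayList<>();
        int t = 0; // cursor into txns; it only ever moves forward
        for (Account a : accounts) {
            // Skip transactions whose key has no matching account.
            while (t < txns.size() && txns.get(t).accountNum < a.accountNum) {
                t++;
            }
            // Emit one joined row per transaction that matches this account.
            while (t < txns.size() && txns.get(t).accountNum == a.accountNum) {
                out.add(new Joined(a, txns.get(t)));
                t++;
            }
        }
        return out;
    }
}
```

Because neither cursor moves backwards, the cost is linear in the combined size of the two lists.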
If it gets to be many-to-many, then I might fall back to processing one item at a time (though you could use a HashSet lookup).
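For the many-to-many case, one possible shape of the hash-based lookup is sketched here. It uses a HashMap keyed on the join column rather than a bare HashSet, so the matching rows themselves can be retrieved. The row layout (a String[] with the key in column 0 and a payload in column 1) is purely an assumption for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a hash join; rows are String[] {key, payload}.
final class HashLookupJoin {
    static List<String[]> join(List<String[]> left, List<String[]> right) {
        // Index the right side by key; each key maps to all rows that carry it.
        Map<String, List<String[]>> index = new HashMap<>();
        for (String[] r : right) {
            index.computeIfAbsent(r[0], k -> new ArrayList<>()).add(r);
        }
        // Probe the index once per left row and emit every matching pair.
        List<String[]> out = new ArrayList<>();
        for (String[] l : left) {
            List<String[]> matches = index.getOrDefault(l[0], Collections.emptyList());
            for (String[] r : matches) {
                out.add(new String[] { l[0], l[1], r[1] });
            }
        }
        return out;
    }
}
```

Indexing the smaller of the two result sets keeps memory use down; neither side needs to be sorted.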
Basically, though, your choices are sprocs (for which you’d need to add a client layer), or just doing it in the client.