I’m looking for some feedback on mechanisms to batch data from MySQL Community Server 5.1.32 with an external host down to an internal SQL Server 05 Enterprise machine over VPN. The external box accumulates data throughout business hours (about 100Mb per day), which then needs to be transferred internationally across a WAN connection (quality not yet determined but it’s not going to be super fast) to an internal corporate environment before some BI work is performed. This should just be change-sets making their way down each night.
I’m interested in thoughts on the ETL mechanisms people have successfully used in similar scenarios before. SSIS seems like a potential candidate; can anyone comment on the suitability for this scenario? Alternatively, other thoughts on how to do this in a cost-conscious way would be most appreciated. Thanks!
It depends on the use you have of the data received from the external machine.
If you must have the data for the calculations of the morning after or do not have confidence in your network, you would prefer to loose-couple the two systems and enable some message-queuing between them so that if something fails during the night like the DBs, the networks links, anything that would be a pain for you to recover, you can start every morning with some data.
If the data retrieval is not subject to a high degree of criticality, any solution is good 🙂
Regarding SSIS, it’s just a great ETL framework (yes, there’s a subtlety :)). But I don’t see it as a part of the data transfer, rather in the ETL part when your data has been received or is still waiting in the message-queing system.