Having spent some time working on data warehousing, I have created both ETL (extract transform load) and ELT (extract load transform) processes. It seems that ELT is a newer approach to populating data warehouses that can more easily take advantage of cluster computing resources. I would like to hear what other people think the advantages are of ETL and ELT over each other and when you should use one or the other.
Having spent some time working on data warehousing, I have created both ETL (extract
Share
So after having played thoroughly with both ETL and ELT, I have come to the conclusion that you should avoid ELT at all costs. ETL prepares the data for your warehouse before you actually load it in. ELT however loads the raw data into the warehouse and you transform it in place. That is problematic if you have a busy data warehouse. If there is a reporting query running on a table that you are attempt to update, your query will get blocked. Consequently, it is possible for reporting queries to hold up or block updates.
Now some of you might say reporting queries do not need to block an update and you can set your isolation level to allow for dirty reads. Reporting queries however are not generally executed by software engineers. They are executed by business users so you can’t rely on them to set their isolation levels properly. As well, not all reports can tolerate dirty reads.
There are cases where ELT can work however by introducing it to your data warehouse is dangerous and consequently, I recommend for your sanity and for maintainability, avoid it.