I am working on a project where I need to load data into a data warehouse using an ETL process. I have data in CSV, unstructured, and flat-file formats. I am thinking about using shell scripting to carry out the ETL process. I know a little about both the Bash shell and KornShell (ksh), but I am very new to ETL. So my question is: which is the better option for an ETL process, Bash or KornShell?
Answers from users experienced with ETL processes and shell scripting would be highly appreciated.
Thanks in advance.
Typically, my ETL processes use SQL statements to do in-database transformation, so they’re really “ELT” processes. The shell simply serves as the tool to move files around, run data loads and extracts, and execute SQL statements. If your DW is on a sufficiently powerful system, it’s usually the best place to do the transformation work, unless you are set on having a system outside of the EDW that does the data transformations.
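To make that division of labor concrete, here is a rough sketch of what such a driver script might look like. It sticks to portable syntax, so it should behave the same under bash and ksh. Everything named here is hypothetical: `run_sql` is a stand-in that only prints the statement (you would replace its body with a call to your warehouse's command-line client), and `stg_sales`, `sales_fact`, and the SQL text itself are placeholders, not real syntax for any particular database.

```shell
#!/bin/sh
# Minimal ELT driver sketch. Portable syntax: runs under bash or ksh.
# All names (run_sql, elt_load_dir, stg_sales, sales_fact) are hypothetical.

# Stand-in for the warehouse's SQL client. This version only prints the
# statement; replace the body with a call to your real command-line client.
run_sql() {
    printf 'SQL> %s\n' "$1"
}

# Load every CSV in directory $1 into a staging table, transform it
# in-database, then move each processed file to directory $2.
elt_load_dir() {
    incoming=$1
    archive=$2
    mkdir -p "$archive" || return 1

    for f in "$incoming"/*.csv; do
        [ -e "$f" ] || continue    # the glob matched nothing
        # Load step: bulk-load the raw file into staging, untransformed.
        # (Placeholder statement; real bulk-load syntax varies by database.)
        run_sql "LOAD '$f' INTO stg_sales" || return 1
        # Transform step: the database does the work, not the shell.
        run_sql "INSERT INTO sales_fact SELECT * FROM stg_sales" || return 1
        # Move the file out of the way only after both steps succeeded.
        mv "$f" "$archive"/ || return 1
    done
}
```

You would call it as something like `elt_load_dir /data/incoming /data/archive` (both paths made up for the example). The point is that the shell only moves files and issues statements; all the transformation logic lives in the SQL.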
The choice of shell for the kind of ELT process I’ve described is really one of maintenance. Who will be supporting this when you’re gone? Does the company have lots of folks who know bash, but only one who knows ksh? Or is it 99% a .NET shop? Then I’d suggest writing your ETL as little C# console apps. When you’re not using a real “ETL” tool, the choice of language to run your ETL should be driven by these factors, not by which language is ‘best’.