I was asked to prototype two ETL frameworks. The requirements are as follows:
- Open Source
- Available on Linux
- Maintained
- Logs can be viewed in a web browser (nice to have)
- Written in Perl, Python, Ruby or Java
The raw file can be anything (Excel, CSV, an HTML page, etc.).
The target database is MySQL.
Don't just drop names; please describe the advantages and disadvantages based on your experience.
Thanks!
I’ve used Kettle. It has its own GUI, but if you would rather use the API to do the ETL yourself, that is also supported. It has proved very useful to me, and a few plugins are already available for it.