The Archive Base


Editorial Team
Asked: May 18, 2026

Looking for a non-cloud based open source app for doing data transformation; though for


I’m looking for a non-cloud-based, open source app for doing data transformation, though for a killer (and I mean killer) app built just for data transformations, I might be willing to spend up to $1000.

I’ve looked at Perl, Kapow Katalyst, Pentaho Kettle, and more.

Perl, Python, and Ruby are clearly languages, but I’ve been unable to find any frameworks/DSLs built just for processing data. That means they’re not really great development environments for this: there are no built-in GUIs for building regexes or setting up input/output (CSV, XML, JDBC, REST, etc.), and no debugger for testing rows and rows of data. They’re not bad either, just not what I’m looking for, which is a GUI built for complex data transformations. That said, I’d love it if the GUI/app file were in a scripting language, and NOT just stored in some non-human-readable XML/ASCII file.

Kapow Katalyst is made for accessing data via HTTP (HTML, CSS, RSS, JavaScript, etc.). It’s got a nice GUI for transforming unstructured text, but that’s not its core value offering, and it is way, way too expensive. It does an okay job of traversing document namespace paths; I’m guessing it’s just XPath on the back end, since the syntax appears to be the same.

Pentaho Kettle has a nice GUI for input/output of most common data stores, and its own take on handling data processing, which is okay and has just a small learning curve. Kettle’s debugger is OK, in that the data is easy to see, but the errors and exceptions are not threaded with the output, and there’s no way to really debug an issue; you can’t reload the output/error/exception, you can only view the system feedback. All that said, Kettle data transformation is _______ well, let’s just say it left me feeling like I must be missing something, because I was completely puzzled by “if it’s not possible, just write the transformation in JavaScript”; umm, what?

So, any suggestions? Do realize that I haven’t really spec’d out any transformations, but I figure if you really use a product for data munging, I’d like to know about it; even Excel, I guess.

In general though, I’m currently looking for a product that’s able to handle 1000-100,000 rows with 10-100 columns. It’d be super cool if it could profile data sets, which is a feature Kettle sort of has, but not super well. I’d also like built-in unit testing, meaning I’m able to build out control sets of data and run any changes I make against the control set. Then I’d like to be able to selectively filter out rows and columns as I build out the transformation without altering the build. For example, I run a data set through the transformation and filter the results, and on the next run those rows are automatically blocked at the first “logical” occurrence, which in turn would mean less data to “look at” and a reduced runtime for each enhanced iteration. What would be crazy nice is if, as I’m filtering out the rows/columns, the app tracked them (with the output filtered accordingly) and unit tested/highlighted any changes. If I made a change that would affect the application logs and its ability to track the unit tests because of me “breaking a branch”, it’d give me a warning, let me dump the data stored for that branch, and/or track the primary keys to diff against the next generation of output, or even attempt to match them using fuzzy logic. And yes, I know this is a pipe dream, but hey, figured I’d ask, just in case there’s something out there I’ve just never seen.
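To make the control-set idea concrete, here is a minimal sketch in plain Python of running a transformation and diffing its output against a control set by primary key. Everything in it is hypothetical and for illustration only: the `transform` rule, the `id` and `name` columns, and the inline sample data are assumptions, not part of any tool mentioned here.

```python
import csv
import io


def transform(row):
    """Hypothetical transformation: trim whitespace and upper-case the name."""
    out = dict(row)
    out["name"] = row["name"].strip().upper()
    return out


def run(rows, key="id"):
    """Apply the transformation and index the output by primary key."""
    return {row[key]: transform(row) for row in rows}


def diff_against_control(actual, control):
    """Report keys that are missing, unexpected, or changed versus the control set."""
    shared = set(actual) & set(control)
    return {
        "missing": sorted(set(control) - set(actual)),
        "unexpected": sorted(set(actual) - set(control)),
        "changed": sorted(k for k in shared if actual[k] != control[k]),
    }


def load(text):
    """Parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


# Assumed sample data: raw input and the expected (control) output.
INPUT_CSV = "id,name\n1, alice \n2,bob\n"
CONTROL_CSV = "id,name\n1,ALICE\n2,BOB\n"

actual = run(load(INPUT_CSV))
control = {row["id"]: row for row in load(CONTROL_CSV)}
report = diff_against_control(actual, control)
print(report)
```

An empty report means the current build still reproduces the control set; any key listed under `changed` or `missing` points at rows worth inspecting after a change to the transformation. Fuzzy matching of keys is not attempted in this sketch.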

Feel free to comment; I’d be happy to answer any questions or offer additional info.


1 Answer

  1. Editorial Team
    Added an answer on May 18, 2026 at 9:07 am

    Google Refine?
