I’d like to slice and dice large data files, up to a gigabyte, quickly and efficiently. If I use something like UNIX’s ‘cut’, it’s extremely fast, even in a Cygwin environment.
I’ve tried developing and benchmarking various Ruby scripts to process these files, and always end up with glacially slow results.
What would you do in Ruby to make this not so dog slow?
Why not combine the two: use cut for what it does best, and let Ruby provide the glue and added value on top of cut’s output? You can run shell commands from Ruby by putting them in backticks, like this:
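Here is a minimal sketch of that idea, assuming a `cut` binary is on the PATH (as it is under Cygwin or any UNIX). The helper name `each_cut_line` and the file name `data.csv` are made up for illustration. Plain backticks return the command's whole output as one string, which is fine for small results; for files approaching a gigabyte it is probably better to stream the output line by line through `IO.popen`, so Ruby never holds all of it in memory.

```ruby
# Backtick form: simple, but loads all of cut's output into one String.
#   first_column = `cut -d, -f1 data.csv`   # data.csv is a placeholder name

# Streaming form: cut does the splitting, Ruby sees one line at a time.
# each_cut_line is a hypothetical helper, not part of any library.
def each_cut_line(path, fields:, delimiter: "\t")
  # Passing the command as an Array bypasses the shell, so the path and
  # delimiter need no quoting or escaping.
  IO.popen(["cut", "-d", delimiter, "-f", fields, path]) do |io|
    io.each_line { |line| yield line.chomp }
  end
end

# Example use: total up the third column of a comma-separated file.
#   total = 0
#   each_cut_line("data.csv", fields: "3", delimiter: ",") { |v| total += v.to_i }
```

The heavy lifting (scanning and splitting each line) stays in cut, while Ruby only touches the already-trimmed fields, which is where the speedup would come from.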