Hello fellow coders.
So I decided to rewrite some of my old scripts I had lying around in Haskell, just because I need the practice and I like the language. So here I am, trying to filter a huge file (around 1.7 GB): cut the lines of no interest and write the remaining stuff to another file.
I thought that Haskell’s lazy nature would be ideal for this, but the code keeps running out of memory. The previous versions (C# or Python) used a read line -> write line approach, but I tried a different approach here. Should I just rewrite the code to mirror the previous versions, or am I missing something?
This is the function in charge of filtering the original file:
    getLines :: FilePath -> IO [[String]]
    getLines path = do
        text <- readFile path
        let linii = lines text
        let tokens = map words linii
        let filtrate = [x | x <- tokens, length x > 7, isTimeStamp (x !! 0), isDiagFrame x]
        return filtrate
This one is in charge of writing one line at a time to the new file (although I tried to use writeFile directly and failed miserably 🙂):
    writeLines :: Handle -> [[String]] -> IO ()
    writeLines handle linii = do
        let linie = concat $ intersperse " " (head linii)
        hPutStrLn handle linie
        if length linii > 0 then
            writeLines handle (tail linii)
        else
            print "Writing complete..."
And these two are the main function and another one in charge of getting the handle and passing it around:
    writeTheFile :: FilePath -> FilePath -> IO ()
    writeTheFile inf outf = do
        handle <- openFile outf WriteMode
        linii <- getLines inf
        writeLines handle linii
        print "Write Complete"

    main = do
        arg <- getArgs
        if length arg /= 2 then
            print "Use like this : trace_pars [In_File] [Out_File] !"
        else
            writeTheFile (arg !! 0) (arg !! 1)
Any advice would be greatly appreciated…thanks in advance
The problem here is in this line:

    if length linii > 0 then

You are computing the length of your list of lines. This means that the whole list of lines has to be loaded for it to be counted, and since the list is still needed afterwards (for tail linii), none of it can be freed. So the whole file that you’re reading needs to be loaded into memory. Not good!
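A small sketch of why `length` is the expensive check, compared with a check that only looks at the first cell of the list. The names `onlyFirstLineRead` and `hasMoreLines` are invented for this illustration, and the exploding tail is just a stand-in for "the rest of a huge file that has not been read yet".

```haskell
-- A list whose first cell exists but whose tail must never be evaluated.
-- It stands in for a huge file of which only the first line has been read.
onlyFirstLineRead :: [String]
onlyFirstLineRead = "first line" : error "rest of the file was forced"

-- null pattern-matches on the outermost constructor only, so the
-- exploding tail is never touched.
hasMoreLines :: Bool
hasMoreLines = not (null onlyFirstLineRead)

-- By contrast, evaluating `length onlyFirstLineRead > 0` would hit the
-- error, because length has to walk to the end of the list before it
-- can return a number.
```

Loading this in GHCi and evaluating `hasMoreLines` should give `True` without ever touching the tail.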
The solution is to use `if not . null $ linii then` instead. The `null` function checks whether a list is empty (which only forces the first line of the list to be loaded), and `not` behaves like you’d expect.

If you would like a more idiomatic version of `writeLines` (note the use of `FilePath` instead of `Handle`):

This function is the same as:
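A minimal sketch of what such a version could look like, assuming it is built from `writeFile`, `unlines` and `unwords`. It is written once in point-free style and once with the list argument spelled out; the name `writeLines'` for the second form is only there so that both definitions fit in one file.

```haskell
-- Point-free form: join each token list with spaces, join the lines
-- with newlines, and hand the lazily built string to writeFile.
writeLines :: FilePath -> [[String]] -> IO ()
writeLines path = writeFile path . unlines . map unwords

-- The same function with the list argument written out explicitly.
writeLines' :: FilePath -> [[String]] -> IO ()
writeLines' path linii = writeFile path (unlines (map unwords linii))
```

With this shape, `writeTheFile inf outf` could shrink to `getLines inf >>= writeLines outf`, with no handle to open by hand. Since nothing holds on to the list after it is passed along, the file should be able to stream through rather than sit in memory.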
`unlines` is the same as `intercalate "\n"` (except that it also puts a newline after the last line), and `unwords` is the same as `intercalate " "`. `intercalate x` is the same as `concat . intersperse x`.

I think that this should be enough information for you to understand what’s going on.
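If you want to convince yourself of these equivalences, here is a small sketch you can load in GHCi. The sample data and the names of the three checks are made up for illustration; note that `unlines` differs from `intercalate "\n"` by one trailing newline.

```haskell
import Data.List (intercalate, intersperse)

-- Hypothetical tokens of one line, only here to have something to join.
sampleWords :: [String]
sampleWords = ["12:00:01", "diag", "frame"]

-- unwords joins with single spaces, like intercalate " ".
unwordsMatches :: Bool
unwordsMatches = unwords sampleWords == intercalate " " sampleWords

-- intercalate x is concat . intersperse x.
intercalateMatches :: Bool
intercalateMatches =
  intercalate " " sampleWords == (concat . intersperse " ") sampleWords

-- unlines additionally terminates the last line with a newline.
unlinesMatches :: Bool
unlinesMatches =
  unlines sampleWords == intercalate "\n" sampleWords ++ "\n"
```

All three checks should evaluate to `True` for this sample.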