I want to use the Haskell function
readFile :: FilePath -> IO String
to read the content of a file into a string. In the documentation I have read that “The file is read lazily, on demand, as with getContents.”
I am not sure I understand this completely. For example, suppose that I write
s <- readFile "t.txt"
When this action is executed:
- The file is opened.
- The characters in s are actually read from the file as soon as (but not sooner than) they are needed to evaluate some expression (e.g. if I evaluate length s, all the content of the file will be read and the file will be closed).
- As soon as the last character has been read, the file handle associated with this call to readFile is closed (automatically).
Is my third statement correct? So, can I just invoke readFile without closing the file handle myself? Will the handle stay open as long as I have not consumed (visited) the whole result string?
EDIT
Here is some more information regarding my doubts. Suppose I have the following:
foo :: String -> IO String
foo filename = do
  s <- readFile "t.txt"
  putStrLn "File has been read."
  return s
When the putStrLn is executed, I would (intuitively) expect that
- s contains the whole content of file t.txt,
- the handle used to read the file has been closed.
If this is not the case:
- What does s contain when putStrLn is executed?
- In what state is the file handle when putStrLn is executed?
- If s does not contain the whole content of the file when putStrLn is executed, when will this content actually be read, and when will the file be closed?
"Is my third statement correct?"
Not quite. The file is not closed "as soon as the last character has been read", at least not usually. It lingers for a few moments in the semi-closed state it was in during the read, and the IO manager/runtime closes it the next time it performs such cleanup. If you're rapidly opening and reading files, that lag may cause you to run out of file handles if the OS limit isn't too high.
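If that lag is a concern, one common workaround is to consume the whole string right after readFile, so that nothing is left to read lazily. The following is only a minimal sketch: readFileStrict is a hypothetical helper (it is not part of base), and demo.txt is an assumed file name.

```haskell
import Control.Exception (evaluate)

-- Hypothetical helper (not part of base): read a file and force the whole
-- string before returning, so no unread part of the file is left behind.
readFileStrict :: FilePath -> IO String
readFileStrict path = do
  s <- readFile path        -- lazy: nothing has been read yet
  _ <- evaluate (length s)  -- walks the whole string, reading the file to the end
  return s

main :: IO ()
main = do
  writeFile "demo.txt" "hello\nworld\n"
  -- Read the same file many times; each read is complete before the next starts.
  sizes <- mapM (fmap length . readFileStrict) (replicate 1000 "demo.txt")
  print (sum sizes)
```

The price is that the whole file is held in memory at once, which gives up the main benefit of lazy reading.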
For most use cases (in my limited experience), however, the closing of the file handle is timely enough. [There are people who disagree and view lazy IO as extremely dangerous in all cases. It definitely has pitfalls, but IMO its dangers are often overstated.]
"So, can I just invoke readFile without closing the file handle myself?"
Yes, when you're using readFile, the file handle is closed automatically when the file contents have been entirely read or when it is noticed that the file handle is not referenced anymore.
"Will the handle stay open as long as I have not consumed (visited) the whole result string?"
Not quite, readFile puts the file handle in a semi-closed state, described in the docs for hGetContents: the handle is effectively closed for other operations, but its contents are still read on demand.
About the foo example, I first wrote: "Ah, that's one of the pitfalls of lazy IO on the other end. Here the file is closed before its contents have been read. When foo returns, the file handle isn't referenced anymore, and then closed. The consumer of foo's result will then find that s is an empty string, because when hGetContents tries to actually read from the file, the handle is already closed." That was wrong: I confused the behaviour of readFile with that of a bracket invocation that runs hGetContents and then hClose. readFile only closes the file handle after s is not referenced anymore, so it behaves as expected here.
"s contains the whole content of file t.txt"
No, s does not contain anything yet but a recipe to maybe get some characters from the file handle.
"The handle used to read the file has been closed."
The file handle is semi-closed, but not closed. It will be closed when the file contents have been entirely read, or s goes out of scope.
"If this is not the case: ..."
The first two questions have been answered; the answer to the third is "the file will be read when the contents are consumed", and it will be closed when the entire contents have been read or when it is no longer referenced.
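One way to observe this from inside a program is to try opening the same file for writing before and after consuming s. The sketch below relies on the file locking GHC performs (a file open for reading cannot be opened for writing by the same program). canWrite and probe are hypothetical helpers introduced only for this illustration.

```haskell
import Control.Exception (IOException, evaluate, try)

-- Hypothetical helper: report whether the file can currently be opened for
-- writing (on success the file is overwritten with a placeholder string).
canWrite :: FilePath -> IO Bool
canWrite path = do
  r <- try (writeFile path "replaced") :: IO (Either IOException ())
  return (either (const False) (const True) r)

-- Returns (write allowed before consuming s, write allowed after consuming s).
probe :: FilePath -> IO (Bool, Bool)
probe path = do
  writeFile path "some text"
  s <- readFile path
  putStrLn "File has been read."  -- printed although nothing has been read yet
  before <- canWrite path         -- the handle is still semi-closed here
  _ <- evaluate (length s)        -- consume s completely
  after <- canWrite path
  return (before, after)

main :: IO ()
main = probe "t.txt" >>= print
```

With the GHC versions I am aware of, the first attempt is rejected because the file is still locked, and the second succeeds once s has been consumed. Treat that as an observation about the implementation rather than a documented guarantee.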
That would be different with the bracket invocation mentioned above: bracket guarantees that the final operation, here the hClose, will be run even if the other actions throw an exception, therefore its use is often recommended. However, the hClose is run when bracket returns, and then the hGetContents can't get any contents from the now really closed file handle. But readFile would not necessarily close the file handle if an exception occurs.
That is one of the dangers or quirks of lazy IO: files are not read until their contents are demanded, and if you use lazy IO wrongly, that will be too late and you don't get any contents.
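The bracket pitfall and one way around it can be sketched as follows. The exact bracket expression from the original answer is not reproduced here: readLazyBroken is my own reconstruction of the kind of invocation being discussed, and readInsideBracket is a hypothetical name for a variant that forces the contents while the handle is still open.

```haskell
import Control.Exception (bracket, evaluate)
import System.IO (IOMode (ReadMode), hClose, hGetContents, openFile, withFile)

-- Pitfall: hClose runs when bracket returns, which is before the lazy string
-- has been consumed by anyone.
readLazyBroken :: FilePath -> IO String
readLazyBroken path = bracket (openFile path ReadMode) hClose hGetContents

-- Safer variant: force the contents while the handle is still open.
-- withFile closes the handle afterwards, even if an exception is thrown.
readInsideBracket :: FilePath -> IO String
readInsideBracket path =
  withFile path ReadMode $ \h -> do
    s <- hGetContents h
    _ <- evaluate (length s)  -- read everything before the handle is closed
    return s

main :: IO ()
main = do
  writeFile "t.txt" "some text"
  s <- readInsideBracket "t.txt"
  putStrLn s
  -- Consuming the result of readLazyBroken is left out on purpose: as far as
  -- I know it yields either an empty string or an exception, depending on
  -- the version of base.
```

The second variant gives up laziness entirely, which is usually what you want in the cases where the handle has to be closed at a known point.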
It’s a trap many (or even most) fall into one time or another, but after having been bitten by it, one quickly learns when IO needs to be non-lazy and do it non-lazily in those cases.
The alternatives (iteratees, enumerators, conduits, pipes, …) avoid those traps [unless the implementer made a mistake], but are considerably less nice to use in those cases where lazy IO is perfectly fine. On the other hand, they treat the cases where laziness is not desired much better.