I work with huge data files, and sometimes I only need to know how many lines they contain. Usually I open them and read them line by line until I reach the end of the file.
I was wondering if there is a smarter way to do that.
This is the fastest version I have found so far, about 6 times faster than readLines(): on a 150 MB log file it takes 0.35 seconds, versus 2.40 seconds with readLines(). Just for fun, Linux's wc -l command takes 0.15 seconds.
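The answer's code itself is not reproduced here. The following is only a minimal sketch of one common way to avoid line-by-line reading, assuming the speed-up comes from scanning raw bytes for '\n' instead of decoding each line into a String. It is not the author's exact code. The class name LineCounter, the method countLines and the 64 KB buffer size are hypothetical choices. The sketch assumes a byte-oriented encoding such as ASCII or UTF-8 (a UTF-16 file would not work), and it counts an unterminated final line. For such files the result can be one higher than wc -l, which counts newline characters only.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

public final class LineCounter {

    private LineCounter() {}

    /**
     * Counts lines by scanning raw bytes for '\n' instead of decoding text.
     * Assumes a byte-oriented encoding such as ASCII or UTF-8 (not UTF-16).
     * A final line without a trailing '\n' is still counted, so the result
     * can be one higher than `wc -l` for such files.
     */
    public static long countLines(String filename) throws IOException {
        try (InputStream in = Files.newInputStream(Paths.get(filename))) {
            byte[] buf = new byte[64 * 1024]; // hypothetical buffer size; tune as needed
            long count = 0;
            byte last = '\n'; // an empty file counts as 0 lines
            int n;
            while ((n = in.read(buf)) != -1) {
                for (int i = 0; i < n; i++) {
                    if (buf[i] == '\n') {
                        count++;
                    }
                }
                if (n > 0) {
                    last = buf[n - 1]; // remember the final byte seen so far
                }
            }
            // Count an unterminated last line as a line of its own.
            return last == '\n' ? count : count + 1;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countLines(args[0]));
    }
}
```

The sketch never creates Strings or decodes characters, which is likely where most of readLines()' time goes. It does not handle files that use bare '\r' line endings.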
EDIT, 9 1/2 years later: I have practically no Java experience, but I benchmarked this code against the LineNumberReader solution below anyway, since it bothered me that nobody had done it. My solution seems to be faster, especially for large files, although it takes a few runs before the optimizer does a decent job. I've played with the code a bit and produced a new version that is consistently the fastest.

Benchmark results for a 1.3 GB text file, y axis in seconds. I performed 100 runs with the same file and measured each run with System.nanoTime(). You can see that countLinesOld has a few outliers and countLinesNew has none. countLinesNew is only slightly faster, but the difference is statistically significant. LineNumberReader is clearly slower.
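For anyone who wants to repeat this kind of comparison, here is a rough sketch of a LineNumberReader-based counter plus a repeated timing loop using System.nanoTime(). It is an assumption about how such a benchmark could look, not the code used for the plot above. The class name, the run count argument and its default of 10 are hypothetical, and any other counting method can be substituted in the loop. LineNumberReader treats '\n', '\r' and "\r\n" as line terminators. Depending on the JDK version, a final line without a terminator may or may not be counted.

```java
import java.io.FileReader;
import java.io.IOException;
import java.io.LineNumberReader;

public final class LineNumberReaderBenchmark {

    private LineNumberReaderBenchmark() {}

    /**
     * Counts lines with LineNumberReader. The file is decoded as characters
     * (platform default charset), and '\n', '\r' and "\r\n" all count as
     * line terminators.
     */
    public static long countLines(String filename) throws IOException {
        try (LineNumberReader reader = new LineNumberReader(new FileReader(filename))) {
            // skip() reads and discards characters while LineNumberReader
            // keeps track of the line terminators it passes.
            while (reader.skip(Long.MAX_VALUE) > 0) {
                // loop until skip() reports end of stream
            }
            return reader.getLineNumber();
        }
    }

    /** Times repeated runs with System.nanoTime(); early runs include JIT warm-up. */
    public static void main(String[] args) throws IOException {
        String file = args[0];
        int runs = args.length > 1 ? Integer.parseInt(args[1]) : 10; // hypothetical default
        for (int run = 1; run <= runs; run++) {
            long start = System.nanoTime();
            long lines = countLines(file);
            double seconds = (System.nanoTime() - start) / 1e9;
            System.out.printf("run %d: %d lines in %.3f s%n", run, lines, seconds);
        }
    }
}
```

Repeating the measurement many times on the same file matters here. The first few runs include JIT compilation and a cold file cache, which fits the observation above that the optimizer needs a few runs before the numbers settle.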