When I read a text file into memory it brings my text in with

Asked: June 11, 2026

When I read a text file into memory, each line comes in with "\n" at the end because of the newlines.

["Hello\n", "my\n", "name\n", "is\n", "John\n"]

Here is how I am reading the text file

array = File.readlines('text_file.txt')

I need to do a lot of processing on this text array, so I’m wondering whether, performance-wise, I should remove the "\n" when I first create the array or when I process each element with a regex.

I wrote some (admittedly bad) test code to remove the “\n”

array = []
File.open('text_file.txt', "r").each_line do |line|
    data = line.split(/\n/)
    array << data
end
array.flatten!

Is there a better way to do this if I should remove the “\n” when I first create the array?

If I wanted to read the file into a Set instead (for performance), is there a method similar to readlines to do that?

1 Answer

Editorial Team · Answer 1 · 2026-06-11T07:05:24+00:00

You need to run a benchmark, using Ruby’s built-in Benchmark module, to figure out which choice is fastest.

However, from experience, I’ve found that “slurping” the file, i.e., reading it all in at once, is not any faster than using a loop with IO.foreach or File.foreach. This is because Ruby and the underlying OS buffer the file as the reads occur, so your loop runs from memory, not directly from disk. foreach will not strip the line terminators for you, as split would, so you’ll need to add a chomp, or a chomp! if you want to mutate the line that was read in:

File.foreach('/path/to/file') do |li|
  puts li.chomp
end

or

File.foreach('/path/to/file') do |li|
  li.chomp!
  puts li
end
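
If you want the chomped lines collected into an array rather than printed, a minimal sketch might look like this. The helper name read_chomped_lines and the temporary file are only there for illustration, and the chomp: true keyword assumes Ruby 2.4 or later:

```ruby
require 'tempfile'

# Hypothetical helper: read a file line by line and return the lines
# without their trailing line terminators.
def read_chomped_lines(path)
  File.foreach(path).map(&:chomp)
end

# Self-contained demo using a temporary file.
Tempfile.create('lines') do |f|
  f.write("Hello\nmy\nname\nis\nJohn\n")
  f.flush
  p read_chomped_lines(f.path)
  # Assumes Ruby 2.4+: readlines can drop the terminators itself.
  p File.readlines(f.path, chomp: true)
end
```

Both calls should give the same array of lines with the "\n" removed, so which one to use is mostly a matter of which Ruby versions you have to support.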

Also, slurping doesn’t scale: you could end up trying to read a file bigger than memory, bringing your machine to its knees, while reading line by line avoids that because only the current line has to be held in memory.
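
That only holds if each line is processed inside the loop; collecting every line into an array costs about as much memory as slurping. A minimal sketch of the streaming style, using a hypothetical helper count_matching_lines and a temporary file for the demo:

```ruby
require 'tempfile'

# Hypothetical helper: count the lines that match a pattern while
# holding only the current line in memory.
def count_matching_lines(path, pattern)
  count = 0
  File.foreach(path) do |line|
    count += 1 if line.chomp =~ pattern
  end
  count
end

# Self-contained demo using a temporary file.
Tempfile.create('lines') do |f|
  f.write("Hello\nmy\nname\nis\nJohn\n")
  f.flush
  puts count_matching_lines(f.path, /m/) # "my" and "name" match
end
```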

Here are some performance numbers:

#!/usr/bin/env ruby

require 'benchmark'

FILENAME = 'test.txt' # file to read; must already exist
LOOPS = 1             # number of times each variant runs

puts "Ruby Version: #{RUBY_VERSION}"
puts "Filesize being read: #{File.size(FILENAME)}"
puts "Lines in file: #{`wc -l #{FILENAME}`.split.first}"

Benchmark.bm(20) do |x|
  x.report('read.split')           { LOOPS.times { File.read(FILENAME).split("\n") }}
  x.report('read.lines.chomp')     { LOOPS.times { File.read(FILENAME).lines.map(&:chomp) }}
  x.report('readlines.map.chomp1') { LOOPS.times { File.readlines(FILENAME).map(&:chomp) }}
  x.report('readlines.map.chomp2') { LOOPS.times { File.readlines(FILENAME).map{ |s| s.chomp } }}
  x.report('foreach.map.chomp1')   { LOOPS.times { File.foreach(FILENAME).map(&:chomp) }}
  x.report('foreach.map.chomp2')   { LOOPS.times { File.foreach(FILENAME).map{ |s| s.chomp } }}
end

And the results:

Ruby Version: 1.9.3
Filesize being read: 42026131
Lines in file: 465440
                           user     system      total        real
read.split             0.150000   0.060000   0.210000 (  0.213365)
read.lines.chomp       0.470000   0.070000   0.540000 (  0.541266)
readlines.map.chomp1   0.450000   0.090000   0.540000 (  0.535465)
readlines.map.chomp2   0.550000   0.060000   0.610000 (  0.616674)
foreach.map.chomp1     0.580000   0.060000   0.640000 (  0.641563)
foreach.map.chomp2     0.620000   0.050000   0.670000 (  0.662912)

On today’s machines a 42MB file can be read into RAM pretty safely, but I have seen files a lot bigger than that which won’t fit into the memory of some of our production hosts. While foreach is slower in this test, processing each line inside a foreach block won’t take a machine to its knees by using up all available memory.

On Ruby 1.9.3, the map(&:chomp) form is faster than the older map { |s| s.chomp } form: roughly 13% for readlines and 3% for foreach in the numbers above. That wasn’t true with older versions of Ruby, so caveat emptor.

Also, note that all of the above processed the data in less than one second on my several-year-old Mac Pro. All in all, I’d say that worrying about the load speed is premature optimization, and the real problem will be what is done after the data is loaded.
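
As for reading into a Set: I don’t know of a readlines-style method that returns one directly, but the enumerator from File.foreach can fill a Set one line at a time. A minimal sketch, where read_lines_into_set is a hypothetical helper name and the temporary file is only for the demo:

```ruby
require 'set'
require 'tempfile'

# Hypothetical helper: build a Set of chomped lines, one line at a time.
def read_lines_into_set(path)
  File.foreach(path).each_with_object(Set.new) do |line, set|
    set << line.chomp
  end
end

# Self-contained demo using a temporary file with one repeated line.
Tempfile.create('lines') do |f|
  f.write("Hello\nmy\nname\nis\nJohn\nHello\n")
  f.flush
  words = read_lines_into_set(f.path)
  puts words.size              # the repeated "Hello" is stored once
  puts words.include?('John')
end
```

Keep in mind that a Set drops duplicate lines, so it only fits if you need membership tests rather than the file’s original sequence.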
