The Archive Base

Editorial Team
Asked: May 22, 2026

I want to split a txt file into multiple files where each file contains

I want to split a txt file into multiple files where each file contains no more than 5Mb. I know there are tools for this, but I need this for a project and want to do it in Ruby. Also, I prefer to do this with File.open in block context if possible, but I fail miserably :o(

#!/usr/bin/env ruby

require 'pp'

MAX_BYTES = 5_000_000

file_num = 0
bytes    = 0

File.open("test.txt", 'r') do |data_in|
  File.open("#{file_num}.txt", 'w') do |data_out|
    data_in.each_line do |line|
      data_out.puts line

      bytes += line.length

      if bytes > MAX_BYTES
        bytes = 0
        file_num += 1
        # next file
      end
    end
  end
end

This works, but I don’t think it is elegant. Also, I still wonder if it can be done with File.open in block context only.

#!/usr/bin/env ruby

require 'pp'

MAX_BYTES = 5_000_000

file_num = 0
bytes    = 0

File.open("test.txt", 'r') do |data_in|
  data_out = File.open("#{file_num}.txt", 'w')

  data_in.each_line do |line|
    # Reopen only after the previous chunk was closed; a closed File still
    # responds to :write, so test closed? instead.
    data_out = File.open("#{file_num}.txt", 'w') if data_out.closed?
    data_out.puts line

    bytes += line.bytesize # count bytes, not characters

    if bytes > MAX_BYTES
      bytes = 0
      file_num += 1
      data_out.close
    end
  end

  data_out.close unless data_out.closed?
end

Cheers,

Martin

1 Answer

  1. Editorial Team
    Added an answer on May 22, 2026 at 9:03 pm

    [Updated] Wrote a short version without any helper variables and put everything in a method:

    def chunker f_in, out_pref, chunksize = 1_073_741_824
      File.open(f_in,"r") do |fh_in|
        until fh_in.eof?
          File.open("#{out_pref}_#{"%05d"%(fh_in.pos/chunksize)}.txt","w") do |fh_out|
            fh_out << fh_in.read(chunksize)
          end
        end
      end
    end
    
    chunker "inputfile.txt", "output_prefix" # optional third argument: desired chunk size in bytes
    

    Instead of looping over lines, you can use .read(length) and loop only on the EOF check and the file cursor.

    This ensures that the chunk files are never bigger than your desired chunk size.

    On the other hand, it pays no attention to line breaks (\n), so a line can be cut in two across consecutive files!
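One practical caveat with .read(chunksize): each chunk is held in memory as a single String, which is a lot with the 1 GiB default. Here is a minimal sketch of the same loop that streams the bytes with IO.copy_stream instead. The method name chunker_stream is made up for this illustration and is not part of the original answer; the file naming is kept the same.

```ruby
# Hypothetical variant of the answer's chunker: same naming scheme, but the
# bytes are streamed with IO.copy_stream instead of read into one big String.
def chunker_stream(f_in, out_pref, chunksize = 1_073_741_824)
  File.open(f_in, "rb") do |fh_in|
    until fh_in.eof?
      # Chunk index from the current read position, as in the answer.
      name = format("%s_%05d.txt", out_pref, fh_in.pos / chunksize)
      File.open(name, "wb") do |fh_out|
        # Copies at most chunksize bytes, starting at fh_in's current position.
        IO.copy_stream(fh_in, fh_out, chunksize)
      end
    end
  end
end
```

It is called the same way, for example chunker_stream "inputfile.txt", "output_prefix", 5_000_000. Like the .read version, it cuts wherever the byte limit falls, not at line breaks.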

    The numbers for the chunk files are generated by integer division of the current file cursor position by chunksize, formatted with "%05d", which results in 5-digit numbers with leading zeros (e.g. 00001).

    This is only possible because .read(chunksize) is used. In the second example below, it cannot be used!
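To make the numbering and the cut points concrete, here is a small demo that runs the chunker above on a two-line input with a tiny chunk size. The helper demo_chunks, the temp directory, and the "part" prefix are assumptions made only for this illustration.

```ruby
require "tmpdir"

# The answer's chunker, restated so this example runs on its own.
def chunker(f_in, out_pref, chunksize = 1_073_741_824)
  File.open(f_in, "r") do |fh_in|
    until fh_in.eof?
      File.open("#{out_pref}_#{"%05d" % (fh_in.pos / chunksize)}.txt", "w") do |fh_out|
        fh_out << fh_in.read(chunksize)
      end
    end
  end
end

# Hypothetical helper: writes text to a temp file, splits it, and returns
# { chunk file name => chunk content }.
def demo_chunks(text, chunksize)
  Dir.mktmpdir do |dir|
    input = File.join(dir, "input.txt")
    File.write(input, text)
    chunker(input, File.join(dir, "part"), chunksize)
    Dir.glob(File.join(dir, "part_*.txt")).sort.map do |path|
      [File.basename(path), File.read(path)]
    end.to_h
  end
end

demo_chunks("first line\nsecond line\n", 8).each do |name, content|
  puts "#{name}: #{content.inspect}"
end
# part_00000.txt: "first li"
# part_00001.txt: "ne\nsecon"
# part_00002.txt: "d line\n"
```

The cursor sits at 0, 8 and 16 when the three files are opened, which gives the indexes 0, 1 and 2. Both lines end up split across files, which is the line break problem described above.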

    Update: Splitting with line break recognition

    If you really need complete lines ending in \n, then use this modified code snippet:

    def chunker f_in, out_pref, chunksize = 1_073_741_824
      outfilenum = 1
      File.open(f_in,"r") do |fh_in|
        until fh_in.eof?
          File.open("#{out_pref}_#{outfilenum}.txt","w") do |fh_out|
            loop do
              line = fh_in.readline
              fh_out << line
              # Stop when another line of this size would no longer fit, or at EOF.
              # bytesize (not length) counts bytes, matching what fh_out.size reports.
              break if fh_out.size > (chunksize-line.bytesize) || fh_in.eof?
            end
          end
          outfilenum += 1
        end
      end
    end
    

    I had to introduce the helper variable line because I want to keep the chunk file size below the chunksize limit. The check uses the size of the line just written as an estimate for the next one, so a much longer following line can still push a file over the limit. Without this extended check, you will also get file sizes above the limit, because a plain size check only succeeds on the next iteration, when the line has already been written. (Working with .ungetc or other complex calculations would make the code less readable and no shorter than this example.)

    Unfortunately, a second EOF check is needed, because the last chunk will usually be smaller than the others.

    Two helper variables are also needed: line is described above, and outfilenum is needed because the resulting file sizes usually do not match chunksize exactly.
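If a hard limit matters more than brevity, one option is to hold the next line back and check its size before writing it. The following is only a sketch of that idea, not the original answer's method. The name split_by_lines is hypothetical, and the 5_000_000 default is taken from MAX_BYTES in the question.

```ruby
# Hypothetical helper, not from the answer: splits f_in on line boundaries so
# that no output file exceeds max_bytes, unless a single line is longer than
# max_bytes on its own (such a line gets a file to itself).
def split_by_lines(f_in, out_pref, max_bytes = 5_000_000)
  file_num = 0
  File.open(f_in, "r") do |fh_in|
    line = fh_in.gets # the line waiting to be written
    while line
      File.open("#{out_pref}_#{file_num}.txt", "w") do |fh_out|
        written = 0
        # Always write at least one line per file, then keep going while the
        # pending line still fits under the limit.
        while line && (written.zero? || written + line.bytesize <= max_bytes)
          fh_out << line
          written += line.bytesize
          line = fh_in.gets
        end
      end
      file_num += 1
    end
  end
  file_num # number of files written
end
```

For the question's case this would be called as split_by_lines "test.txt", "out". It uses File.open in block form only, and it tracks the written byte count itself instead of asking fh_out.size after every line.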
