The Archive Base

Editorial Team
Asked: June 10, 2026 (2026-06-10T13:25:55+00:00)


I have a large (~100GB) text file structured like this:

A,foobar
A,barfoo
A,foobar
B,barfoo
B,barfoo
C,foobar

Each line is a comma-separated pair of values. The file is sorted by the first value in the pair. The lines are variable length. Define a group as being all lines with a common first value, i.e. with the example quoted above all lines starting with “A,” would be a group, all lines starting with “B,” would be another group.

The entire file is too large to fit into memory, but all the lines from any individual group will always fit into memory.

I have a routine for processing a single such group of lines and writing to a text file. My problem is that I don’t know how best to read the file a group at a time. All the groups are of arbitrary, unknown size. I have considered two ways:

1) Scan the file using a BufferedReader, accumulating the lines from a group in a String or array. Whenever a line is encountered that belongs to a new group, hold that line in a temporary variable, process the previous group. Clear the accumulator, add the temporary and then continue reading the new group starting from the second line.

2) Scan the file using a BufferedReader. Whenever a line is encountered that belongs to a new group, somehow reset the cursor so that when readLine() is next invoked it starts from the first line of the group instead of the second. I have looked into mark() and reset() but these require knowing the byte-position of the start of the line.
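A rough sketch of how option (2) might look, for comparison. `BufferedReader.mark(int)` takes a read-ahead limit in characters rather than a position, so one possibility is to call it before every `readLine()` and call `reset()` when the line turns out to belong to the next group. The class name `MarkResetGroups`, the helper `keyOf` and the bound `MAX_LINE_CHARS` are assumptions made for illustration. The sketch only works if no line, including its terminator, is longer than that bound.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class MarkResetGroups {

    // Assumed upper bound on the length of one line in chars, terminator included.
    static final int MAX_LINE_CHARS = 8192;

    // Group key: everything before the first comma (the whole line if there is none).
    static String keyOf(String line) {
        int comma = line.indexOf(',');
        return comma < 0 ? line : line.substring(0, comma);
    }

    // Reads one complete group and leaves the reader positioned at the
    // first line of the following group. Returns an empty list at end of input.
    static List<String> nextGroup(BufferedReader reader) throws IOException {
        List<String> group = new ArrayList<>();
        String key = null;
        while (true) {
            reader.mark(MAX_LINE_CHARS);   // remember where the next line starts
            String line = reader.readLine();
            if (line == null) {
                return group;              // end of input
            }
            if (key == null) {
                key = keyOf(line);         // first line defines the group
            } else if (!keyOf(line).equals(key)) {
                reader.reset();            // rewind to the start of this line
                return group;
            }
            group.add(line);
        }
    }
}
```

Calling `nextGroup` repeatedly until it returns an empty list walks the file one group at a time. A line longer than the read-ahead limit can invalidate the mark, in which case `reset()` throws an `IOException`.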

I’m going to go with (1) at the moment, but I would be very grateful if someone could suggest a method that smells less.
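For reference, option (1) can be written so that the "temporary variable" disappears: the line that opens a new group is simply the first element of a fresh accumulator. This is a minimal sketch, not a definitive implementation. The names `GroupScanner`, `keyOf` and `forEachGroup` are hypothetical, and the `Consumer` stands in for the existing routine that processes one group.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class GroupScanner {

    // Group key: everything before the first comma (the whole line if there is none).
    static String keyOf(String line) {
        int comma = line.indexOf(',');
        return comma < 0 ? line : line.substring(0, comma);
    }

    // Reads sorted "key,value" lines and hands each complete group to the handler.
    static void forEachGroup(Reader source, Consumer<List<String>> handler) throws IOException {
        BufferedReader reader = new BufferedReader(source);
        List<String> group = new ArrayList<>();
        String currentKey = null;
        String line;
        while ((line = reader.readLine()) != null) {
            String key = keyOf(line);
            if (currentKey != null && !key.equals(currentKey)) {
                handler.accept(group);      // the previous group is complete
                group = new ArrayList<>();  // fresh list, so the handler may keep the old one
            }
            currentKey = key;
            group.add(line);                // also covers the first line of a new group
        }
        if (!group.isEmpty()) {
            handler.accept(group);          // flush the final group
        }
    }

    public static void main(String[] args) throws IOException {
        String sample = "A,foobar\nA,barfoo\nA,foobar\nB,barfoo\nB,barfoo\nC,foobar\n";
        forEachGroup(new StringReader(sample),
                g -> System.out.println(keyOf(g.get(0)) + ": " + g.size() + " line(s)"));
    }
}
```

Only one group is held in memory at a time, which matches the constraint that a single group always fits in memory while the whole file does not.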

1 Answer

  1. Editorial Team
    Added an answer on June 10, 2026 at 1:25 pm (2026-06-10T13:25:57+00:00)

    I think a PushbackReader would work:

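     // reader is a PushbackReader created with a pushback buffer big enough
     // for the longest line plus its newline (the default buffer holds 1 char)
     if (lineBelongsToNewGroup) {
         // push the line and its newline back so the next read returns it again
         reader.unread((lastLine + "\n").toCharArray());
     }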
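A fuller sketch of the same idea, under a few assumptions that are not in the question: lines end in `\n`, and no line is longer than `MAX_LINE_CHARS - 1` characters. `PushbackReader` has no `readLine()`, so the sketch reads characters up to the newline itself. The names `PushbackGroups`, `open`, `keyOf` and `nextGroup` are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

final class PushbackGroups {

    // Assumed capacity of the pushback buffer: longest line plus its '\n'.
    static final int MAX_LINE_CHARS = 8192;

    // Buffer the underlying source, then add pushback capability on top.
    static PushbackReader open(Reader source) {
        return new PushbackReader(new BufferedReader(source), MAX_LINE_CHARS);
    }

    // Group key: everything before the first comma (the whole line if there is none).
    static String keyOf(String line) {
        int comma = line.indexOf(',');
        return comma < 0 ? line : line.substring(0, comma);
    }

    // Reads up to the next '\n'. Returns null at end of input.
    static String readLine(PushbackReader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            if (c == '\n') {
                return sb.toString();
            }
            sb.append((char) c);
        }
        return sb.length() == 0 ? null : sb.toString();
    }

    // Reads one complete group. Returns an empty list at end of input.
    static List<String> nextGroup(PushbackReader reader) throws IOException {
        List<String> group = new ArrayList<>();
        String key = null;
        String line;
        while ((line = readLine(reader)) != null) {
            if (key == null) {
                key = keyOf(line);
            } else if (!keyOf(line).equals(key)) {
                // The line belongs to the next group: push it back with its newline.
                reader.unread((line + "\n").toCharArray());
                return group;
            }
            group.add(line);
        }
        return group;
    }
}
```

If the pushback buffer is too small for a line, `unread` throws an `IOException`, so the bound has to be chosen with the real data in mind.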
    