The Archive Base

Editorial Team
Asked: June 17, 2026 (2026-06-17T09:05:43+00:00)

Is there a better [pre-existing optional Java 1.6] solution than creating a streaming file


Is there a better [pre-existing optional Java 1.6] solution than creating a streaming file reader class that will meet the following criteria?

  • Given an ASCII file of arbitrarily large size where each line is terminated by a \n
  • For each invocation of some method readLine(), read a random line from the file
  • And for the life of the file handle, no call to readLine() should return the same line twice

Update:

  • All lines must eventually be read

Context: the file’s contents are created from Unix shell commands that produce a directory listing of all paths contained within a given directory; there are between millions and a billion files (which yields millions to a billion lines in the target file). If there is some way to randomly distribute the paths into a file at creation time, that is an acceptable solution as well.

1 Answer

  1. Editorial Team
    Added an answer on June 17, 2026 at 9:05 am

    If the number of files is truly arbitrary, tracking which files have already been processed becomes a problem in its own right: it costs memory if you track them in a list or set, or IO time if you track them in files instead. Solutions that keep a growing list of selected lines also run into timing-related issues as that list grows.

    I’d consider something along the lines of the following:

    1. Create n “bucket” files. n could be chosen based on the number of files and the available system memory. (If n is large, you could fill only a subset of the n buckets at a time to keep the number of open file handles down.)
    2. Hash each file’s name and write it to the corresponding bucket file, “sharding” the directory listing on an arbitrary criterion.
    3. Read in a bucket file’s contents (just filenames) and process them as-is (the randomness comes from the hashing mechanism), or repeatedly pick a random entry and remove it as you go, which provides a bit more randomness.
    4. Alternatively, you could pad each line to a fixed length and use random access, removing indices/offsets from a list as they’re picked.
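The bucket steps above could be sketched roughly as follows, sticking to Java 1.6 syntax. This is only an illustration under some assumptions: the class name `BucketShuffle`, the methods `shard` and `readShuffled`, and the `bucket-<i>.txt` naming are all made up for the example, `String.hashCode()` stands in for whatever hash you prefer, and `Collections.shuffle` stands in for "pick a random entry and remove it as you go". It also assumes n is small enough to keep every bucket writer open at once and that a single bucket fits in memory.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper class; names are illustrative, not from the original answer.
class BucketShuffle {

    /** Steps 1 and 2: write every line of input into one of n bucket files, chosen by hash. */
    static List<File> shard(File input, File dir, int n) throws IOException {
        List<File> buckets = new ArrayList<File>();
        List<BufferedWriter> writers = new ArrayList<BufferedWriter>();
        for (int i = 0; i < n; i++) {
            File f = new File(dir, "bucket-" + i + ".txt");
            buckets.add(f);
            writers.add(new BufferedWriter(new FileWriter(f)));
        }
        BufferedReader in = new BufferedReader(new FileReader(input));
        try {
            String line;
            while ((line = in.readLine()) != null) {
                // Mask the sign bit so the bucket index is never negative.
                int idx = (line.hashCode() & 0x7fffffff) % n;
                BufferedWriter w = writers.get(idx);
                w.write(line);
                w.write('\n');
            }
        } finally {
            in.close();
            for (BufferedWriter w : writers) {
                w.close();
            }
        }
        return buckets;
    }

    /** Step 3: load one bucket into memory and return its lines in random order. */
    static List<String> readShuffled(File bucket, Random rnd) throws IOException {
        List<String> lines = new ArrayList<String>();
        BufferedReader in = new BufferedReader(new FileReader(bucket));
        try {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        } finally {
            in.close();
        }
        Collections.shuffle(lines, rnd);
        return lines;
    }
}
```

Each line lands in exactly one bucket, so iterating over all buckets (in any order) and draining each shuffled list returns every path once and never repeats one. Similar path names may hash unevenly with `String.hashCode()`, so a stronger hash may be worth swapping in if bucket sizes turn out lopsided.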
