The Archive Base


Editorial Team
Asked: May 26, 2026 (2026-05-26T08:17:24+00:00)


I need to compare very large, file-based strings of equal length for simple equality, without first calculating a hash.

I want to use the data in the string to make large, seemingly random jumps, so that I can quickly detect inequality even in strings that start and end the same way. That is, I want to jump throughout the range in a way that mostly or completely avoids hitting the same character repeatedly.

Since the strings are file-based and very large, I don’t want my jumps to be too large because that will thrash the disk.

In my program, a string is simply a sequence of chars backed by a file and less than 2 GB in size, but rarely completely in memory at once.

Then, after trying for a while, I assume they are equal and just iterate in order.

My string class variations all share a base interface with int length() and char charAt() functions, assuming Java chars, which are usually but not always ASCII.
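For illustration, the probe-then-scan idea described above could be sketched in Java roughly like this. The `BigString` interface is only a stand-in for the base interface mentioned above; `ArrayBigString`, `ProbeCompare`, `maxProbes`, `maxJump`, and the jump formula are all hypothetical, picked just to make the sketch self-contained. Each jump is derived from the char just read and capped at `maxJump` so consecutive reads stay close together on disk. It does not guarantee that a position is never probed twice.

```java
/** Hypothetical stand-in for the base interface described above. */
interface BigString {
    int length();
    char charAt(int index);
}

/** In-memory implementation, used only to exercise the sketch. */
final class ArrayBigString implements BigString {
    private final char[] data;
    ArrayBigString(String s) { this.data = s.toCharArray(); }
    public int length() { return data.length; }
    public char charAt(int index) { return data[index]; }
}

final class ProbeCompare {
    /**
     * Returns true if a and b hold the same chars.
     * Phase 1 probes up to maxProbes scattered positions to catch differences early.
     * Phase 2 falls back to a full in-order scan, so the result is always exact.
     */
    static boolean contentEquals(BigString a, BigString b, int maxProbes, int maxJump) {
        if (maxJump < 1) throw new IllegalArgumentException("maxJump must be >= 1");
        int n = a.length();
        if (n != b.length()) return false;
        if (n == 0) return true;

        int pos = 0;
        for (int i = 0; i < maxProbes; i++) {
            char ca = a.charAt(pos);
            if (ca != b.charAt(pos)) return false;
            // Data-dependent forward jump in [1, maxJump] (assumed formula).
            int jump = 1 + ((ca * 31 + i) % maxJump);
            // Wrap around at the end; long arithmetic avoids int overflow.
            pos = (int) (((long) pos + jump) % n);
        }

        // No difference found by probing: iterate in order.
        for (int i = 0; i < n; i++) {
            if (a.charAt(i) != b.charAt(i)) return false;
        }
        return true;
    }
}
```

Because both strings agree at every probed position (otherwise the method has already returned), deriving the jump from `a` alone gives the same path through both strings.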

Any ideas?
Andy

1 Answer

  1. Editorial Team
    Added an answer on May 26, 2026 at 8:17 am

    Build some metadata about your giant strings.

    Say you split them into logical pages or blocks. Pick a block size, and whenever you load a block into memory, hash it and store the hash in a lookup table.

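    When you compare two files, you can first compare the known hashes of their subsections before going to disk for more. Differing hashes for the same block prove the strings are unequal; matching hashes only suggest equality, so those blocks still need a char-by-char check to be certain.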

    This should strike a good balance between caching and avoiding disk access, without adding much overhead.
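A minimal sketch of the block-hash idea, assuming an in-memory array in place of the backing file. Everything named here (`CharSource`, `HashedString`, `HashedCompare`, `loadBlock`, `knownHash`, the `charReads` counter, the tiny `BLOCK_SIZE`, and the simple polynomial hash) is hypothetical and chosen only for illustration; a real implementation would pick a block size that matches its paging scheme.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the asker's length()/charAt() interface. */
interface CharSource {
    int length();
    char charAt(int index);
}

final class HashedString implements CharSource {
    static final int BLOCK_SIZE = 4;          // tiny, for demonstration only
    private final char[] data;                // stands in for the backing file
    private final Map<Integer, Integer> blockHashes = new HashMap<>();
    int charReads = 0;                        // counts simulated "disk" reads

    HashedString(String s) { data = s.toCharArray(); }
    public int length() { return data.length; }
    public char charAt(int i) { charReads++; return data[i]; }

    /** Simulates loading a block into memory; records its hash as a side effect. */
    void loadBlock(int block) {
        int start = block * BLOCK_SIZE;
        int end = Math.min(start + BLOCK_SIZE, data.length);
        int h = 0;
        for (int i = start; i < end; i++) h = 31 * h + data[i];
        blockHashes.put(block, h);
    }

    /** Returns the cached hash, or null if the block was never loaded. */
    Integer knownHash(int block) { return blockHashes.get(block); }
}

final class HashedCompare {
    static boolean contentEquals(HashedString a, HashedString b) {
        int n = a.length();
        if (n != b.length()) return false;
        int bs = HashedString.BLOCK_SIZE;
        int blocks = n / bs + (n % bs == 0 ? 0 : 1);

        // Pass 1: metadata only. A hash mismatch proves inequality without any reads.
        for (int k = 0; k < blocks; k++) {
            Integer ha = a.knownHash(k);
            Integer hb = b.knownHash(k);
            if (ha != null && hb != null && !ha.equals(hb)) return false;
        }

        // Pass 2: matching or unknown hashes prove nothing, so scan the chars.
        for (int i = 0; i < n; i++) {
            if (a.charAt(i) != b.charAt(i)) return false;
        }
        return true;
    }
}
```

The first pass touches only the lookup tables, so two strings whose already-loaded blocks differ are rejected without reading any chars. The second pass is still needed for a definite answer, since equal hashes can collide.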
