
The Archive Base


Editorial Team
Asked: May 16, 2026

I am comparing substrings in two large text files. The approach is very simple: I tokenize the files into two token containers and compare them with two nested for loops. Performance is disastrous! Does anybody have advice or an idea on how to improve it?

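// tokenFileA, strLengthA, strLengthB and subStrTemp are declared outside this snippet.
for (int s = 0; s < txtA.TokenContainer.size(); s++) {
    String strTxtA = txtA.getSubStr(s);
    strLengthA = txtA.getNumToken(s);

    // Only substrings with at least the configured number of tokens are compared.
    if (strLengthA >= dp.getMinStrLength()) {
        int tokenFileB = 1; // running token position in file B

        // Every substring of B is scanned again for each substring of A.
        for (int t = 0; t < txtB.TokenContainer.size(); t++) {
            String strTxtB = txtB.getSubStr(t);
            strLengthB = txtB.getNumToken(t);

            if (strTxtA.equalsIgnoreCase(strTxtB)) {
                try {
                    // Record the start and end token positions of the match in both files.
                    subStrTemp = new SubStrTemp(
                        txtA.ID, txtB.ID, tokenFileA, tokenFileB,
                        (tokenFileA + strLengthA - 1),
                        (tokenFileB + strLengthB - 1));

                    // contains() is a linear scan over all matches stored so far.
                    if (!subStrContainer.contains(subStrTemp)) {
                        subStrContainer.addElement(subStrTemp);
                    }
                } catch (Exception ex) {
                    logger.error("error", ex);
                }
            }
            tokenFileB += strLengthB;
        }
        // Note: the position in file A only advances for substrings that pass the length check.
        tokenFileA += strLengthA;
    }
}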

In general, my code reads two large strings with the Java tokenizer into containers A and B and then compares their substrings. The positions of substrings that exist in both strings are stored in a Vector. The performance is awful, and I don't really know how to solve this with a HashMap.


1 Answer

  1. Editorial Team
    Added an answer on May 16, 2026 at 5:01 pm

    Your main problem is that you go through all of txtB for each token in txtA, so the work grows with the product of the two container sizes.

    You should store information about the tokens from txtA (in a HashMap, for instance) and then, in a second loop (but not a nested one), look up each string from txtB in the map.
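
    The two-pass idea above could be sketched roughly as follows. This is only an illustration under several assumptions: `Segment`, `findMatches` and the `"startA-endA:startB-endB"` result encoding are hypothetical stand-ins for the asker's `TokenContainer` entries and `SubStrTemp`, lower-casing with `Locale.ROOT` is used to approximate `equalsIgnoreCase`, and the position in file A is advanced for every segment rather than only for those that pass the length check. A `Set` also replaces the linear `Vector.contains` check for duplicates.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class SubstringIndexSketch {

    /** Hypothetical stand-in for one container entry: its text and its token count. */
    public static final class Segment {
        public final String text;
        public final int numTokens;

        public Segment(String text, int numTokens) {
            this.text = text;
            this.numTokens = numTokens;
        }
    }

    /**
     * Returns each match as "startA-endA:startB-endB" (1-based token positions).
     * The encoding is illustrative only; the original code stores SubStrTemp objects.
     */
    public static Set<String> findMatches(List<Segment> a, List<Segment> b, int minLength) {
        // Pass 1: index the long-enough segments of A by their case-folded text.
        Map<String, List<int[]>> index = new HashMap<>();
        int posA = 1;
        for (Segment s : a) {
            if (s.numTokens >= minLength) {
                String key = s.text.toLowerCase(Locale.ROOT);
                index.computeIfAbsent(key, k -> new ArrayList<>())
                     .add(new int[] {posA, posA + s.numTokens - 1});
            }
            posA += s.numTokens; // assumption: positions advance for every segment
        }

        // Pass 2: one hash lookup per segment of B instead of a scan over all of A.
        Set<String> matches = new LinkedHashSet<>(); // the set drops duplicates cheaply
        int posB = 1;
        for (Segment s : b) {
            List<int[]> hits = index.get(s.text.toLowerCase(Locale.ROOT));
            if (hits != null) {
                int endB = posB + s.numTokens - 1;
                for (int[] h : hits) {
                    matches.add(h[0] + "-" + h[1] + ":" + posB + "-" + endB);
                }
            }
            posB += s.numTokens;
        }
        return matches;
    }
}
```

    With this layout each container is traversed once, and the remaining cost depends mainly on how many matches there actually are.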


    On the same topic:

    • term frequency using java program
    • How to count words in java