Editorial Team
Asked: June 11, 2026


I have a massive text file of strings ordered by line length, descending. I would like to load the entire thing into a string array, run Levenshtein on each string against the others, create a group UUID, and put that into a second array. That second array would be a hashtable where the key is the memory address of the corresponding string in the first array and the value is a UUID.
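One way to read this plan is a greedy grouping pass. The sketch below is not the asker's design: the threshold `max_dist` and the rule "a string joins the first group whose first member is within `max_dist` edits" are illustrative assumptions. `make_group_id` is a hypothetical stand-in for a real UUID generator, and the distance function is a parameter so that `levenshtein_distance<std::string>` from the question can be plugged in.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical placeholder for a real UUID generator (on Windows, UuidCreate
// from the RPC API is one option); here it just numbers the groups.
std::string make_group_id(std::size_t n) { return "group-" + std::to_string(n); }

// Greedy grouping, one possible strategy: each string joins the first group
// whose representative (first member) is within max_dist edits; otherwise it
// starts a new group. Keys are the addresses of the elements of `lines`, so
// `lines` must not be resized while the returned map is in use.
template <class Dist>
std::unordered_map<const std::string*, std::string>
group_by_distance(const std::vector<std::string>& lines, unsigned max_dist, Dist dist) {
    std::unordered_map<const std::string*, std::string> group_of;
    std::vector<const std::string*> reps;  // one representative per group
    for (const std::string* p = lines.data(); p != lines.data() + lines.size(); ++p) {
        std::string id;
        for (const std::string* rep : reps) {
            std::size_t a = rep->size(), b = p->size();
            // Edit distance is at least the length difference: cheap early reject.
            if ((a > b ? a - b : b - a) > max_dist) continue;
            if (dist(*rep, *p) <= max_dist) { id = group_of[rep]; break; }
        }
        if (id.empty()) {
            id = make_group_id(reps.size());
            reps.push_back(p);
        }
        group_of[p] = id;
    }
    return group_of;
}
```

A call such as `group_by_distance(lines, 3, levenshtein_distance<std::string>)` would then map every line's address to a group ID. Because the lines are sorted by length, the length-difference check skips most distance computations between very different strings.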

I would like to perform pointer arithmetic when iterating over the strings to get the best performance.
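Because `std::vector` stores its elements contiguously, pointer arithmetic over a `std::vector<std::string>` is legal, and subtracting `data()` from an element's address gives its line number. That suggests an alternative to a hashtable keyed by address: a plain vector indexed by line number. The helper names below are hypothetical, and the sketch stores line lengths only to stay small. With optimizations enabled, an ordinary index or iterator loop usually compiles to equivalent code, so raw pointers are not strictly needed for speed.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Pointer difference gives the line number of an element inside `lines`.
std::size_t line_number_of(const std::vector<std::string>& lines, const std::string* p) {
    return static_cast<std::size_t>(p - lines.data());
}

// Walk the strings with a raw pointer and fill a parallel vector indexed by
// line number (here: each line's length, as a simple stand-in for a group id).
std::vector<std::size_t> lengths_by_pointer(const std::vector<std::string>& lines) {
    std::vector<std::size_t> out(lines.size());
    const std::string* const end = lines.data() + lines.size();
    for (const std::string* p = lines.data(); p != end; ++p)
        out[line_number_of(lines, p)] = p->size();
    return out;
}
```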

After running Levenshtein gazillions of times, I would like to write another text file whose contents are simply the UUID of the group, a colon, and the line from the original text file.
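A sketch of that output step, assuming the group IDs have been collected into a vector parallel to the lines (`group_ids[i]` belongs to `lines[i]`). The function name and that layout are assumptions, not part of the question.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Write one "<group id>:<original line>" record per line.
// Returns false if the inputs don't line up or the file can't be written.
bool write_grouped_file(const std::string& path,
                        const std::vector<std::string>& lines,
                        const std::vector<std::string>& group_ids) {
    if (lines.size() != group_ids.size()) return false;
    std::ofstream out(path);
    if (!out) return false;
    for (std::size_t i = 0; i < lines.size(); ++i)
        out << group_ids[i] << ':' << lines[i] << '\n';  // '\n', not std::endl: no flush per line
    out.close();
    return !out.fail();
}
```

Writing `'\n'` instead of `std::endl` avoids flushing the stream on every line, which adds up on a very large file.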

I have the Levenshtein algorithm from Wikibooks:

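#include <algorithm>
#include <cstddef>
#include <vector>

// Levenshtein distance with two rolling columns: O(len1 * len2) time,
// O(len2) extra memory. T is any container with size() and operator[],
// e.g. std::string.
template<class T> unsigned int levenshtein_distance(const T &s1, const T &s2) {
    const std::size_t len1 = s1.size(), len2 = s2.size();
    std::vector<unsigned int> col(len2+1), prevCol(len2+1);

    // Distance from the empty prefix of s1 to each prefix of s2.
    for (unsigned int i = 0; i < prevCol.size(); i++)
            prevCol[i] = i;
    for (unsigned int i = 0; i < len1; i++) {
            col[0] = i+1;
            // Cheapest of insertion, deletion, or substitution (free if the characters match).
            for (unsigned int j = 0; j < len2; j++)
                    col[j+1] = std::min( std::min( 1 + col[j], 1 + prevCol[1 + j]),
                                         prevCol[j] + (s1[i]==s2[j] ? 0 : 1) );
            col.swap(prevCol);
    }
    return prevCol[len2];
}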

I have done some C++, some C, and loads of Obj-C. I'm using Windows 7. How do you recommend I do this? What kind of string array should I use? How do I read strings from a text file into a form the function above can consume?
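For the last question: `std::getline` reads each line into a `std::string`, and since `std::string` has `size()` and `operator[]`, it can be passed straight to `levenshtein_distance` with `T = std::string`. A minimal sketch, with a hypothetical function name:

```cpp
#include <fstream>
#include <string>
#include <utility>
#include <vector>

// Read a text file into a vector of std::string, one element per line.
std::vector<std::string> read_lines(const std::string& path) {
    std::ifstream in(path);
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {
        // Drop a trailing '\r' in case a CRLF file is read where text mode
        // does not translate line endings (e.g. outside Windows).
        if (!line.empty() && line.back() == '\r') line.pop_back();
        lines.push_back(std::move(line));  // getline clears `line` before reuse
    }
    return lines;
}
```

With that, something like `auto lines = read_lines("input.txt");` followed by `levenshtein_distance(lines[0], lines[1])` works as-is (the file name is only an example).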

I'm basically looking for as many tips as possible, as strings confuse me in C++. Oh, and so does C++ itself!

Thanks


1 Answer

Editorial Team
Answered: June 11, 2026 at 11:05 am

    For sheer access time, you would be hard pressed to beat reading the whole file into memory and then indexing it in a single pass: build a list of pointers to the start of each line, and write a null terminator over each CR/LF you encounter. The line number is then the index into the container holding those pointers, and for that I'd likely use std::deque<>.
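A sketch of this read-to-memory approach, assuming the file fits in memory. It uses a `std::vector<char>` in place of a `malloc`'d block (same contiguous layout, freed automatically), overwrites CR and LF with `'\0'`, treats a CR LF pair as one line break so no spurious empty lines appear, and stores line starts in a `std::deque<char*>`. The function names are hypothetical.

```cpp
#include <cstddef>
#include <deque>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Read the whole file into one contiguous buffer.
std::vector<char> read_whole_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::vector<char>(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>());
}

// Single pass: overwrite every CR and LF with '\0' and record a pointer to the
// start of each line, so line number N is lines[N]. The pointers stay valid
// only while `buf` is alive and not resized.
std::deque<char*> index_lines(std::vector<char>& buf) {
    buf.push_back('\0');  // make sure the last line is terminated too
    std::deque<char*> lines;
    char* p = buf.data();
    char* const end = buf.data() + buf.size() - 1;  // excludes the added '\0'
    while (p < end) {
        lines.push_back(p);
        while (p < end && *p != '\r' && *p != '\n') ++p;
        // Terminate this line; a CR LF pair counts as a single line break.
        if (p < end && *p == '\r' && p + 1 < end && p[1] == '\n') *p++ = '\0';
        if (p < end) *p++ = '\0';
    }
    return lines;
}
```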

    The Boost folks will likely carry this further, but for quick access it's hard to beat a big ol' block of memory and a raft of pointers indexing it. Of course, this all assumes the file fits in memory. If it doesn't, this gets significantly more complicated, but if it does (and you can assume it always will), malloc / walk-and-terminate / push-pointer-into-deque is pretty clean. To truly make it smoke, I'd also store the length of each string alongside its pointer, so the std::deque<> would hold struct { char* ptr; size_t len; }. That eliminates a large number of unneeded strlen() calls, and since every line's length is known, it also removes the need to null-terminate anything.
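The pointer-plus-length variant might look like the sketch below. It assumes C++17 for `std::string_view`, which has `size()` and `operator[]` and therefore fits the question's `levenshtein_distance` template without copying any characters; `LineRef` and `index_lines_with_len` are hypothetical names. Since nothing is null-terminated, the buffer is left unmodified and can be `const`.

```cpp
#include <cstddef>
#include <deque>
#include <string_view>
#include <vector>

// Pointer plus length: no strlen() calls, no '\0' written into the buffer.
struct LineRef {
    const char* ptr;
    std::size_t len;
    // A non-owning view over the line, usable as T = std::string_view.
    std::string_view view() const { return std::string_view(ptr, len); }
};

std::deque<LineRef> index_lines_with_len(const std::vector<char>& buf) {
    std::deque<LineRef> lines;
    const char* p = buf.data();
    const char* const end = p + buf.size();
    while (p < end) {
        const char* start = p;
        while (p < end && *p != '\r' && *p != '\n') ++p;
        lines.push_back({start, static_cast<std::size_t>(p - start)});
        // Skip the line break: CR LF, lone LF, or lone CR.
        if (p < end && *p == '\r') ++p;
        if (p < end && *p == '\n') ++p;
    }
    return lines;
}
```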


Related Questions

I have massive directories, and I would like to read all the files as
I'd like to wrap some text onto the next line that I have held
(Disclaimer: I realize this is a massive wall of text, but I have done
I have a massive XML file. However, I'm only interested in a single small
I'm traversing a text file line by line adjusting the text so that the
I have a text file: DATE 20090105 1 2.25 1.5 3 3.6 0.099 4
At moment I have massive of if statement like if ([dicIdentifer isEqualToString:CONF_KEY_CALLMETHOD]) { switch
Let's say we have a massive CSS file that is used to style the
I have a massive .txt file with a list of tens of thousands of
I have a massive query below, the things I don't like are: I can't


© 2021 The Archive Base. All Rights Reserved