I am trying to find all possible common strings from a file consisting of

Question

Asked: May 29, 2026 (2026-05-29T05:25:41+00:00)

I am trying to find all possible common strings from a file consisting of strings of various lengths. Can anybody help me out?

E.g., the input file (already sorted) is:

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAC    
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAG
AAAAAAAATTAGGCTGGG
AAAAAAAATTGAAACATCTATAGGTC
AAAAAAACTCTACCTCTCT
AAAAAAACTCTACCTCTCTATACTAATCTCCCTACA

and my desired output is:

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAC    
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAG
AAAAAAAATTAGGCTGGG
AAAAAAAATTGAAACATCTATAGGTC
AAAAAAACTCTACCTCTCTATACTAATCTCCCTACA

[EDIT] Each line which is a substring of any other line should be removed.

1 Answer

Editorial Team · Answer 1 · 2026-05-29T05:25:41+00:00

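Because the list is sorted, it is enough to compare each line with the line that follows it: if any entry begins with the current line, the entry immediately after it does too. A line is kept if the next line is no longer than it, or if the leading characters of the next line (as many as the current line has) differ from the current line. Otherwise the line is contained in the next one and is dropped. This takes a single linear pass. Note that the check only looks at the start of the next line, which is what the sample data requires.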

A non-algorithmic optimization (micro-optimization) is to avoid substr, which creates a new string. std::string::compare can compare against the leading part of the other string as though it were truncated, without actually creating a truncated string.
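The truncated comparison described above could be sketched as a small helper. This is only an illustration: the name has_prefix is hypothetical and not part of the original answer.

```cpp
#include <string>

// Returns true if 'text' begins with 'prefix'.
// compare(pos, len, str) looks at text[0 .. prefix.size()) in place,
// so no temporary string is allocated (unlike substr).
bool has_prefix(const std::string& text, const std::string& prefix)
{
    return text.size() >= prefix.size() &&
           text.compare(0, prefix.size(), prefix) == 0;
}
```

With such a helper, a line is kept when has_prefix(next_line, line) is false.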

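// Assumes <string> and <vector> are included, 'using namespace std',
// and that 'lines' holds the sorted input.
vector<string> unique_lines;
// 'j + 1 < lines.size()' visits every adjacent pair and is safe for an
// empty list ('lines.size() - 1' would wrap around for an unsigned type).
for (size_t j = 0; j + 1 < lines.size(); ++j)
{
    const string& line = lines[j];
    const string& next_line = lines[j + 1];

    // If the line is not a prefix of the next line,
    // add it to the list of unique lines.
    // compare() checks the first line.size() characters of next_line
    // in place, without building a temporary string as substr() would.
    if (line.size() >= next_line.size() ||
        next_line.compare(0, line.size(), line) != 0)
        unique_lines.push_back(line);
}

// The last line cannot be a prefix of any other line because
// the lines are sorted, so it is always kept.
if (!lines.empty())
    unique_lines.push_back(lines.back());

// The desired output will be contained in 'unique_lines'.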
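
The pass above only detects a line that is a prefix of a later line. If a line occurring in the middle or at the end of another line should also be dropped, as the edit to the question suggests, one option is a brute-force check with std::string::find. The following is a minimal sketch under that assumption: the function name remove_contained_lines is hypothetical, it runs in quadratic time, and it does not require sorted input. Exact duplicates are reduced to their first occurrence.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Keeps only the lines that are not contained in any other line.
// Hypothetical helper for illustration; O(n^2) calls to find().
std::vector<std::string> remove_contained_lines(
    const std::vector<std::string>& lines)
{
    std::vector<std::string> kept;
    for (std::size_t i = 0; i < lines.size(); ++i)
    {
        bool contained = false;
        for (std::size_t j = 0; j < lines.size() && !contained; ++j)
        {
            if (i == j)
                continue;
            if (lines[j].find(lines[i]) == std::string::npos)
                continue;
            // lines[i] occurs inside lines[j]. Drop it if lines[j] is
            // strictly longer, or if the two are identical and lines[j]
            // came first (so one copy of a duplicate survives).
            if (lines[j].size() > lines[i].size() || j < i)
                contained = true;
        }
        if (!contained)
            kept.push_back(lines[i]);
    }
    return kept;
}
```

For large sorted inputs where only prefixes matter, the linear pass is the cheaper choice; this version trades speed for covering substrings at any position.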
