I have two strings, “hello” and “eo” for instance, and I wish to find the duplicate characters between the two strings, that is, ‘e’ and ‘o’ in this example.
My algorithm would go this way:
void find_duplicate(char* str_1, char* str_2, int len1, int len2)
{
    char c;
    if(len1 < len2)
    {
        int* idx_1 = new int[len1]; // record elements in little string
                                    // that are matched in big string
        for(int k = 0 ; k < len1 ; k++)
            idx_1[k] = 0;
        int* idx_2 = new int[len2]; // record if element in str_2 has been
                                    // matched already or not
        for(int k = 0 ; k < len2 ; k++)
            idx_2[k] = 0;
        for(int i = 0 ; i < len1 ; i++) // str_1 has len1 characters
        {
            c = str_1[i];
            for(int j = 0 ; j < len2 ; j++) // str_2 has len2 characters
            {
                if(str_2[j] == c)
                {
                    if(idx_2[j] == 0) // this element in str_2 has not been matched yet
                    {
                        idx_1[i] = j + 1; // mark ith element in idx as matched in string 2 at pos j
                        idx_2[j] = 1;
                    }
                }
            }
        }
        // now idx_1 and idx_2 contain matches info, let's remove matches.
        char* str_1_new = new char[len1];
        char* str_2_new = new char[len2];
        int kn = 0;
        for(int k = 0 ; k < len1 ; k++)
        {
            if(idx_1[k] > 0)
            {
                str_1_new[kn] = str_1[k];
                kn++;
            }
        }
        kn = 0;
        for(int k = 0 ; k < len2 ; k++)
        {
            if(idx_2[k] > 0)
            {
                str_2_new[kn] = str_2[k];
                kn++;
            }
        }
        // release the buffers (a full version would return the results first)
        delete[] idx_1;
        delete[] idx_2;
        delete[] str_1_new;
        delete[] str_2_new;
    }
    else
    {
        // same here, switching roles (do it yourself)
    }
}
I feel my solution is awkward:
– symmetry of both cases in the first if/else, which leads to code duplication
– time complexity: 2*len1*len2 operations for finding duplicates, then len1 + len2 operations for removal
– space complexity: two arrays of length len1 and two of length len2.
What if len1 and len2 are not given (with and without resorting to an STL vector)?
Could you provide your implementation of this algorithm?
Thanks
First of all, this isn’t a substring matching problem; it is the problem of finding the common characters between two strings.
Your solution runs in O(n*m) time, where n=len1 and m=len2 in your code. You could easily solve the same problem in O(n+m+c) time by counting the characters in each of the strings (where c is the size of the character set). This is the same counting idea that counting sort is built on.
Sample code implementing this in your case:
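A minimal sketch of the counting approach, assuming 8-bit characters and `std::string` inputs (which also removes the need to pass `len1` and `len2`). The name `common_chars` is hypothetical and not taken from the question:

```cpp
#include <array>
#include <climits>
#include <cstddef>
#include <string>

// Hypothetical helper: returns every character that occurs in both strings,
// once each, in ascending character-code order.
std::string common_chars(const std::string& a, const std::string& b)
{
    constexpr std::size_t kCharset = UCHAR_MAX + 1;  // c in the O(n+m+c) bound
    std::array<std::size_t, kCharset> count_a{};     // zero-initialised counters
    std::array<std::size_t, kCharset> count_b{};

    for (unsigned char ch : a) ++count_a[ch];        // O(n)
    for (unsigned char ch : b) ++count_b[ch];        // O(m)

    std::string result;
    for (std::size_t ch = 0; ch < kCharset; ++ch)    // O(c)
        if (count_a[ch] > 0 && count_b[ch] > 0)
            result += static_cast<char>(ch);
    return result;
}
```

With this sketch, `common_chars("hello", "eo")` returns `"eo"`.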
Please note that I am also sorting the output here. If you do not want to do that, you can skip counting the characters in the second string and just compare for duplicates on the go.
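A minimal sketch of that unsorted variant, again assuming 8-bit characters and `std::string` inputs. The name `common_chars_unsorted` is hypothetical. Only the first string is recorded; the second is scanned once, and matches are emitted in the order they appear in it:

```cpp
#include <array>
#include <climits>
#include <string>

// Hypothetical helper: returns the characters common to both strings,
// once each, in the order they first appear in b.
std::string common_chars_unsorted(const std::string& a, const std::string& b)
{
    std::array<bool, UCHAR_MAX + 1> in_a{};          // all false initially
    for (unsigned char ch : a) in_a[ch] = true;      // O(n)

    std::string result;
    for (unsigned char ch : b) {                     // O(m)
        if (in_a[ch]) {
            result += static_cast<char>(ch);
            in_a[ch] = false;                        // report each character only once
        }
    }
    return result;
}
```

This still needs a table of size c, but it no longer loops over the whole character set to build the output.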
If, however, you intend to detect multiple duplicates of the same letter (e.g. if “banana” and “arena” should output “aan” instead of “an”), then you can reuse the counts from the current solution: for each character, output it as many times as the smaller of its two counts.
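A minimal sketch of the multiple-duplicates variant, assuming 8-bit characters and `std::string` inputs. The name `common_chars_multi` is hypothetical. Each character is emitted `min(count in a, count in b)` times, in ascending character-code order:

```cpp
#include <algorithm>
#include <array>
#include <climits>
#include <cstddef>
#include <string>

// Hypothetical helper: returns the characters common to both strings,
// keeping repeated letters that are shared.
std::string common_chars_multi(const std::string& a, const std::string& b)
{
    constexpr std::size_t kCharset = UCHAR_MAX + 1;
    std::array<std::size_t, kCharset> count_a{};     // zero-initialised counters
    std::array<std::size_t, kCharset> count_b{};

    for (unsigned char ch : a) ++count_a[ch];
    for (unsigned char ch : b) ++count_b[ch];

    std::string result;
    for (std::size_t ch = 0; ch < kCharset; ++ch) {
        // A letter is shared as many times as it occurs in the poorer string.
        result.append(std::min(count_a[ch], count_b[ch]), static_cast<char>(ch));
    }
    return result;
}
```

With this sketch, `common_chars_multi("banana", "arena")` returns `"aan"`: 'a' occurs three times in “banana” and twice in “arena”, and 'n' occurs twice and once respectively.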