What algorithms could i use to determine common characters in a set of strings?

Question

0

Asked: May 10, 20262026-05-10T19:34:27+00:00 2026-05-10T19:34:27+00:00

What algorithms could i use to determine common characters in a set of strings?

0

What algorithms could i use to determine common characters in a set of strings?

To make the example simple, I only care about 2+ characters in a row and if it shows up in 2 or more of the sample. For instance:

0000abcde0000
0000abcd00000
000abc0000000
00abc000de000

I’d like to know:

00 was used in 1,2,3,4
000 was used in 1,2,3,4
0000 was used in 1,2,3
00000 was used in 2,3
ab was used in 1,2,3,4
abc was used in 1,2,3,4
abcd was used in 1,2
bc was used in 1,2,3,4
bcd was used in 1,2
cd was used in 1,2
de was used in 1,4

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

score 0 · Answer 1 · 2026-05-10T19:34:27+00:00

I’m assuming that this is not homework. (If it is, you’re one your own re plagiarism! 😉

Below is a quick-and-dirty solution. The time complexity is O(m**2 * n) where m is the average string length and n is the size of the array of strings.

An instance of Occurrence keeps the set of indices which contain a given string. The commonOccurrences routine scans a string array, calling captureOccurrences for each non-null string. The captureOccurrences routine puts the current index into an Occurrence for each possible substring of the string it is given. Finally, commonOccurrences forms the result set by picking only those Occurrences that have at least two indices.

Note that your example data has many more common substrings than you identified in the question. For example, '00ab' occurs in each of the input strings. An additional filter to select interesting strings based on content (e.g. all digits, all alphabetic, etc.) is — as they say — left as an exercise for the reader. 😉

QUICK AND DIRTY JAVA SOURCE:

package com.stackoverflow.answers;  import java.util.Collections; import java.util.HashMap; import java.util.Map; import java.util.Set; import java.util.TreeSet;  public class CommonSubstringFinder {      public static final int MINIMUM_SUBSTRING_LENGTH = 2;      public static class Occurrence implements Comparable<Occurrence> {         private final String value;         private final Set<Integer> indices;         public Occurrence(String value) {             this.value = value == null ? '' : value;             indices = new TreeSet<Integer>();         }         public String getValue() {             return value;         }         public Set<Integer> getIndices() {             return Collections.unmodifiableSet(indices);         }         public void occur(int index) {             indices.add(index);         }         public String toString() {             StringBuilder result = new StringBuilder();             result.append(''').append(value).append(''');             String separator = ': ';             for (Integer i : indices) {                 result.append(separator).append(i);                 separator = ',';             }             return result.toString();         }         public int compareTo(Occurrence that) {             return this.value.compareTo(that.value);         }     }      public static Set<Occurrence> commonOccurrences(String[] strings) {         Map<String,Occurrence> work = new HashMap<String,Occurrence>();         if (strings != null) {             int index = 0;             for (String string : strings) {                 if (string != null) {                     captureOccurrences(index, work, string);                 }                 ++index;             }         }         Set<Occurrence> result = new TreeSet<Occurrence>();         for (Occurrence occurrence : work.values()) {             if (occurrence.indices.size() > 1) {                 result.add(occurrence);             }         }         return result;     }      private static void captureOccurrences(int index, Map<String,Occurrence> work, String string) {         final int maxLength = string.length();         for (int i = 0; i < maxLength; ++i) {             for (int j = i + MINIMUM_SUBSTRING_LENGTH; j < maxLength; ++j) {                 String partial = string.substring(i, j);                 Occurrence current = work.get(partial);                 if (current == null) {                     current = new Occurrence(partial);                     work.put(partial, current);                 }                 current.occur(index);             }         }     }      private static final String[] TEST_DATA = {         '0000abcde0000',         '0000abcd00000',         '000abc0000000',         '00abc000de000',     };     public static void main(String[] args) {         Set<Occurrence> found = commonOccurrences(TEST_DATA);         for (Occurrence occurrence : found) {             System.out.println(occurrence);         }     }  }

SAMPLE OUTPUT: (note that there was actually only one Occurrence per line; I can’t seem to prevent the blockquote markup from merging lines)

’00’: 0,1,2,3 ‘000’: 0,1,2,3
‘0000’: 0,1,2 ‘0000a’: 0,1
‘0000ab’: 0,1 ‘0000abc’: 0,1
‘0000abcd’: 0,1 ‘000a’: 0,1,2
‘000ab’: 0,1,2 ‘000abc’: 0,1,2
‘000abcd’: 0,1 ’00a’: 0,1,2,3
’00ab’: 0,1,2,3 ’00abc’: 0,1,2,3
’00abc0′: 2,3 ’00abc00′: 2,3
’00abc000′: 2,3 ’00abcd’: 0,1
‘0a’: 0,1,2,3 ‘0ab’: 0,1,2,3
‘0abc’: 0,1,2,3 ‘0abc0’: 2,3
‘0abc00’: 2,3 ‘0abc000’: 2,3
‘0abcd’: 0,1 ‘ab’: 0,1,2,3 ‘abc’: 0,1,2,3 ‘abc0’: 2,3 ‘abc00’: 2,3
‘abc000’: 2,3 ‘abcd’: 0,1 ‘bc’: 0,1,2,3 ‘bc0’: 2,3 ‘bc00’: 2,3
‘bc000’: 2,3 ‘bcd’: 0,1 ‘c0’: 2,3 ‘c00’: 2,3 ‘c000’: 2,3 ‘cd’: 0,1
‘de’: 0,3 ‘de0’: 0,3 ‘de00’: 0,3
‘e0’: 0,3 ‘e00’: 0,3

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

What algorithms could i use to determine common characters in a set of strings?

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply