
The Archive Base


Editorial Team
Asked: May 10, 2026

What algorithms could I use to determine common characters in a set of strings?


To keep the example simple, I only care about runs of 2 or more characters that show up in 2 or more of the samples. For instance:

  1. 0000abcde0000
  2. 0000abcd00000
  3. 000abc0000000
  4. 00abc000de000

I’d like to know:

00 was used in 1,2,3,4
000 was used in 1,2,3,4
0000 was used in 1,2,3
00000 was used in 2,3
ab was used in 1,2,3,4
abc was used in 1,2,3,4
abcd was used in 1,2
bc was used in 1,2,3,4
bcd was used in 1,2
cd was used in 1,2
de was used in 1,4

1 Answer

  1. Added an answer on May 10, 2026 at 7:34 pm

    I’m assuming that this is not homework. (If it is, you’re on your own re plagiarism! 😉)

    Below is a quick-and-dirty solution. It performs O(m**2 * n) substring operations, where m is the average string length and n is the number of strings in the array; each substring copy and hash lookup can itself cost up to O(m) character operations.
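
    As a rough check on the quadratic factor: a string of length m has m - 1 - i substrings of length 2 or more starting at index i, so the count per string works out as below. The value m = 13 is simply the length of each sample string in the question.

```latex
\[
\sum_{i=0}^{m-2} (m - 1 - i) \;=\; \frac{m(m-1)}{2},
\qquad
m = 13 \;\Rightarrow\; \frac{13 \cdot 12}{2} = 78 .
\]
```

    For the four sample strings that is 4 * 78 = 312 substrings to record.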

    An instance of Occurrence keeps the set of indices which contain a given string. The commonOccurrences routine scans a string array, calling captureOccurrences for each non-null string. The captureOccurrences routine puts the current index into an Occurrence for each possible substring of the string it is given. Finally, commonOccurrences forms the result set by picking only those Occurrences that have at least two indices.

    Note that your example data has many more common substrings than you identified in the question. For example, '00ab' occurs in each of the input strings. An additional filter to select interesting strings based on content (e.g. all digits, all alphabetic, etc.) is — as they say — left as an exercise for the reader. 😉
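
    One way the content filter mentioned above might look is sketched here. This is only an illustration under assumptions: the class name SubstringContentFilter and its methods are hypothetical, and "interesting" is taken to mean "all digits or all letters", which may not match what you actually want.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of the original answer): keeps only
// candidate substrings whose characters all belong to one class.
class SubstringContentFilter {

    /** True when s is non-empty and every character is a digit. */
    static boolean isAllDigits(String s) {
        if (s.isEmpty()) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isDigit(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    /** True when s is non-empty and every character is a letter. */
    static boolean isAllLetters(String s) {
        if (s.isEmpty()) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isLetter(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    /** Assumed definition of "interesting": all digits or all letters. */
    static boolean isInteresting(String s) {
        return isAllDigits(s) || isAllLetters(s);
    }

    /** Returns the candidates that pass isInteresting, in input order. */
    static List<String> keepInteresting(List<String> candidates) {
        List<String> kept = new ArrayList<String>();
        for (String candidate : candidates) {
            if (isInteresting(candidate)) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> candidates = Arrays.asList("00", "00ab", "abc", "c000", "de");
        System.out.println(keepInteresting(candidates)); // prints [00, abc, de]
    }
}
```

    To apply it to the solution, you could test each Occurrence's getValue() with isInteresting before adding it to the result set.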

    QUICK AND DIRTY JAVA SOURCE:

        package com.stackoverflow.answers;

        import java.util.Collections;
        import java.util.HashMap;
        import java.util.Map;
        import java.util.Set;
        import java.util.TreeSet;

        public class CommonSubstringFinder {

            public static final int MINIMUM_SUBSTRING_LENGTH = 2;

            /** A substring plus the (0-based) indices of the input strings that contain it. */
            public static class Occurrence implements Comparable<Occurrence> {
                private final String value;
                private final Set<Integer> indices;
                public Occurrence(String value) {
                    this.value = value == null ? "" : value;
                    indices = new TreeSet<Integer>();
                }
                public String getValue() {
                    return value;
                }
                public Set<Integer> getIndices() {
                    return Collections.unmodifiableSet(indices);
                }
                public void occur(int index) {
                    indices.add(index);
                }
                public String toString() {
                    StringBuilder result = new StringBuilder();
                    result.append("'").append(value).append("'");
                    String separator = ": ";
                    for (Integer i : indices) {
                        result.append(separator).append(i);
                        separator = ",";
                    }
                    return result.toString();
                }
                public int compareTo(Occurrence that) {
                    return this.value.compareTo(that.value);
                }
            }

            /** Returns every substring that occurs in at least two of the input strings. */
            public static Set<Occurrence> commonOccurrences(String[] strings) {
                Map<String,Occurrence> work = new HashMap<String,Occurrence>();
                if (strings != null) {
                    int index = 0;
                    for (String string : strings) {
                        if (string != null) {
                            captureOccurrences(index, work, string);
                        }
                        ++index;
                    }
                }
                Set<Occurrence> result = new TreeSet<Occurrence>();
                for (Occurrence occurrence : work.values()) {
                    if (occurrence.indices.size() > 1) {
                        result.add(occurrence);
                    }
                }
                return result;
            }

            /** Records the given index against every substring of sufficient length. */
            private static void captureOccurrences(int index, Map<String,Occurrence> work, String string) {
                final int maxLength = string.length();
                for (int i = 0; i < maxLength; ++i) {
                    // substring(i, j) excludes position j, so j must be allowed to
                    // reach maxLength or substrings ending at the last character are missed.
                    for (int j = i + MINIMUM_SUBSTRING_LENGTH; j <= maxLength; ++j) {
                        String partial = string.substring(i, j);
                        Occurrence current = work.get(partial);
                        if (current == null) {
                            current = new Occurrence(partial);
                            work.put(partial, current);
                        }
                        current.occur(index);
                    }
                }
            }

            private static final String[] TEST_DATA = {
                "0000abcde0000",
                "0000abcd00000",
                "000abc0000000",
                "00abc000de000",
            };

            public static void main(String[] args) {
                Set<Occurrence> found = commonOccurrences(TEST_DATA);
                for (Occurrence occurrence : found) {
                    System.out.println(occurrence);
                }
            }

        }

    SAMPLE OUTPUT (indices are 0-based):

        '00': 0,1,2,3
        '000': 0,1,2,3
        '0000': 0,1,2
        '00000': 1,2
        '0000a': 0,1
        '0000ab': 0,1
        '0000abc': 0,1
        '0000abcd': 0,1
        '000a': 0,1,2
        '000ab': 0,1,2
        '000abc': 0,1,2
        '000abcd': 0,1
        '00a': 0,1,2,3
        '00ab': 0,1,2,3
        '00abc': 0,1,2,3
        '00abc0': 2,3
        '00abc00': 2,3
        '00abc000': 2,3
        '00abcd': 0,1
        '0a': 0,1,2,3
        '0ab': 0,1,2,3
        '0abc': 0,1,2,3
        '0abc0': 2,3
        '0abc00': 2,3
        '0abc000': 2,3
        '0abcd': 0,1
        'ab': 0,1,2,3
        'abc': 0,1,2,3
        'abc0': 2,3
        'abc00': 2,3
        'abc000': 2,3
        'abcd': 0,1
        'bc': 0,1,2,3
        'bc0': 2,3
        'bc00': 2,3
        'bc000': 2,3
        'bcd': 0,1
        'c0': 2,3
        'c00': 2,3
        'c000': 2,3
        'cd': 0,1
        'de': 0,3
        'de0': 0,3
        'de00': 0,3
        'de000': 0,3
        'e0': 0,3
        'e00': 0,3
        'e000': 0,3
