You are given a string and an array of strings. How to quickly check, if this string can be built by concatenating some of the strings in the array?
This is a theoretical question, I don’t need it for practical reasons. But I would like to know, if there is some good algorithm for this.
EDIT
Reading some answer I have noticed, that this is probably NP-Complete problem. Even finding a subset of strings, that will together have same length, as a given string is a classic subset sum problem.
So I guess there is no easy answer to this.
EDIT
Now it seems, that it is not a NP-Complete problem after all. That’s way cooler 🙂
EDIT
I have came up with a solution that passes some tests:
def can_build_from_substrings(string, substrings):
prefixes = [True] + [False] * (len(string) - 1)
while True:
old = list(prefixes)
for s in substrings:
for index, is_set in enumerate(prefixes):
if is_set and string[index:].startswith(s):
if string[index:] == s:
return True
prefixes[index + len(s)] = True
if old == prefixes: # nothing has changed in this iteration
return False
I believe the time is O(n * m^3), where n is length of substrings and m is length of string. What do you think?
Note: I assume here that you can use each substring more than once. You can generalize the solution to include this restriction by changing how we define subproblems. That will have a negative impact on space as well as expected runtime, but the problem remains polynomial.
This is a dynamic programming problem. (And a great question!)
Let’s define
composable(S, W)to be true if the stringScan be written using the list of substringsW.Sis composable if and only if:Sstarts with a substringwinW.Safterwis also composable.Let’s write some pseudocode:
This algorithm has O(m*n) runtime, assuming the length of the substrings is not linear w/r/t to the string itself, in which case runtime would be O(m*n^2) (where m is the size of the substring list and n is the length of the string in question). It uses O(n) space for memoization.
(N.B. as written the pseudocode uses O(n^2) space, but hashing the memoization keys would alleviate this.)
EDIT
Here is a working Ruby implementation: