I have a large set of strings. I want to divide the strings into

Editorial Team

Asked: June 1, 2026 (2026-06-01T12:36:26+00:00)

I have a large set of strings. I want to divide the strings into subsets such that:

All items in a subset share a common run of one or more contiguous characters.
The shared contiguous characters that define a subset are unique across the set of subsets (i.e. they are sufficient to define a subset of strings that is mutually exclusive with every other subset).
The subsets are roughly the same size.
The number of subsets is the minimum needed to satisfy the above criteria.

For example, given the following set of names:

Alan,Larry,Alfred,Barbara,Alphonse,Carl

I can divide this set into two subsets of equal size. Subset 1, defined by the contiguous characters “Al”, would be

Alan, Alfred, Alphonse

Subset 2, defined by the contiguous characters “ar”, would be

Larry, Barbara, Carl.
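
The example partition can be checked mechanically. The following is only a minimal sketch: the helper name `partition_by_substrings` is made up for illustration, and it assumes matching ignores case, which the question does not state.

```python
def partition_by_substrings(strings, keys):
    """Assign each string to the single key it contains.

    Assumption: matching is case-insensitive (not specified in the question).
    Raises ValueError if a string contains none of the keys or more than one,
    i.e. if the keys do not define mutually exclusive subsets.
    """
    subsets = {key: [] for key in keys}
    for s in strings:
        matches = [k for k in keys if k.lower() in s.lower()]
        if len(matches) != 1:
            raise ValueError(f"{s!r} matches {len(matches)} keys, expected exactly 1")
        subsets[matches[0]].append(s)
    return subsets


names = ["Alan", "Larry", "Alfred", "Barbara", "Alphonse", "Carl"]
result = partition_by_substrings(names, ["al", "ar"])
print(result)
```

For the six names above, each name contains exactly one of the two keys, so the check passes and both subsets have three members.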

I am looking for an algorithm that would do this for any arbitrary set of strings. The number of resulting subsets does not have to be 2, but it should be the minimum possible, and the subsets should be approximately equal in size.

Elliott
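
One possible building block for such an algorithm, offered only as a brute-force sketch and not as a solution to the minimisation problem, is to list every candidate substring together with the strings that contain it. The function name `substring_coverage`, the `min_len` parameter, and the case-insensitive matching are all assumptions made for illustration.

```python
from collections import defaultdict


def substring_coverage(strings, min_len=1):
    """Map each substring (lower-cased) to the indices of strings containing it.

    Brute force: enumerates every substring of every string, so it is only
    practical for short strings. Case-insensitivity is an assumption.
    """
    coverage = defaultdict(set)
    for idx, s in enumerate(strings):
        lowered = s.lower()
        for start in range(len(lowered)):
            for end in range(start + min_len, len(lowered) + 1):
                coverage[lowered[start:end]].add(idx)
    return coverage


names = ["Alan", "Larry", "Alfred", "Barbara", "Alphonse", "Carl"]
coverage = substring_coverage(names, min_len=2)
print(sorted(coverage["al"]), sorted(coverage["ar"]))
```

Choosing a smallest set of substrings whose index sets are disjoint, cover every string, and are similar in size is the hard part, and this sketch does not attempt it.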

1 Answer

Editorial Team · Answer 1 · 2026-06-01T12:36:27+00:00

Have a look at http://en.wikipedia.org/wiki/Suffix_array. It is possible that what you really want to do is to create a suffix array for each string (document), and then merge all the suffix arrays, with pointers back to the original strings, so that you can search the whole collection at once for a substring by looking for it as a prefix of a suffix in the merged array.
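
A naive sketch of that idea follows. It builds the merged index directly by sorting all suffixes of all strings instead of merging per-string suffix arrays, and it stores each suffix in full, so memory grows quadratically with string length. The function names `build_suffix_index` and `find_containing` and the case-insensitive matching are assumptions made for illustration.

```python
from bisect import bisect_left


def build_suffix_index(strings):
    """Return a sorted list of (suffix, string_index) pairs over all strings.

    The string_index is the pointer back to the original string.
    """
    entries = []
    for idx, s in enumerate(strings):
        lowered = s.lower()  # assumption: case-insensitive search
        for start in range(len(lowered)):
            entries.append((lowered[start:], idx))
    entries.sort()
    return entries


def find_containing(strings, index, pattern):
    """Return the strings containing pattern, via binary search on suffixes."""
    pattern = pattern.lower()
    # First entry whose suffix is >= pattern; all suffixes that start with
    # pattern form one contiguous run beginning here.
    pos = bisect_left(index, (pattern, -1))
    hits = set()
    while pos < len(index) and index[pos][0].startswith(pattern):
        hits.add(index[pos][1])
        pos += 1
    return [strings[i] for i in sorted(hits)]


names = ["Alan", "Larry", "Alfred", "Barbara", "Alphonse", "Carl"]
index = build_suffix_index(names)
print(find_containing(names, index, "ar"))
```

A production implementation would store suffix start offsets instead of suffix copies and use a proper suffix-array construction algorithm.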
