To explain in a clearer way my question I will start by explaining the

Question

0

Asked: May 21, 20262026-05-21T18:47:40+00:00 2026-05-21T18:47:40+00:00

To explain in a clearer way my question I will start by explaining the

0

To explain in a clearer way my question I will start by explaining the real-life case I am facing.

I am building a physical panel with many words on it that can be selectively lit, in order to compose sentences. This is my situation:

I know all the sentences that I want to display
I want to find out [one of] the shortest set of ORDERED words that allows me to display all the sentences

Example:

 SENTENCES:
 "A dog is on the table"
 "A cat is on the table"
 SOLUTIONS:
 "A dog cat is on the table"
 "A cat dog is on the table"

I tried to approach this problem with “positional rules” finding for each UNIQUE word in the set of ALL the words used in ALL the sentences, what words should be at the left or at the right of it. In the example above, the ruleset for the “on” word would be “left(A, dog, cat, is) + right(the, table).

This approach worked for trivial cases, but my real-life situation has two additional difficulties that got me stuck and that have both to do with the need for repeating words:

In-sentence repetitions: “the cat is on the table” has two “the”.
Circular references: In a set of three sentences “A red cat” + “My cat is on the table” + “That table is red”, the rules would state that RED should be at the left of CAT, CAT should be at the left of TABLE and TABLE should be at the left of RED.

MY QUESTION THEREFORE IS:

What is the class of algorithms (or
even better: what is the specific
algorithm) that studies and solves
this kind of problems? Could you post
some reference or a code example of
it?

EDIT: Level of complexity

From the first round of answers it appears the actual level of complexity (i.e. how different are the sentences one from the other) is an important factor. So, here comes some info on that:

I have about 1500 sentences I want to represent.
All of the sentences are essentially modifications of a restricted pool of ~10 sentences where only a few words change. Building on the previous example, it’s a bit like all my sentences would speak about either “somebody’s pet’s position relative to a piece of furniture” or “a physical description of somebody’s furniture”.
The number of unique words used to build all the sentences is <100.
Sentences are 8 words long at most.

For this project I am using python, but any language reasonably readable (eg: NOT obfuscated perl!) will be fine.

Thank you in advance for your time!

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-05-21T18:47:41+00:00

After a week of so of coding (this is hobby project) I decided to answer my own question. This is not because the answers I previously got weren’t good enough, but rather because I used the three of them to code the solution I wanted, and it felt wrong to give credit to only one of the responders, as I truly used input by the three of them to come up with a satisfactory solution.

Let’s start from the end: the heuristic I came up gives very satisfactory results (at least for the real-life case I am using it for). I had 1440 sentences with an average of ~6 words each and using a set of 70 unique words. The program takes about 1 minute to run and provides me with a supersequence of just 76 words (10% more than the “physical” [but not “logical”] lower limit of the problem).

The heuristic is really tailored around my real-life case, in particular around the fact that most of the sentences are constructed around 10 or so “prototypes” (see point #2 of my edit in the question) and is composed of four successive steps:

Isomorphic shrinking
Similarity shrinking
Coarse redundancy optimisation
Fine redundancy optimisation

Isomorphic shrinking

I defined as “isomorphic” two sentences A and B such than transforming A in B and B in A would require exactly the same steps. Example:

'A dog is on the carpet'
'A cat is under the table'

The transformation always require to change words in position 1, 3, 5 of the first string with words in position 1, 3, 5 of the second.

Grouping sentences in “isomorphic families” allows to easily create optimised superstrings by simply inserting in the common root "A __ is __ the __" the list of variable elements for position 1, 3, 5.

Similarity shrinking

At this stage of the process the number of sentences has dramatically lowered (there is normally a mixture of about 50 supersequences from isomorphic families and orphan sentences that were not isomorphic to any other in the pool). The program analyses this new pool and merges together the two most similar strings, then repeating the procedure on the new set composed by the result of the merge and the strings that haven’t been merged yet, and so on until everything has been merged into one supersequence.

Coarse redundancy optimisation

At this stage we already have a valid supersequence to all the original 1440 sentences, but such supersequence is grossly suboptimal as many terms are repeated without the need for it. This optimisation step remove the bulk of the redundant terms by simply using the supersequence to formulate all the sentences, and then removing from it all the terms that haven’t been used.

Fine redundancy optimisation

The result of the previous optimisation is pretty good already, but sometimes is possible to trim out a word of two via this last step. The optimisation here works by finding words that are repeated more than once, and checking if it possible to make two successive repetitions to converge towards the same location in the supersequence and still formulate all the sentences. Example: given the supersequence

aaa xxx bbb ccc ddd eee xxx fff

the program will try to shift the two xxx words towards each other:

aaa bbb xxx ccc ddd eee xxx fff
aaa bbb xxx ccc ddd xxx eee fff
aaa bbb ccc xxx ddd xxx eee fff

if the supersequence reaches:

aaa bbb ccc xxx xxx ddd eee fff

and no sentence uses both occurrences of xxx at the same time, then one of the two xxx can go.

Of course this last passage could be optimised by shifting xxx of more than one position at a time (think of shell sorting vs. bubble sorting), but the general structure of the program is such, and the gain in speed so little, that I preferred this “less optimised” procedure, as this “shifting procedure” is used elsewhere too.

Again, many thanks to all the original responder: your input was paramount to make me think of this solution. Your comments to this solution are also very welcomed!

FINAL NOTE: As soon as the program will be complete / the project finished (a couple of months at worst), I will release it under GPL and add the link to the code repo in the original question. I believe that marking the question as “favourite” should notify the marker of any edit…. just in case you are interested in the actual code, of course!

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

To explain in a clearer way my question I will start by explaining the

Leave an answerCancel reply

1 Answer

Isomorphic shrinking

Similarity shrinking

Coarse redundancy optimisation

Fine redundancy optimisation

Leave an answer
Cancel reply