I have a problem here which i am trying to solve. The program is


Asked: May 20, 2026

I have a problem that I am trying to solve.

The program is given a text file containing only the following characters: a – z, A – Z, 0 – 9, full stop (.) and space. Words in the text file are made up purely of a-z, A-Z and 0-9. The program receives several queries. Each query is a set of whole words that are already present in the file. The program should return the smallest phrase from the file in which all the query words are present (in any order). If there are several such phrases, it should return the first one.

Here is an example. Let us say that the file contains:

Bar is doing a computer science degree. Bar has a computer at home. Bar  is now at home.

Query 1:

Bar computer a

Response:

Bar has a computer

Query 2:

Bar home

Response:

home. Bar

I thought of this solution. For query 1, Bar is searched for first, and all three occurrences of Bar are assembled into a list. Each node in the list also contains the starting position of the smallest phrase and its total length. So it’ll look like

1st node “Bar, 0, 1” [query, starting position, total length].
Similarly for the 2nd and 3rd nodes.

Next, computer is searched for. The minimum distance to computer from each occurrence of Bar is calculated.

1st node “Bar Computer”, 0, 5

2nd node “Bar Computer”, 7, 4, and so on for the other nodes

Next, “a” is searched for. Since order is unimportant, the search has to start from the starting position recorded in each node and proceed both left and right until the word is found. The occurrence at the minimum distance is chosen.

Is this solution on the right track? I feel that, done this way, I have to be wary of many cases, and there might be a simpler solution available.

If the words are unique, does it become a variant of TSP?


1 Answer

Editorial Team · Answer 1 · 2026-05-20T14:33:02+00:00

TSP isn’t a great way to think about this problem. Let n be the number of words in the text and m be the number of words in the query; assume n > m. The naive solution

best = infinity
for i = 1 to n
  for j = i to n
    all_found = true
    for k = 1 to m
      found = false
      for l = i to j
        if text[l] == query[k]
          found = true
      all_found = all_found && found
    if all_found && j - i < best
      best = j - i
      best_i = i
      best_j = j

is already polynomial-time at O(n³ m) for bounded-length words. Now let’s optimize.
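For reference, here is a minimal Python sketch of that brute-force search. It assumes the text and the query have already been split into lists of words; the function name `smallest_window_naive` and the 0-based inclusive `(i, j)` return value are my own choices for illustration.

```python
def smallest_window_naive(text, query):
    """Brute force: try every window text[i..j] and test each query word.

    Returns 0-based inclusive indices (i, j) of the first shortest window
    that contains every query word, or None if no window qualifies.
    """
    best = None
    n = len(text)
    for i in range(n):
        for j in range(i, n):
            # A window qualifies only if every query word occurs inside it.
            all_found = all(
                any(text[l] == q for l in range(i, j + 1)) for q in query
            )
            # Strict "<" keeps the earliest window among equally short ones.
            if all_found and (best is None or j - i < best[1] - best[0]):
                best = (i, j)
    return best


words = "Bar has a computer at home Bar".split()
print(smallest_window_naive(words, ["Bar", "computer", "a"]))  # (0, 3)
```

It is far too slow for real input, but it makes a handy oracle for checking the faster versions on small cases.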

First, hoist the inner loop via a hash set.

best = infinity
for i = 1 to n
  for j = i to n
    subtext_set = {}
    for l = i to j
      subtext_set = subtext_set union {text[l]}
    all_found = true
    for k = 1 to m
      all_found = all_found && query[k] in subtext_set
    if all_found && j - i < best
      best = j - i
      best_i = i
      best_j = j

The running time is now O(n³), or O(n³ log n) if we use a binary tree instead.
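In Python the same step might look like the sketch below, where the built-in `set` plays the role of the hash set and `<=` is the subset test. As before, the function name and the list-of-words inputs are assumptions for illustration.

```python
def smallest_window_sets(text, query):
    """Same search as the naive version, but each window is turned into
    a set once, so testing the query is a single subset check."""
    query_set = set(query)
    best = None
    n = len(text)
    for i in range(n):
        for j in range(i, n):
            window_set = set(text[i:j + 1])  # rebuilt for every window
            if query_set <= window_set and (
                best is None or j - i < best[1] - best[0]
            ):
                best = (i, j)
    return best


words = "Bar has a computer at home Bar".split()
print(smallest_window_sets(words, ["Bar", "home"]))  # (5, 6)
```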

Observe now that it’s wasteful to recompute subtext_set when the upper bound increases by one.

best = infinity
for i = 1 to n
  subtext_set = {}
  for j = i to n
    subtext_set = subtext_set union {text[j]}
    all_found = true
    for k = 1 to m
      all_found = all_found && query[k] in subtext_set
    if all_found && j - i < best
      best = j - i
      best_i = i
      best_j = j

We’re at O(n² m). Now it seems wasteful to recheck the entire query when subtext_set is augmented by just one element: why don’t we just check that one, and remember how many we have to go?

query_set = {}
for k = 1 to m
  query_set = query_set union {query[k]}
best = infinity
for i = 1 to n
  subtext_set = {}
  num_found = 0
  for j = i to n
    if text[j] in query_set && text[j] not in subtext_set
      subtext_set = subtext_set union {text[j]}
      num_found += 1
    if num_found == m && j - i < best
      best = j - i
      best_i = i
      best_j = j

We’re at O(n²). Getting to O(n) requires a couple of insights. First, let’s look at how many query words each substring contains for the example

text = Bar has a computer at home. Bar
       1   2   3 4        5  6     7
query = Bar computer a

# j 1 2 3 4 5 6 7
i +--------------
1 | 1 1 2 3 3 3 3
2 | 0 0 1 2 2 2 3
3 | 0 0 1 2 2 2 3
4 | 0 0 0 1 1 1 2
5 | 0 0 0 0 0 0 1
6 | 0 0 0 0 0 0 1
7 | 0 0 0 0 0 0 1

This matrix has non-increasing columns and non-decreasing rows, and that’s true in general. We want to walk along the boundary of the region of entries with value m, because entries deeper inside that region correspond to longer windows. The algorithm is the following. If the current i, j have all of the query words, then increase i; otherwise, increase j.
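If you want to convince yourself of the monotonicity, a small Python sketch can rebuild the matrix and check both properties. The helper name `distinct_query_counts` is hypothetical, indices are 0-based here, and entries with j < i are taken to be 0, as in the table.

```python
def distinct_query_counts(text, query):
    """counts[i][j] = number of distinct query words in text[i..j]
    (0-based, inclusive); entries with j < i stay 0."""
    query_set = set(query)
    n = len(text)
    counts = [[0] * n for _ in range(n)]
    for i in range(n):
        seen = set()
        for j in range(i, n):
            if text[j] in query_set:
                seen.add(text[j])
            counts[i][j] = len(seen)
    return counts


words = "Bar has a computer at home Bar".split()
counts = distinct_query_counts(words, ["Bar", "computer", "a"])
for row in counts:
    print(*row)

n = len(words)
# Moving right along a row never loses a word ...
rows_ok = all(counts[i][j] <= counts[i][j + 1]
              for i in range(n) for j in range(n - 1))
# ... and moving down a column never gains one.
cols_ok = all(counts[i][j] >= counts[i + 1][j]
              for i in range(n - 1) for j in range(n))
print(rows_ok, cols_ok)  # True True
```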

With our current data structures, increasing j is fine but increasing i is not, because our data structures don’t support deletion. Instead of a set, we need to keep a multi-set and decrement num_found when the last copy of a query word disappears.

best = infinity
count = hash table whose entries are zero by default
for k = 1 to m
  count[query[k]] = -1
num_found = 0
i = 1
j = 0
while true
  if num_found == m
    if j - i < best
      best = j - i
      best_i = i
      best_j = j
    count[text[i]] -= 1
    if count[text[i]] == -1
      num_found -= 1
    i += 1
  else
    j += 1
    if j > n
      break
    if count[text[j]] == -1
      num_found += 1
    count[text[j]] += 1

We’ve arrived at O(n). The last asymptotically relevant optimization is to reduce the extra space usage from O(n) to O(m) by storing counts only for elements in the query. I’ll leave that one as an exercise. (Also, some more care must be taken to handle empty queries, and queries that repeat a word, since num_found can only reach the number of distinct query words.)
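One possible Python take on the whole thing, including the O(m)-space variant, is sketched below. Several details are my assumptions rather than part of the problem statement: the function name `smallest_phrase`, tokenizing with the regex `[A-Za-z0-9]+`, measuring window length in words, returning an empty string for an empty query, and collapsing repeated query words with `set`.

```python
import re


def smallest_phrase(file_text, query_words):
    """Return the first shortest phrase (measured in words) of file_text
    that contains every query word, or None if there is none."""
    # Assumption: a word is a maximal run of letters and digits.
    tokens = [(m.group(), m.start(), m.end())
              for m in re.finditer(r"[A-Za-z0-9]+", file_text)]
    needed = set(query_words)          # repeated query words collapse
    if not needed:
        return ""                      # assumed convention for an empty query
    count = dict.fromkeys(needed, 0)   # counts for query words only: O(m)
    num_found = 0
    best = None
    i = 0
    for j, (word, _, _) in enumerate(tokens):
        if word in count:
            if count[word] == 0:
                num_found += 1
            count[word] += 1
        # While the window covers the query, record it and shrink from the left.
        while num_found == len(needed):
            if best is None or j - i < best[1] - best[0]:
                best = (i, j)
            left = tokens[i][0]
            if left in count:
                count[left] -= 1
                if count[left] == 0:
                    num_found -= 1
            i += 1
    if best is None:
        return None
    # Slice the original text so punctuation between the words is kept.
    return file_text[tokens[best[0]][1]:tokens[best[1]][2]]


sample = ("Bar is doing a computer science degree. "
          "Bar has a computer at home. Bar  is now at home.")
print(smallest_phrase(sample, ["Bar", "computer", "a"]))  # Bar has a computer
print(smallest_phrase(sample, ["Bar", "home"]))           # home. Bar
```

Each word enters the window once and leaves it at most once, so the inner `while` loop does not change the overall O(n) bound.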
