
The Archive Base


Editorial Team
Asked: May 20, 2026


I have a problem here which I am trying to solve.

The program is given a text file containing only the following characters: a-z, A-Z, 0-9, full stop (.) and space. Words in the text file consist purely of a-z, A-Z and 0-9. The program receives several queries. Each query is a set of full words already present in the file. The program should return the smallest phrase from the file in which all the query words are present (in any order). If there are several such phrases, return the first one.

Here is an example. Let us say that the file contains:

Bar is doing a computer science degree. Bar has a computer at home. Bar  is now at home.

Query 1:

Bar computer a

Response:

Bar has a computer

Query 2:

Bar home

Response:

home. Bar
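
Note that the second response crosses a sentence boundary, so a phrase here appears to be any contiguous stretch of the file running from the first character of one word to the last character of another. A minimal sketch of the tokenization this implies, assuming words are maximal runs of `[A-Za-z0-9]` and using 0-based word indices (`tokenize` and `phrase` are illustrative names, not part of the question):

```python
import re

WORD = re.compile(r"[A-Za-z0-9]+")

def tokenize(text):
    """Return (word, start, end) triples in file order; end is exclusive."""
    return [(m.group(), m.start(), m.end()) for m in WORD.finditer(text)]

def phrase(text, tokens, i, j):
    """Slice the original text from word i through word j (inclusive),
    keeping whatever punctuation and spacing lies in between."""
    return text[tokens[i][1]:tokens[j][2]]

text = ("Bar is doing a computer science degree. "
        "Bar has a computer at home. Bar  is now at home.")
tokens = tokenize(text)
print(phrase(text, tokens, 7, 10))   # words 7..10 of the example file
print(phrase(text, tokens, 12, 13))  # words 12..13 of the example file
```

With the file reduced to a list of words, the search itself only needs word indices; the character offsets are used at the end to cut the phrase out of the original text.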

I thought of this solution. For query 1, Bar is searched for first, and all three occurrences of Bar are assembled into a list. Each node in the list also contains the starting position of the smallest phrase and its total length. So it’ll look like

1st node “Bar, 0, 1” [Query, starting position, total length].
Similarly for the 2nd and 3rd nodes.

Next, computer is searched for. For each occurrence of Bar, the minimum distance to computer is calculated.

1st node “Bar computer”, 0, 5

2nd node “Bar computer”, 7, 4, and so on for the other nodes

Next, “a” is searched for. The search has to start from the starting position stored in each node and traverse both left and right until the word is found, since order is unimportant. The minimum over all occurrences is then chosen.

Is this solution on the right track? I feel that doing it this way, I have to be wary of many cases, and there might be a simpler solution available.

If the words are unique, does it become a variant of TSP?


1 Answer

    Editorial Team
    Added an answer on May 20, 2026 at 2:33 pm

    TSP isn’t a great way to think about this problem. Let n be the number of words in the text and m the number of words in the query; assume n > m. The naive solution

    best = infinity
    for i = 1 to n
      for j = i to n
        all_found = true
        for k = 1 to m
          found = false
          for l = i to j
            if text[l] == query[k]
              found = true
          all_found = all_found && found
        if all_found && j - i < best
          best = j - i
          best_i = i
          best_j = j
    

    is already polynomial-time at O(n^3 m) for bounded-length words. Now let’s optimize.

    First, hoist the inner loop via a hash set.

    best = infinity
    for i = 1 to n
      for j = i to n
        subtext_set = {}
        for l = i to j
          subtext_set = subtext_set union {text[l]}
        all_found = true
        for k = 1 to m
          all_found = all_found && query[k] in subtext_set
        if all_found && j - i < best
          best = j - i
          best_i = i
          best_j = j
    

    The running time is now O(n^3), or O(n^3 log n) if we use a binary tree instead.
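
For reference, the hash-set version might look like this in Python. This is only a sketch under a few assumptions: the text is already split into a list of words, indices are 0-based and inclusive, and the name `shortest_window_cubic` is made up for illustration. It is slow, but handy as a correctness check for faster versions.

```python
def shortest_window_cubic(text, query):
    """Try every window text[i..j] and return (i, j) for the first
    shortest window containing all query words, or None if none exists."""
    needed = set(query)
    best = None
    for i in range(len(text)):
        for j in range(i, len(text)):
            # Rebuilding the set costs O(j - i), which gives the cubic bound.
            if needed <= set(text[i:j + 1]):
                # Strict comparison keeps the earliest window among ties.
                if best is None or j - i < best[1] - best[0]:
                    best = (i, j)
    return best

words = "Bar has a computer at home Bar".split()
print(shortest_window_cubic(words, ["Bar", "computer", "a"]))
print(shortest_window_cubic(words, ["Bar", "home"]))
```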

    Observe now that it’s wasteful to recompute subtext_set when the upper bound increases by one.

    best = infinity
    for i = 1 to n
      subtext_set = {}
      for j = i to n
        subtext_set = subtext_set union {text[j]}
        all_found = true
        for k = 1 to m
          all_found = all_found && query[k] in subtext_set
        if all_found && j - i < best
          best = j - i
          best_i = i
          best_j = j
    

    We’re at O(n^2 m). Now it seems wasteful to recheck the entire query when subtext_set is augmented by just one element: why don’t we just check that one, and remember how many we have to go?

    query_set = {}
    for k = 1 to m
      query_set = query_set union {query[k]}
    best = infinity
    for i = 1 to n
      subtext_set = {}
      num_found = 0
      for j = i to n
        if text[j] in query_set && text[j] not in subtext_set
          subtext_set = subtext_set union {text[j]}
          num_found += 1
        if num_found == m && j - i < best
          best = j - i
          best_i = i
          best_j = j
    

    We’re at O(n^2). Getting to O(n) requires a couple of insights. First, let’s look at how many query words each substring contains for the example:

    text = Bar has a computer at home. Bar
           1   2   3 4        5  6     7
    query = Bar computer a
    
    # j 1 2 3 4 5 6 7
    i +--------------
    1 | 1 1 2 3 3 3 3
    2 | 0 0 1 2 2 2 3
    3 | 0 0 1 2 2 2 3
    4 | 0 0 0 1 1 1 2
    5 | 0 0 0 0 0 0 1
    6 | 0 0 0 0 0 0 1
    7 | 0 0 0 0 0 0 1
    

    This matrix has non-increasing columns and non-decreasing rows, and that’s true in general. We want to traverse the underside of the entries with value m, because further in corresponds to a longer solution. The algorithm is the following. If the current i, j have all of the query words, then increase i; otherwise, increase j.
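
The monotonicity claim can be checked on the example by brute force. A small sketch, assuming the text is already split into words and using 0-based indices (so row and column 0 here correspond to i = 1 and j = 1 in the table); `coverage_matrix` is an illustrative name, not part of the answer:

```python
def coverage_matrix(text, query):
    """matrix[i][j] = number of distinct query words in text[i..j],
    and 0 for the empty windows where j < i."""
    needed = set(query)
    n = len(text)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        seen = set()
        for j in range(i, n):
            if text[j] in needed:
                seen.add(text[j])
            matrix[i][j] = len(seen)
    return matrix

words = "Bar has a computer at home Bar".split()
matrix = coverage_matrix(words, ["Bar", "computer", "a"])
for row in matrix:
    print(*row)

n = len(words)
# Rows never decrease left to right; columns never increase top to bottom.
rows_ok = all(matrix[i][j] <= matrix[i][j + 1]
              for i in range(n) for j in range(n - 1))
cols_ok = all(matrix[i][j] >= matrix[i + 1][j]
              for i in range(n - 1) for j in range(n))
print(rows_ok, cols_ok)
```

Because of this staircase shape, a walk that moves right while the window is incomplete and down while it is complete visits every minimal window in at most 2n steps.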

    With our current data structures, increasing j is fine but increasing i is not, because our data structures don’t support deletion. Instead of a set, we need to keep a multi-set and decrement num_found when the last copy of a query word disappears.

    best = infinity
    count = hash table whose entries are zero by default
    for k = 1 to m
      count[query[k]] = -1
    num_found = 0
    i = 1
    j = 0
    while true
      if num_found == m
        if j - i < best
          best = j - i
          best_i = i
          best_j = j
        count[text[i]] -= 1
        if count[text[i]] == -1
          num_found -= 1
        i += 1
      else
        j += 1
        if j > n
          break
        if count[text[j]] == -1
          num_found += 1
        count[text[j]] += 1
    

    We’ve arrived at O(n). The last asymptotically relevant optimization is to reduce the extra space usage from O(n) to O(m) by storing counts only for elements in the query. I’ll leave that one as an exercise. (Also, some more care must be taken to handle empty queries.)
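
One possible take on that exercise, offered as a sketch rather than the answer's own code. It assumes the text is already split into a list of words, uses 0-based inclusive indices, keeps counts only for query words, collapses duplicate query words, and treats an empty query as having no answer (that last choice is an assumption; the problem statement does not say). The name `shortest_window` is illustrative.

```python
def shortest_window(text, query):
    """Return (i, j) for the first shortest window text[i..j] containing
    every query word, or None if there is no such window."""
    needed = set(query)               # duplicate query words collapse here
    if not needed:
        return None                   # assumption: empty query has no answer
    count = dict.fromkeys(needed, 0)  # counts for query words only: O(m) space
    num_found = 0                     # query words with a nonzero count
    best = None
    i = 0
    for j, word in enumerate(text):   # "increase j"
        if word in count:
            if count[word] == 0:
                num_found += 1
            count[word] += 1
        # "Increase i" for as long as the window still covers the query.
        while num_found == len(needed):
            if best is None or j - i < best[1] - best[0]:
                best = (i, j)
            left = text[i]
            if left in count:
                count[left] -= 1
                if count[left] == 0:
                    num_found -= 1
            i += 1
    return best

words = "Bar has a computer at home Bar".split()
print(shortest_window(words, ["Bar", "computer", "a"]))
print(shortest_window(words, ["Bar", "home"]))
```

Each word enters the window once and leaves it at most once, so the loop body runs O(n) times in total, and the strict comparison keeps the earliest window among equally short ones.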
