Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 238897
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 11, 20262026-05-11T20:32:32+00:00 2026-05-11T20:32:32+00:00

Given a list L of n character strings, and an input character string S,

  • 0

Given a list L of n character strings, and an input character string S, what is an efficient way to find the character string in L that contains the most characters that exist in S? We want to find the string in L that is most-closely made up of the letters contained in S.

The obvious answer is to loop through all n strings and check to see how many characters in the current string exist in S. However, this algorithm will be run frequently, and the list L of n string will be stored in a database… loop manually through all n strings would require something like big-Oh of n*m^2, where n is the number of strings in L, and m is the max length of any string in L, as well as the max length of S… in this case m is actually a constant of 150.

Is there a better way than just a simple loop? Is there a data structure I can load the n strings into that would give me fast search ability? Is there an algorithm that uses the pre-calculated meta-data about each of the n strings that would perform better than a loop?

I know there are a lot of geeks out there that are into the algorithms. So please help!

Thanks!

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-11T20:32:32+00:00Added an answer on May 11, 2026 at 8:32 pm

    If you are after substrings, a Trie or Patrica trie might be a good starting point.

    If you don’t care about the order, just about the number of each symbol or letter, I would calculate the histogram of all strings and then compare them with the histogram of the input.

                   ABCDEFGHIJKLMNOPQRSTUVWXYZ
    Hello World => ...11..1...3..2..1....1...
    

    This will lower the costs to O(26 * m + n) plus the preprocessing once if you consider only case-insensitive latin letters.

    If m is constant, you could interpret the histogram as a 26 dimensional vector on a 26 dimensional unit sphere by normalizing it. Then you could just calculate the Dot Product of two vectors yielding the cosine of the angle between the two vectors, and this value should be proportional to the similarity of the strings.

    Assuming m = 3, a alphabet A = { 'U', 'V', 'W' } of size three only, and the following list of strings.

    L = { "UUU", "UVW", "WUU" }
    

    The histograms are the following.

    H = { (3, 0, 0), (1, 1, 1), (2, 0, 1) }
    

    A histogram h = (x, y, z) is normalized to h' = (x/r, y/r, z/r) with r the Euclidian norm of the histogram h – that is r = sqrt(x² + y² + z²).

    H' = { (1.000, 0.000, 0.000), (0.577, 0.577, 0.577), (0.894, 0.000, 0.447) }
    

    The input S = "VVW" has the histogram hs = (0, 2, 1) and the normalized histogram hs' = (0.000, 0.894, 0.447).

    Now we can calculate the similarity of two histograms h1 = (a, b, c) and h2 = (x, y, z) as the Euclidian distance of both histograms.

    d(h1, h2) = sqrt((a - x)² + (b - y)² + (c - z)²)
    

    For the example we obtain.

    d((3, 0, 0), (0, 2, 1)) = 3.742
    d((1, 1, 1), (0, 2, 1)) = 1.414
    d((2, 0, 1), (0, 2, 1)) = 2.828
    

    Hence “UVW” is closest to “VVW” (smaller numbers indicate higher similarity).

    Using the normalized histograms h1' = (a', b', c') and h2' = (x', y', z') we can calculate the distance as the dot product of both histograms.

    d'(h1', h2') = a'x' + b'y' + c'z'
    

    For the example we obtain.

    d'((1.000, 0.000, 0.000), (0.000, 0.894, 0.447)) = 0.000
    d'((0.577, 0.577, 0.577), (0.000, 0.894, 0.447)) = 0.774
    d'((0.894, 0.000, 0.447), (0.000, 0.894, 0.447)) = 0.200
    

    Again “UVW” is determined to be closest to “VVW” (larger numbers indicate higher similarity).

    Both version yield different numbers, but the results are always the same. One could also use other norms – Manhattan distance (L1 norm) for example – but this will only change the numbers because norms in finite dimensional vector spaces are all equivalent.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Ask A Question

Stats

  • Questions 112k
  • Answers 112k
  • Best Answers 0
  • User 1
  • Popular
  • Answers
  • Editorial Team

    How to approach applying for a job at a company ...

    • 7 Answers
  • Editorial Team

    How to handle personal stress caused by utterly incompetent and ...

    • 5 Answers
  • Editorial Team

    What is a programmer’s life like?

    • 5 Answers
  • Editorial Team
    Editorial Team added an answer You cant do x-domain ajax calls successfully cross browser. You… May 11, 2026 at 9:58 pm
  • Editorial Team
    Editorial Team added an answer You need to escape the parentheses use: var regExp:RegExp =… May 11, 2026 at 9:58 pm
  • Editorial Team
    Editorial Team added an answer Why don't you put the views into an array and… May 11, 2026 at 9:58 pm

Related Questions

I'm using Python to infinitely iterate over a list, repeating each element in the
How would you describe the running time of this sort, given a function sorted
I've been slowly working my way through the list of Project Euler problems, and
I'm trying to write a piece of code that will do the following: Take
Given a path expressed as an array of 2d points: Point[] path = new

Trending Tags

analytics british company computer developers django employee employer english facebook french google interview javascript language life php programmer programs salary

Top Members

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.