The Archive Base

Editorial Team
Asked: June 1, 2026 (2026-06-01T15:41:46+00:00)


There’s a lot of software that will take a search string and find all of the text in your database that contains it (MySQL’s WHERE MATCH(string_column) AGAINST('searchterm'), Google, etc.), but is there a good algorithm for going the other way?

Say I have a list of search terms:

Toyota Prius, Toyota Tacoma, Honda Civic, Chevy Nova, Chevy Volt

And I have a string, like:

1962 Chevy Nova convertable

Is there a good algorithm where I can put the list and the string in, and get Chevy Nova out?

If they’re all easily tokenized, I could tokenize them and do an inner join, but I’m interested in the case where I can’t tell which part of the input string is the “important” part.


1 Answer

  1. Editorial Team, added an answer on June 1, 2026 at 3:41 pm (2026-06-01T15:41:47+00:00)

    if you tokenize “1962 Chevy Nova convertable” [sic], you end up with four tokens, any of which could be important or interesting enough to care about. if you keep track of all of the possible words in your language, you’ll have an index (an integer id) for each of those words.

    on the other hand, you’ve got your search terms. for each of those, you’ve tokenized and indexed the interesting words, so each search term can be thought of as a pair of token indexes.

    then, when you take your input and look for search terms that match, you’re really asking: which of the search terms contain any of the words in the input?
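to make that idea concrete before moving to tables, here is a minimal in-memory sketch in Python. it assumes a deliberately naive tokenizer (lowercase, split on whitespace), and the names `tokenize` and `build_index` are made up for illustration; they are not from any library.

```python
from collections import defaultdict

def tokenize(text):
    """Naive tokenizer (an assumption): lowercase and split on whitespace."""
    return text.lower().split()

def build_index(search_terms):
    """Map each token to the set of search terms that contain it."""
    index = defaultdict(set)
    for term in search_terms:
        for token in tokenize(term):
            index[token].add(term)
    return index

search_terms = ["Toyota Prius", "Toyota Tacoma", "Honda Civic",
                "Chevy Nova", "Chevy Volt"]
index = build_index(search_terms)

# Which search terms share at least one word with the input?
candidates = set()
for token in tokenize("1962 Chevy Nova convertable"):
    # .get() avoids creating empty entries for unknown words like "1962"
    candidates |= index.get(token, set())

print(sorted(candidates))  # ['Chevy Nova', 'Chevy Volt']
```

this only answers "which terms match at all"; ranking the candidates by how many words they share is a separate step.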

    since I’m a database guy at heart, I can imagine creating the token list like so:

    CREATE TABLE aa_tokens (
      id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
      word VARCHAR(40) NOT NULL
    );
    
    insert into aa_tokens (word) values
      ('1962'),           -- 1
      ('Chevy'),          -- 2
      ('Civic'),          -- 3
      ('Honda'),          -- 4
      ('Nova'),           -- 5
      ('Prius'),          -- 6
      ('Tacoma'),         -- 7
      ('Toyota'),         -- 8
      ('Volt'),           -- 9
      ('convertable');    -- 10
    

    and a table of searches so that each can have an id:

    CREATE TABLE aa_search (
      id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
      text VARCHAR(255) NOT NULL
    );
    
    insert into aa_search (text) values
      ('Toyota Prius'),   -- 1
      ('Toyota Tacoma'),  -- 2
      ('Honda Civic'),    -- 3
      ('Chevy Nova'),     -- 4
      ('Chevy Volt');     -- 5
    

    and then a table combining the searches and tokens:

    CREATE TABLE aa_searchToks (
      search INT NOT NULL,
      token INT NOT NULL
    );
    
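    insert into aa_searchToks (search, token) values
      (1, 8),   -- Toyota Prius:  Toyota
      (1, 6),   -- Toyota Prius:  Prius
      (2, 8),   -- Toyota Tacoma: Toyota
      (2, 7),   -- Toyota Tacoma: Tacoma
      (3, 4),   -- Honda Civic:   Honda
      (3, 3),   -- Honda Civic:   Civic
      (4, 2),   -- Chevy Nova:    Chevy
      (4, 5),   -- Chevy Nova:    Nova
      (5, 2),   -- Chevy Volt:    Chevy
      (5, 9);   -- Chevy Volt:    Volt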
    

    now if we take the input string “1962 Chevy Nova convertable” and turn it into token ids by looking each word up in aa_tokens (1, 2, 5, 10), we can make a query that looks at the tokens of the search terms:

    select search, count(*) from aa_searchToks
      where token in (1, 2, 5, 10) group by search;
    

    the result of which is:

    +--------+----------+
    | search | count(*) |
    +--------+----------+
    |      4 |        2 |
    |      5 |        1 |
    +--------+----------+
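the step from input string to token ids (1, 2, 5, 10) was done by hand above. here is a rough sketch of automating it, using Python's built-in `sqlite3` module with an in-memory database as a stand-in for MySQL. that swap is an assumption made so the example runs anywhere; the helper names `token_ids` and `match_counts` are hypothetical, and the `ORDER BY` is an addition so the best match comes first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite dialect of the tables above (assumption: SQLite stands in for MySQL)
conn.executescript("""
    CREATE TABLE aa_tokens (id INTEGER PRIMARY KEY AUTOINCREMENT, word TEXT NOT NULL);
    CREATE TABLE aa_searchToks (search INTEGER NOT NULL, token INTEGER NOT NULL);
""")
words = ["1962", "Chevy", "Civic", "Honda", "Nova",
         "Prius", "Tacoma", "Toyota", "Volt", "convertable"]
conn.executemany("INSERT INTO aa_tokens (word) VALUES (?)", [(w,) for w in words])
conn.executemany(
    "INSERT INTO aa_searchToks (search, token) VALUES (?, ?)",
    [(1, 8), (1, 6), (2, 8), (2, 7), (3, 4), (3, 3), (4, 2), (4, 5), (5, 2), (5, 9)],
)

def token_ids(conn, text):
    """Return the ids of every known word in text; unknown words are skipped."""
    input_words = text.split()
    if not input_words:
        return []
    placeholders = ", ".join("?" for _ in input_words)
    rows = conn.execute(
        f"SELECT id FROM aa_tokens WHERE word IN ({placeholders}) ORDER BY id",
        input_words,
    )
    return [row[0] for row in rows]

def match_counts(conn, ids):
    """Count matching tokens per search term, best match first."""
    if not ids:
        return []
    placeholders = ", ".join("?" for _ in ids)
    return conn.execute(
        f"SELECT search, COUNT(*) FROM aa_searchToks "
        f"WHERE token IN ({placeholders}) "
        f"GROUP BY search ORDER BY COUNT(*) DESC, search",
        ids,
    ).fetchall()

ids = token_ids(conn, "1962 Chevy Nova convertable")
print(ids)                      # [1, 2, 5, 10]
print(match_counts(conn, ids))  # [(4, 2), (5, 1)]
```

note that the word lookup here is an exact, case-sensitive comparison; a real tokenizer would probably normalize case and punctuation first.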
    

    or querying a little bit differently:

    select search, (select text from aa_search s where st.search = s.id) as text, 
      count(*) from aa_searchToks st where token in (1, 2, 5, 10) group by search;
    

    resulting in:

    +--------+------------+----------+
    | search | text       | count(*) |
    +--------+------------+----------+
    |      4 | Chevy Nova |        2 |
    |      5 | Chevy Volt |        1 |
    +--------+------------+----------+
    

    we can see that “Chevy Nova” matches two tokens and is the best match, which, of course, it is.
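if the list of search terms is small enough to hold in memory, the same "count shared words, take the highest" idea can be sketched without a database. this is only an illustration under the same naive assumptions (lowercase, split on whitespace); `best_match` is a hypothetical name, and on a tie the term listed first wins.

```python
def best_match(search_terms, text):
    """Return the search term sharing the most words with text, or None."""
    input_words = set(text.lower().split())
    counts = {}
    for term in search_terms:
        shared = input_words & set(term.lower().split())
        if shared:
            counts[term] = len(shared)
    if not counts:
        return None
    # max() returns the first maximal key, so earlier terms win ties
    return max(counts, key=counts.get)

terms = ["Toyota Prius", "Toyota Tacoma", "Honda Civic", "Chevy Nova", "Chevy Volt"]
print(best_match(terms, "1962 Chevy Nova convertable"))  # Chevy Nova
print(best_match(terms, "1998 Ford Mustang"))            # None
```

counting raw overlaps treats every word as equally important, so a common word like "Chevy" counts as much as a rare one like "Nova"; weighting words by rarity would be a natural next step, but that goes beyond what the answer above does.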
