The Archive Base

Editorial Team
Asked: May 11, 2026


So I’ve got a column in a table that contains string values (keywords populated from a 3rd-party tool). I’m working on an automated tool to identify clusters of similar values that could probably be normalized to a single value. For example, “Firemen”/“Fireman”, “Isotope”/“Asotope” or “Canine”/“Canines”.

An approach that calculates the Levenshtein distance seems ideal, except that it involves too much string manipulation and comparison and would probably make poor use of SQL indexes.
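To make that cost concrete, here is a minimal Python sketch (not from the original question) of the textbook dynamic-programming Levenshtein distance, plus a hypothetical `similar_pairs` helper that brute-forces every pair of keywords. Each comparison is O(len(a) * len(b)), the pairwise loop is O(n²) in the number of keywords, and none of it can be answered from an index.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance where insert, delete and substitute each cost 1."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the row to save memory
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def similar_pairs(words, max_distance=1):
    """Hypothetical helper: compares every pair, O(n^2) comparisons."""
    return [(x, y) for x, y in combinations(words, 2)
            if levenshtein(x, y) <= max_distance]


print(similar_pairs(["Fireman", "Firemen", "Canine", "Canines", "Isotope", "Asotope"]))
# [('Fireman', 'Firemen'), ('Canine', 'Canines'), ('Isotope', 'Asotope')]
```

It finds all three example pairs, but only by doing the full quadratic comparison, which is exactly what becomes impractical on a large keyword column.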

I’ve considered incrementally grouping by the first X characters of the column (LEFT(column, X)), which is a not-so-bad way to maximize index use, but this approach is really only effective at finding words that differ at the very end.
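A quick Python sketch of what that grouping finds (illustrative only; `prefix_clusters` is a hypothetical helper, and `word[:x]` stands in for LEFT(keyword, x)). With x = 5 the pairs that differ only at the end share a bucket, while “Isotope”/“Asotope”, which differ in the first letter, never meet.

```python
from collections import defaultdict


def prefix_clusters(words, x):
    """Mimic GROUP BY LEFT(keyword, x) ... HAVING COUNT(*) > 1."""
    buckets = defaultdict(list)
    for word in words:
        buckets[word[:x]].append(word)  # word[:x] plays the role of LEFT(word, x)
    # keep only buckets that actually contain a candidate cluster
    return {prefix: members for prefix, members in buckets.items() if len(members) > 1}


words = ["Fireman", "Firemen", "Isotope", "Asotope", "Canine", "Canines"]
print(prefix_clusters(words, 5))
# {'Firem': ['Fireman', 'Firemen'], 'Canin': ['Canine', 'Canines']}
```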

Anyone got some good ideas for solving this problem efficiently in SQL?

Note: I realize this question is very similar to “Finding how similar two strings are”, but the distinction here is the need to do this efficiently in SQL.

1 Answer

  1. Editorial Team
    Added an answer on May 11, 2026 at 10:50 pm

    If you are using SQL Server, you might look into using the SOUNDEX() function as in:

    ...
    where
       -- single quotes: with QUOTED_IDENTIFIER ON, "searchterm" would be an identifier
       SOUNDEX('searchterm') = SOUNDEX(searchvaluefield)
    

    It does phonetic matching on the strings, so words that sound alike get the same four-character code.
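For intuition, here is a Python sketch of the standard American Soundex rules: keep the first letter, map the remaining consonants to digits, drop vowels, collapse adjacent duplicate codes, then pad or truncate to four characters. This is an illustrative approximation, not SQL Server's implementation, and the two may disagree on edge cases such as the handling of H and W.

```python
# Letter -> Soundex digit; vowels, Y, H and W get no digit.
CODES = {ch: digit
         for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                                ("L", "4"), ("MN", "5"), ("R", "6"))
         for ch in letters}


def soundex(word: str) -> str:
    """American Soundex: first letter + up to three digits, zero-padded."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]                # the first letter is kept as-is
    prev = CODES.get(letters[0], "")   # ...but its code still blocks a duplicate
    for ch in letters[1:]:
        if ch in "HW":
            continue                   # H/W do not separate equal codes
        code = CODES.get(ch, "")
        if code and code != prev:      # skip vowels and adjacent duplicates
            result += code
        prev = code                    # a vowel resets prev to ""
    return (result + "000")[:4]


print(soundex("Canine"), soundex("Canines"), soundex("Caynyn"), soundex("Caniness"))
# C550 C552 C550 C552
print(soundex("Isotope"), soundex("Asotope"))
# I231 A231
```

It reproduces the codes in this answer's SQL Server outputs. Because the first letter is kept verbatim, though, “Isotope” and “Asotope” get different codes, so Soundex on its own won't catch every pair from the question.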

    Some odd examples are below. It seems you could catch plurals by always appending the plural ‘s’ to both sides, since adjacent letters with the same code (such as multiple ‘s’s) collapse into one digit. 🙂

    select soundex('Canine'), soundex('Canines')
    go
    
    ----- ----- 
    C550  C552  
    
    1 Row(s) affected
    
    
    select soundex('Canine'), soundex('Caynyn')
    go
    
    ----- ----- 
    C550  C550  
    
    1 Row(s) affected
    
    
    select soundex('Canines'), soundex('Caniness')
    go
    
    ----- ----- 
    C552  C552  
    
    1 Row(s) affected
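Putting the plural trick to work for clustering: the sketch below registers a Python Soundex function with SQLite (SQLite has no soundex() unless it is compiled with SQLITE_SOUNDEX) and groups keywords by soundex(keyword || 's'), the counterpart of SOUNDEX(keyword + 's') in T-SQL. The table and column names (`keywords`, `keyword`) and the `soundex_clusters` helper are hypothetical, and each returned group is only a candidate for normalization that still needs review.

```python
import sqlite3

# Letter -> Soundex digit; vowels, Y, H and W get no digit.
CODES = {ch: digit
         for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                                ("L", "4"), ("MN", "5"), ("R", "6"))
         for ch in letters}


def soundex(word):
    letters = [c for c in (word or "").upper() if "A" <= c <= "Z"]  # NULL -> ""
    if not letters:
        return ""
    result, prev = letters[0], CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not break a run of equal codes
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (result + "000")[:4]


def soundex_clusters(keywords):
    """Hypothetical helper: group keywords whose soundex(keyword || 's') collide."""
    conn = sqlite3.connect(":memory:")
    conn.create_function("soundex", 1, soundex)
    conn.execute("CREATE TABLE keywords (keyword TEXT)")  # hypothetical schema
    conn.executemany("INSERT INTO keywords (keyword) VALUES (?)",
                     [(k,) for k in keywords])
    rows = conn.execute(
        "SELECT soundex(keyword || 's'), group_concat(keyword, '|') "
        "FROM keywords "
        "GROUP BY soundex(keyword || 's') "
        "HAVING COUNT(*) > 1"
    ).fetchall()
    conn.close()
    # group_concat order is unspecified, so sort members for stable output
    return {code: sorted(members.split("|")) for code, members in rows}


print(soundex_clusters(["Fireman", "Firemen", "Isotope", "Asotope",
                        "Canine", "Canines", "Quartz"]))
```

With this input it should group Fireman/Firemen under F655 and Canine/Canines under C552, and leave Isotope, Asotope and Quartz unclustered.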
    

© 2021 The Archive Base. All Rights Reserved