The Archive Base


Editorial Team
Asked: May 22, 2026

How could I get a sound similarity rating for a string written in one

How could I get a sound similarity “rating” between a string written in one language and a string written in another language? That is, an algorithm that will identify that

“David Letterman” and “דוד לטרמן” are strings that sound alike.

(Oh, and by the way, the above is Hebrew for, you guessed it, “David Letterman”, and it is pronounced almost the same as in English.)

The only raw material I have is Unicode strings in their respective languages.
That is, I do not have phonemes or phonetic transcriptions/translations of the strings.

I have already implemented a tweaked Soundex of sorts, which works so-so. Is this the way to go?


1 Answer

  1. Editorial Team
    Added an answer on May 22, 2026 at 8:10 pm

    Soundex may not be perfect, but it seems like a reasonable approach, at least for your specific example of English/Hebrew matching.

    You definitely can’t use the rule about preserving the first letter of the name, but I never liked that even for the Latin alphabet (because I’d have to look under both “E” and “Y” for my mother’s family name). I recommend just treating the first letter like all the others.
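As a rough sketch of what "treating the first letter like all the others" could look like on the Latin side, the snippet below encodes every letter with the standard Soundex digit groups and keeps no leading letter. The function name `latin_soundex` is hypothetical. Two choices here are assumptions rather than part of classic Soundex: the key is not truncated or zero-padded, and repeated digits are collapsed even when an ignored letter sits between them (vowels are often unwritten in Hebrew, so resetting on vowels would make the two scripts disagree).

```python
# Standard Soundex digit groups for Latin letters; everything else is ignored.
LATIN_GROUPS = {
    "BFPV": "1",
    "CGJKQSXZ": "2",
    "DT": "3",
    "L": "4",
    "MN": "5",
    "R": "6",
}
LATIN_CODES = {ch: code for group, code in LATIN_GROUPS.items() for ch in group}


def latin_soundex(text: str) -> str:
    """Encode all letters, including the first one, as Soundex digits."""
    digits = []
    for ch in text.upper():
        digit = LATIN_CODES.get(ch)
        # Assumption: skip a digit that repeats the previous emitted digit,
        # even if ignored letters (vowels, H, W, Y, spaces) came in between.
        if digit and (not digits or digits[-1] != digit):
            digits.append(digit)
    return "".join(digits)


print(latin_soundex("David Letterman"))  # 3134365
```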

    Then it’s just a matter of mapping the Hebrew letters to Soundex codes. You don’t really need an intermediate English transliteration; just code the Hebrew → Soundex mapping directly.

    • בוףפ → 1
    • גזחךכסקש → 2
    • דטת → 3
    • ץצ → 32
    • ל → 4
    • םמןנ → 5
    • ר → 6
    • אהיע → ignored
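The list above could be coded directly as a lookup table, for instance as in the following sketch. The names `HEBREW_GROUPS` and `hebrew_soundex` are made up for illustration. As an assumption, repeated digits are collapsed even across ignored letters, and the key is neither truncated nor padded.

```python
# Hebrew letter groups -> Soundex digits, copied from the list above.
HEBREW_GROUPS = {
    "בוףפ": "1",
    "גזחךכסקש": "2",
    "דטת": "3",
    "ץצ": "32",
    "ל": "4",
    "םמןנ": "5",
    "ר": "6",
}
HEBREW_CODES = {ch: code for group, code in HEBREW_GROUPS.items() for ch in group}


def hebrew_soundex(text: str) -> str:
    """Encode mapped letters; anything unmapped (אהיע, spaces, niqqud) is skipped."""
    digits = []
    for ch in text:
        # A letter may expand to more than one digit (ץ/צ -> "32").
        for digit in HEBREW_CODES.get(ch, ""):
            # Assumption: drop a digit that repeats the previously emitted one.
            if not digits or digits[-1] != digit:
                digits.append(digit)
    return "".join(digits)


print(hebrew_soundex("דוד לטרמן"))  # 3134365
```

With the same collapsing rule and no preserved first letter, the English "David Letterman" also encodes to 3134365, so the two spellings would compare as equal.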

    However, because Soundex is English-centric, it may not correctly handle certain ambiguities in the Hebrew pronunciation:

    • ו is mapped to 1 (like English V) in the list above, but it often represents O, U, or W, in which case it should be ignored in Soundex.
    • ח is hard to classify due to its lack of an English equivalent. I put it in category 2 because this (1) matches the “ch” transliteration, and (2) allows ך/כ to have the same category with or without a dagesh.
    • Ashkenazi pronunciation would split ת between categories 2 and 3, because ת without a dagesh is pronounced like “s”.

    To deal with this, you could generate multiple Soundex keys for a string. E.g., “שבת” would map to both 212 and 213.
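One way to generate several keys is to list the alternative codes for each ambiguous letter and take every combination. The sketch below assumes only ו and ת are treated as ambiguous; the `AMBIGUOUS` table and the function names are hypothetical, and the number of keys grows exponentially with the number of ambiguous letters in a word.

```python
from itertools import product

# Unambiguous letters, following the Hebrew mapping given earlier in this answer.
BASE = {
    ch: code
    for group, code in {
        "בףפ": "1", "גזחךכסקש": "2", "דט": "3", "ץצ": "32",
        "ל": "4", "םמןנ": "5", "ר": "6",
    }.items()
    for ch in group
}

# Letters with more than one plausible reading (illustrative, not exhaustive).
AMBIGUOUS = {
    "ו": ("1", ""),   # V, or a vowel/W that Soundex ignores
    "ת": ("3", "2"),  # T, or S in Ashkenazi pronunciation
}


def collapse(digits: str) -> str:
    """Drop digits that repeat the previous digit."""
    out = []
    for d in digits:
        if not out or out[-1] != d:
            out.append(d)
    return "".join(out)


def soundex_keys(word: str) -> set[str]:
    """Return every key obtainable by choosing one reading per letter."""
    options = [AMBIGUOUS.get(ch) or (BASE.get(ch, ""),) for ch in word]
    return {collapse("".join(choice)) for choice in product(*options)}


print(sorted(soundex_keys("שבת")))  # ['212', '213']
```

Two strings could then be treated as matching when their key sets intersect, or scored by the best-scoring pair of keys.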

    Similar mappings can be made for Greek:

    • ΒΠΦ → 1
    • Ψ → 12
    • ΓΖΚΞΣΧ → 2
    • ΔΘΤ → 3
    • Λ → 4
    • ΜΝ → 5
    • Ρ → 6
    • ΑΕΗΙΟΥΩ → ignored

    or Russian:

    • БВПФ → 1
    • ГЖЗКСХЧШЩ → 2
    • ДТ → 3
    • Ц → 32
    • Л → 4
    • МН → 5
    • Р → 6
    • АЕЁИЙОУЪЫЬЭЮЯ → ignored

    (Note that some of the 2’s might be 32’s, depending on your transliteration convention.)
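Since each script only needs its own letter table, a single table-driven encoder could serve all of them. The following is a minimal sketch using the Greek and Russian lists above; `build_table` and `encode` are hypothetical names. As assumptions, input is uppercased before lookup, and accented or otherwise unlisted characters are simply skipped.

```python
def build_table(groups: dict[str, str]) -> dict[str, str]:
    """Expand {"letters": "code"} groups into a per-letter lookup table."""
    return {ch: code for letters, code in groups.items() for ch in letters}


GREEK = build_table({
    "ΒΠΦ": "1", "Ψ": "12", "ΓΖΚΞΣΧ": "2", "ΔΘΤ": "3",
    "Λ": "4", "ΜΝ": "5", "Ρ": "6",
})
RUSSIAN = build_table({
    "БВПФ": "1", "ГЖЗКСХЧШЩ": "2", "ДТ": "3", "Ц": "32",
    "Л": "4", "МН": "5", "Р": "6",
})


def encode(text: str, table: dict[str, str]) -> str:
    """Encode text with the given table, collapsing repeated digits."""
    digits = []
    for ch in text.upper():
        for d in table.get(ch, ""):  # unlisted characters contribute nothing
            if not digits or digits[-1] != d:
                digits.append(d)
    return "".join(digits)


print(encode("Давид Леттерман", RUSSIAN))  # 3134365
print(encode("Δαβίδ", GREEK))              # 313
```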


    A similarity “rating” can be obtained based on a metric like longest common subsequence length or Levenshtein distance on the Soundex values.

    For example, you can define the “similarity” between two strings as 2*lcslen(A, B)/(len(A)+len(B)) to obtain a score between 0 and 1.
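The formula above can be sketched with a standard dynamic-programming LCS. Here `lcs_len` and `similarity` are illustrative names, and returning 1.0 for two empty keys is an assumption made to avoid dividing by zero.

```python
def lcs_len(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """2 * lcslen(A, B) / (len(A) + len(B)), a score between 0 and 1."""
    if not a and not b:
        return 1.0  # assumption: two empty keys count as identical
    return 2 * lcs_len(a, b) / (len(a) + len(b))


print(similarity("3134365", "3134365"))  # 1.0
print(similarity("123", "456"))          # 0.0
```

The inputs are meant to be Soundex keys rather than the raw strings, so two names in different scripts are compared only through their digit sequences.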
