The code you have just call the copy constructor, this…

Question

0

Editorial Team

Asked: May 10, 20262026-05-10T14:46:58+00:00 2026-05-10T14:46:58+00:00

Hey, I’m using Levenshteins algorithm to get distance between source and target string. also

0

Hey, I’m using Levenshteins algorithm to get distance between source and target string.

also I have method which returns value from 0 to 1:

/// <summary> /// Gets the similarity between two strings. /// All relation scores are in the [0, 1] range,  /// which means that if the score gets a maximum value (equal to 1)  /// then the two string are absolutely similar /// </summary> /// <param name='string1'>The string1.</param> /// <param name='string2'>The string2.</param> /// <returns></returns> public static float CalculateSimilarity(String s1, String s2) {     if ((s1 == null) || (s2 == null)) return 0.0f;      float dis = LevenshteinDistance.Compute(s1, s2);     float maxLen = s1.Length;     if (maxLen < s2.Length)         maxLen = s2.Length;     if (maxLen == 0.0F)         return 1.0F;     else return 1.0F - dis / maxLen; }

but this for me is not enough. Because I need more complex way to match two sentences.

For example I want automatically tag some music, I have original song names, and i have songs with trash, like super, quality, years like 2007, 2008, etc..etc.. also some files have just http://trash..thash..song_name_mp3.mp3, other are normal. I want to create an algorithm which will work just more perfect than mine now.. Maybe anyone can help me?

here is my current algo:

/// <summary> /// if we need to ignore this target. /// </summary> /// <param name='targetString'>The target string.</param> /// <returns></returns> private bool doIgnore(String targetString) {     if ((targetString != null) && (targetString != String.Empty))     {         for (int i = 0; i < ignoreWordsList.Length; ++i)         {             //* if we found ignore word or target string matching some some special cases like years (Regex).             if (targetString == ignoreWordsList[i] || (isMatchInSpecialCases(targetString))) return true;         }     }     return false; }  /// <summary> /// Removes the duplicates. /// </summary> /// <param name='list'>The list.</param> private void removeDuplicates(List<String> list) {     if ((list != null) && (list.Count > 0))     {         for (int i = 0; i < list.Count - 1; ++i)         {             if (list[i] == list[i + 1])             {                 list.RemoveAt(i);                 --i;             }         }     } }  /// <summary> /// Does the fuzzy match. /// </summary> /// <param name='targetTitle'>The target title.</param> /// <returns></returns> private TitleMatchResult doFuzzyMatch(String targetTitle) {     TitleMatchResult matchResult = null;     if (targetTitle != null && targetTitle != String.Empty)    {        try        {            //* change target title (string) to lower case.            targetTitle = targetTitle.ToLower();             //* scores, we will select higher score at the end.            Dictionary<Title, float> scores = new Dictionary<Title, float>();             //* do split special chars: '-', ' ', '.', ',', '?', '/', ':', ';', '%', '(', ')', '#', '\'', '\'', '!', '|', '^', '*', '[', ']', '{', '}', '=', '!', '+', '_'            List<String> targetKeywords = new List<string>(targetTitle.Split(ignoreCharsList, StringSplitOptions.RemoveEmptyEntries));            //* remove all trash from keywords, like super, quality, etc..            targetKeywords.RemoveAll(delegate(String x) { return doIgnore(x); });           //* sort keywords.           targetKeywords.Sort();         //* remove some duplicates.         removeDuplicates(targetKeywords);          //* go through all original titles.         foreach (Title sourceTitle in titles)         {             float tempScore = 0f;             //* split orig. title to keywords list.             List<String> sourceKeywords = new List<string>(sourceTitle.Name.Split(ignoreCharsList, StringSplitOptions.RemoveEmptyEntries));             sourceKeywords.Sort();             removeDuplicates(sourceKeywords);              //* go through all source ttl keywords.             foreach (String keyw1 in sourceKeywords)             {                 float max = float.MinValue;                 foreach (String keyw2 in targetKeywords)                 {                     float currentScore = StringMatching.StringMatching.CalculateSimilarity(keyw1.ToLower(), keyw2);                     if (currentScore > max)                     {                         max = currentScore;                     }                 }                 tempScore += max;             }              //* calculate average score.             float averageScore = (tempScore / Math.Max(targetKeywords.Count, sourceKeywords.Count));               //* if average score is bigger than minimal score and target title is not in this source title ignore list.             if (averageScore >= minimalScore && !sourceTitle.doIgnore(targetTitle))             {                 //* add score.                 scores.Add(sourceTitle, averageScore);             }         }          //* choose biggest score.         float maxi = float.MinValue;         foreach (KeyValuePair<Title, float> kvp in scores)         {             if (kvp.Value > maxi)             {                 maxi = kvp.Value;                 matchResult = new TitleMatchResult(maxi, kvp.Key, MatchTechnique.FuzzyLogic);             }         }     }     catch { } } //* return result. return matchResult; }

This works normally but just in some cases, a lot of titles which should match, does not match… I think I need some kind of formula to play with weights and etc, but i can’t think of one..

Ideas? Suggestions? Algos?

by the way I already know this topic (My colleague already posted it but we cannot come with a proper solution for this problem.): Approximate string matching algorithms

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

score 0 · Answer 1 · 2026-05-10T14:46:59+00:00

2026-05-10T14:46:59+00:00Added an answer on May 10, 2026 at 2:46 pm

Your problem here may be distinguishing between noise words and useful data:

Rolling_Stones.Best_of_2003.Wild_Horses.mp3
Super.Quality.Wild_Horses.mp3
Tori_Amos.Wild_Horses.mp3

You may need to produce a dictionary of noise words to ignore. That seems clunky, but I’m not sure there’s an algorithm that can distinguish between band/album names and noise.

0

Reply
Share
Share

- Report

How to approach applying for a job at a company ...

How to handle personal stress caused by utterly incompetent and ...

What is a programmer’s life like?

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions