Is there a way in the .NET Framework (or has someone written something similar) to get an array of matches when passed a string and a dictionary object?
First, some background:
I need to
I have a CSV file of sports teams, which I load into a dictionary object, for example…
Team, Variants
Manchester United, Manchester United
Manchester United, manutd
Manchester United, man utd
Manchester United, manchester utd
Manchester United, mufc
Aston Villa, Aston Villa
Aston Villa, avfc
Newcastle United, Newcastle United
Newcastle United, toon army
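For reference, here is a rough sketch of how such a lookup might be built, keyed by variant with the team name as the value (the orientation the TryGetValue call in the code further down relies on). It assumes a simple two-column file with a header row and no quoted commas; TeamDictionary and Load are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class TeamDictionary
{
    // Builds a variant -> team lookup from "Team, Variant" lines.
    // Assumes the first line is the header row and skips it.
    public static Dictionary<string, string> Load(IEnumerable<string> csvLines)
    {
        // Case-insensitive keys, so "aston villa" finds the "Aston Villa" entry.
        var dic = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        bool isHeader = true;
        foreach (string line in csvLines)
        {
            if (isHeader) { isHeader = false; continue; }
            string[] parts = line.Split(',');
            if (parts.Length != 2) continue; // ignore malformed rows
            dic[parts[1].Trim()] = parts[0].Trim();
        }
        return dic;
    }
}
```

It could be called with something like TeamDictionary.Load(File.ReadLines(path)), where path points at the CSV file.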
Now I want to see if a string contains any of the phrases in that dictionary.
An example string…
"I wonder if man utd, aston villa and the toon army will exist in this string"
What I want to return is an array of the strings that match. An example output would be the following:
["Manchester United","Aston Villa", "Newcastle United"]
I am currently using a regex to split the string into words. I then loop through each word and test it against the dictionary. (Note that the code does work, but only for single words, not phrases, because of the regex.)
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static List<string> CheckStringWithDictionary(string input, Dictionary<string, string> dic, int minimumLength)
{
    List<string> lst = new List<string>();
    // Test each word of the input against the dictionary (variant -> team).
    foreach (Match match in RegexSplitStringToArray(input, minimumLength))
    {
        string myValue;
        if (dic.TryGetValue(match.Value, out myValue))
            lst.Add(myValue);
    }
    return lst;
}

public static MatchCollection RegexSplitStringToArray(string input, int minLength)
{
    // Match runs of at least minLength word characters.
    Regex csvSplit = new Regex("(\\w{" + minLength + ",})", RegexOptions.Compiled);
    return csvSplit.Matches(input);
}
The reason for looping through the string rather than the dictionary is that the dictionary will contain over 10,000 items, so looping through it would be very inefficient.
Thanks for your patience so far, and now to the question…
Is there a way in the .NET Framework (or has someone written something similar) to get an array of matches when passed a string and a dictionary object?
Thanks all
I would use LINQ for this:
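A minimal sketch of what such a LINQ query could look like, assuming the dictionary is keyed by variant with the team name as the value (as the question's TryGetValue call implies). TeamMatcher and FindTeams are hypothetical names, and the case-insensitive substring test is an assumption about the matching you want:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TeamMatcher
{
    // Returns the distinct team names whose variant appears anywhere in the input.
    public static List<string> FindTeams(string input, Dictionary<string, string> dic)
    {
        return dic
            // Keep entries whose variant occurs in the input, ignoring case.
            .Where(kvp => input.IndexOf(kvp.Key, StringComparison.OrdinalIgnoreCase) >= 0)
            .Select(kvp => kvp.Value)
            // Several variants can map to the same team, so de-duplicate.
            .Distinct()
            .ToList();
    }
}
```

Note that a plain substring test also matches inside longer words, so a short variant such as "avfc" would be found in the middle of an unrelated token. If that matters, the predicate would need a word-boundary check.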
This is still effectively looping through the dictionary, but that’s likely going to be better than most alternatives, even with a large dictionary, if you need to handle the multiple word options.