I’m designing an interface where the user can join a publicaiton to a keyword, and when they do, I want to suggest other keywords that commonly occur in tandem with the selected keyword. The trick is getting the frequency of correlation alongside the properties of the suggested keywords.
The Keyword type (EF) has these fields:
int Id
string Text
string UrlString
…and a many-to-many relation to a Publications entity-set.
I’m almost there. With :
var overlappedKeywords =
selectedKeyword.Publications.SelectMany(p => p.Keywords).ToList();
Here I get something very useful: a flattened list of keywords, each duplicated in the list however many times it appears in tandem with selectedKeyword.
The remaining Challenge:
So I want to get a count of the number of times each keyword appears in this list, and project the distinct keyword entities onto a new type, called KeywordCounts, having the same fields as Keyword but with one extra field: int PublicationsCount, into which I will populate the count of each Keyword within overlappedKeywords. How can I do this??
So far I’ve tried 2 approaches:
var keywordCounts = overlappingKeywords
.Select(oc => new KeywordCount
{
KeywordId = oc.Id,
Text = oc.Text,
UrlString = oc.UrlString,
PublicationsCount = overlappingKeywords.Count(ok2 => ok2.Id == oc.Id)
})
.Distinct();
…PublicationsCount is getting populated correctly, but Distinct isn’t working here. (must I create an EqualityComarer for this? Why doesn’t the default EqualityComarer work?)
var keywordCounts = overlappingKeywords
.GroupBy(o => o.Id)
.Select(c => new KeywordCount
{
Id = ???
Text = ???
UrlString = ???
PublicationsCount = ???
})
I’m not very clear on GroupBy. I don’t seem to have any access to ‘o’ in the Select, and c isn’t comping up with any properties of Keyword
UPDATE
My first approach would work with a simple EqualityComparer passed into .Distinct() :
class KeywordEqualityComparer : IEqualityComparer<KeywordCount>
{
public bool Equals(KeywordCount k1, KeywordCount k2)
{
return k1.KeywordId== k2.KeywordId;
}
public int GetHashCode(KeywordCount k)
{
return k.KeywordId.GetHashCode();
}
}
…but Slauma’s answer is preferable (and accepted) because it does not require this. I’m still stumped as to what the default EqualityComparer would be for an EF entity instance — wouldn’t it just compare based on primary ids, as I did above here?
You second try is the better approach. I think the complete code would be:
This is LINQ to Objects, I guess, because there doesn’t seem to be a EF context involved but an object
overlappingKeywords, so the grouping happens in memory, not in the database.