I have an IEnumerable full of objects that we use to represent user actions. The ultimate goal is to display a list of the most recent actions taken in the system. This list can get rather long, and the users have requested that it cover a 24-hour period. I want to perform some “squashing” on this list, somewhat like what Facebook does for likes and comments. For example, instead of listing all 37 updates a specific user performed, I could show “user X updated 37 Y”.
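A minimal sketch of the basic squashing step, assuming a hypothetical `UserAction` record with `UserName`, `ActionType` and `Timestamp` properties (the real class isn't shown, and `ActionType` in particular is an assumption). It keeps the last 24 hours, groups by user and action type, and emits one summary line per group, most recent activity first.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape of a user action; the real class may differ.
public record UserAction(string UserName, string ActionType, DateTime Timestamp);

public static class ActionSquasher
{
    // Collapses the last 24 hours of actions into one "user action count" line
    // per (user, action type) pair.
    public static List<string> Squash(IEnumerable<UserAction> actions, DateTime now)
    {
        DateTime cutoff = now.AddHours(-24);
        return actions
            .Where(a => a.Timestamp >= cutoff && a.Timestamp <= now) // 24-hour window only
            .GroupBy(a => (a.UserName, a.ActionType))                 // one group per user + action type
            .OrderByDescending(g => g.Max(a => a.Timestamp))          // most recent activity first
            .Select(g => $"{g.Key.UserName} {g.Key.ActionType} {g.Count()}")
            .ToList();
    }
}
```

This only squashes by user and action type; it says nothing yet about *when* within the 24 hours the actions happened.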
These objects have the username and the DateTime of the action as properties, so that information is easy enough to select. I need some help with the best way to programmatically determine what should be squashed. Ideally, if for example 1000+ people are updated in our system in less than 10 minutes by the same user, then it's an import and not a manual edit, so I will remove those entries from the list of actions and replace them with “so and so ran an import”.
How would I query an IEnumerable for the objects with the same username and within a specific date range?
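For the literal question, plain LINQ covers it. The sketch below assumes the same hypothetical `UserAction` record as before (`UserName`, `ActionType`, `Timestamp`); `ActionQueries` and its method names are illustrative only. The first method filters one user's actions to a half-open range `[from, to)`; the second groups every user's actions in the range in a single pass.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape of a user action; the real class may differ.
public record UserAction(string UserName, string ActionType, DateTime Timestamp);

public static class ActionQueries
{
    // Actions by one user with from <= Timestamp < to, oldest first.
    public static List<UserAction> ForUserInRange(
        IEnumerable<UserAction> actions, string userName, DateTime from, DateTime to) =>
        actions
            .Where(a => a.UserName == userName && a.Timestamp >= from && a.Timestamp < to)
            .OrderBy(a => a.Timestamp)
            .ToList();

    // All actions in [from, to), keyed by user name; a missing user yields an empty sequence.
    public static ILookup<string, UserAction> ByUserInRange(
        IEnumerable<UserAction> actions, DateTime from, DateTime to) =>
        actions
            .Where(a => a.Timestamp >= from && a.Timestamp < to)
            .ToLookup(a => a.UserName);
}
```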
Edit: The only approach I can initially think of is iterating over the IEnumerable once for each possible user and each possible 10-minute period. That sounds horribly inefficient, though, and I'm clearly just unaware of the options available.
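One way to avoid scanning every possible 10-minute period is a sliding window over each user's sorted timestamps, which finds dense runs in one pass per user after sorting. Note this is a swapped-in technique, not the clustering approach the answer below settles on; the `UserAction` and `Burst` records, `BurstDetector`, and the `threshold` parameter are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape of a user action; the real class may differ.
public record UserAction(string UserName, string ActionType, DateTime Timestamp);

// A run of actions by one user that is dense enough to look like a bulk import.
public record Burst(string UserName, DateTime Start, DateTime End, int Count);

public static class BurstDetector
{
    // Finds runs of actions by the same user in which every action lies in some
    // span of at most `window` that holds at least `threshold` actions.
    // Overlapping dense spans are merged into a single burst.
    public static List<Burst> FindBursts(IEnumerable<UserAction> actions, TimeSpan window, int threshold)
    {
        var bursts = new List<Burst>();
        foreach (var group in actions.GroupBy(a => a.UserName))
        {
            DateTime[] times = group.Select(a => a.Timestamp).OrderBy(t => t).ToArray();
            int left = 0, burstStart = -1, burstEnd = -1; // open burst index range, -1 = none
            for (int right = 0; right < times.Length; right++)
            {
                // Slide the left edge until the span is at most `window`.
                while (times[right] - times[left] > window) left++;
                if (right - left + 1 < threshold) continue;

                if (burstStart >= 0 && left <= burstEnd)
                {
                    burstEnd = right; // shares actions with the open burst: extend it
                }
                else
                {
                    if (burstStart >= 0) bursts.Add(MakeBurst(group.Key, times, burstStart, burstEnd));
                    burstStart = left; // start a new burst
                    burstEnd = right;
                }
            }
            if (burstStart >= 0) bursts.Add(MakeBurst(group.Key, times, burstStart, burstEnd));
        }
        return bursts;
    }

    private static Burst MakeBurst(string user, DateTime[] times, int start, int end) =>
        new(user, times[start], times[end], end - start + 1);
}
```

With the numbers from the question this would be called as `FindBursts(actions, TimeSpan.FromMinutes(10), 1000)`, and each returned burst could be rendered as one “ran an import” line.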
As it turns out, I was approaching this problem incorrectly. After trying to query the dataset in different ways using LINQ, I realized that this was really a machine learning problem: I was trying to identify groups of data within a large dataset on a per-user, time basis.
This is a clustering problem. I have written and published a library that performs k-means clustering on objects in an IEnumerable. The process goes a little something like this:
The Cluster class (which contains two distance algorithms), the IClusterable interface and the KCluster algorithm are all provided in the C# Machine Learning Library.
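The post doesn't show the library's `Cluster`, `IClusterable` or `KCluster` APIs, so the following is not their interface. It is a minimal, standalone sketch of plain 1-D k-means over one user's timestamps, assuming the caller clusters each user separately and picks k; `TimeKMeans` is a hypothetical name, and centroids are initialized evenly across the time range rather than randomly.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TimeKMeans
{
    // Plain 1-D k-means over timestamps measured in minutes.
    // Returns the cluster index assigned to each input timestamp.
    public static int[] Cluster(IReadOnlyList<DateTime> times, int k, int maxIterations = 100)
    {
        if (k < 1 || times.Count < k)
            throw new ArgumentException("Need k >= 1 and at least k timestamps.");

        double[] x = times.Select(t => t.Ticks / (double)TimeSpan.TicksPerMinute).ToArray();
        double min = x.Min(), max = x.Max();

        // Deterministic init: centroids spread evenly across the observed range.
        double[] centroids = Enumerable.Range(0, k)
            .Select(i => k == 1 ? (min + max) / 2 : min + i * (max - min) / (k - 1))
            .ToArray();

        int[] assignment = new int[x.Length];
        for (int iter = 0; iter < maxIterations; iter++)
        {
            bool changed = iter == 0; // always run at least one update step

            // Assignment step: each point goes to its nearest centroid.
            for (int p = 0; p < x.Length; p++)
            {
                int best = 0;
                for (int c = 1; c < k; c++)
                    if (Math.Abs(x[p] - centroids[c]) < Math.Abs(x[p] - centroids[best])) best = c;
                if (assignment[p] != best) { assignment[p] = best; changed = true; }
            }
            if (!changed) break; // converged

            // Update step: move each centroid to the mean of its members;
            // an empty cluster keeps its previous position.
            for (int c = 0; c < k; c++)
            {
                double[] members = x.Where((_, p) => assignment[p] == c).ToArray();
                if (members.Length > 0) centroids[c] = members.Average();
            }
        }
        return assignment;
    }
}
```

A cluster with a very large member count relative to its time span would then be a candidate for collapsing into a single “ran an import” entry. Choosing k and that cutoff is left open here.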