I am writing an application in WPF. I am using Entity Framework 5 and was wondering if you can give me advice about how to handle the following situation.
I have basically only three tables:
Item {ID, Name}
Attribute {ID, Name, Type}
AttributeValue {ItemID, AttributeID, Value}
Our client wants to add attributes to his items, and this seems to work quite well: he can now add attributes and assign them, with values, to his items. First question: Do you think this is a good design?
The question is now how to display the items in a WPF DataGrid and show all the attributes as columns. This is what we have now:
We generate the columns programmatically:
foreach (var attribute in db.Attributes)
{
    datagrid.Columns.Add(new DataGridTextColumn
    {
        Header = attribute.Name,
        Binding = new Binding { Path = new PropertyPath(string.Format("[{0}]", attribute.ID)) }
    });
}
This works fine. The question now is how to implement the indexer in the Items class. We have one option, which gets quite slow with a large amount of data (20,000 items, 400 attributes, every item has 100 values):
public AttributeValue this[int i]
{
    get
    {
        // Linear search through this item's values on every access
        return AttributeValues.FirstOrDefault(aa => aa.AttributeID == i);
    }
}
As I said, this works but it gets slow. Instead of querying for the attribute values on every access, I thought of caching everything before displaying the grid, like this:
var items = db.Items.AsNoTracking().ToArray();
CachedValues.Values = new Dictionary<int, Dictionary<int, AttributeValue>>(items.Length);
foreach (var item in items)
{
    // One query per item
    var attributevalues = db.AttributeValue.AsNoTracking().Where(w => w.ItemID == item.ID).ToArray();
    CachedValues.Values[item.ID] = new Dictionary<int, AttributeValue>(attributevalues.Length);
    foreach (var value in attributevalues)
    {
        CachedValues.Values[item.ID][value.AttributeID] = value;
    }
}
I use a static class as the Cache:
public static class CachedValues
{
    // Outer key: item ID; inner key: attribute ID
    public static Dictionary<int, Dictionary<int, AttributeValue>> Values;
}
And in the Items-Class I can then access the cache:
public AttributeValue this[int i]
{
    get
    {
        AttributeValue val = null;
        CachedValues.Values[ID].TryGetValue(i, out val);
        return val;
    }
}
Obviously it takes some time (15s) to initialize the cache, but after that it is a lot faster. Sorting the items by an attribute in the DataGrid takes only a second. With the other approach it took ages.
I'm not satisfied with the solution. Do you have any suggestions? I would appreciate any kind of criticism (I know neither is a good solution).
Thanks,
Thomas
EDIT
To make the first question clearer, here is a small example:
Items: Item1,Item2,Item3,…
Attributes: Width, Height, Speed, …
AttributeValues: (Item1, Width, 100), (Item1,Height,200), (Item2,Width,100), (Item3, Height, 200), (Item3,Speed,40)
So it's a classic many-to-many relationship. An attribute might appear in 0-many items, and an item might have 0-many attributes.
Your data model is a very generic solution, and there are arguments for and against it, but it is hard to judge without more background, so I will skip this part of the question.
I would probably have implemented a similar cache as you did, but with some slight modifications.
I would not use a global static cache but instead one dictionary per entity caching only property values for this entity. This way you do not have to care about removing entries from the global cache if an entity goes out of scope because the cache gets garbage collected together with the entity.
I would not prepopulate the dictionary but instead first try to get the value from the cache and if it is not present search the property value list and update the cache on demand. If the application is well behaved and does not access all values at once – for example because only some data is visible in a grid at any time while other rows are not visible – this could seamlessly distribute the cache population process over an extended period of time and make it unnoticeable to the user.
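The two modifications above could be combined roughly as in the following sketch. It is only an illustration under assumptions: the classes are plain stand-ins for the EF entities from the question, the Value property is assumed to be a string, and the _cache field is a hypothetical name. The indexer checks the per-entity dictionary first and falls back to searching AttributeValues only on a cache miss.

```csharp
using System.Collections.Generic;
using System.Linq;

// Stand-in for the EF entity from the question (Value type is an assumption).
public class AttributeValue
{
    public int ItemID { get; set; }
    public int AttributeID { get; set; }
    public string Value { get; set; }
}

public class Item
{
    // Per-entity cache (hypothetical field); it is garbage collected with the item.
    private readonly Dictionary<int, AttributeValue> _cache =
        new Dictionary<int, AttributeValue>();

    public Item()
    {
        AttributeValues = new List<AttributeValue>();
    }

    public int ID { get; set; }
    public string Name { get; set; }

    // In the real model this would be the EF navigation property.
    public ICollection<AttributeValue> AttributeValues { get; set; }

    public AttributeValue this[int attributeId]
    {
        get
        {
            AttributeValue value;
            if (!_cache.TryGetValue(attributeId, out value))
            {
                // Cache miss: search the value list once and remember the result.
                // A null result is stored too, so a missing attribute is not searched again.
                value = AttributeValues.FirstOrDefault(v => v.AttributeID == attributeId);
                _cache[attributeId] = value;
            }
            return value;
        }
    }
}
```

The binding paths generated in the question ("[{0}]" with the attribute ID) would keep working against this indexer unchanged. One caveat of this sketch: because misses are cached as well, the dictionary would have to be cleared whenever AttributeValues is modified after the first lookup.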
Final thought: 20,000 items with 400 attributes each? Which user is supposed to be able to work with an application presenting up to 8,000,000 values at once? Maybe you should also consider redesigning the user interface and interaction logic; nobody needs, or can handle, that much information at once.