We currently use List<T> to store events from a simulation project we are running. We need to optimise memory utilisation and the time it takes to process the events in order to derive certain key metrics.
We thought of moving the event log to a SQL Server Compact database table and then possibly use Linq to calculate the metrics. From your experience do you think it will be faster to use SQL Server Compact than C#’s built-in data structures or are we going to have issues?
Some ideas.
MSMQ (Microsoft Message Queue)
You can have a thread dequeueing off of MSMQ and updating metrics on the fly. If you need to store these events for later paroosal you can put them into the database as you dequeue them. MSMQ demonstrates much better scalability in these scenarios – especially when the publisher and subscriber have assymetric processing speeds; and binary data is being used (as SQL can get bogged down with allocating space for
VARBINARY, or allocating/splitting pages for indexes).The two other SQL scenarios are complimentary to this one – you can still use dequeueing to insert into SQL; to avoid any hiccups in your simulation while SQL allocates space.
You can side-step what @Aliostad said using this one, to a certain degree.
OLAP (Online Analytical Processing)
Sounds like you might benefit from from OLAP (cubes etc.). This will increase the overall runtime of your simulation but will improve the value of the data. Unfortunately this means forking out cash for one of the bigger SQL editions.
Stored Procedures
While Linq-to-SQL is great for ‘your average developer’ please keep away from it in scientific projects. There are a host of great tricks you can use in raw TSQL, in addition to being able to inspect the query plan. If you want the best possible performance plan your DB carefully and create stored procedures/UDFs to aggregate your data.
If you can only calculate some of the metrics in C#, do as much work in SQL before-hand – and then feel free to use Linq-to-SQL to grab the data.
Also remember if you are inserting off the end of a MSMQ you can agressively index, which will speed up your metric calculations without impacting your simulation.
I would only involve SQL if there is a real need for better memory utilization (i.e. you are actually running out of it).
Memory Mapped Files
This allows you to offset memory pressure onto disk; at a performance penalty if it needs to be ‘paged’ back in.
Overall
I could steer clear of Linq to define basic metrics – do it in SQL. MSMQ is without a doubt a huge winner in this case. Don’t overcomplicate the memory issue and keep it in .Net if you are not running out of memory.