I am writing a function to increase the time scale of raw calculation data from a time density of about two minutes to five minutes (and to other, larger scales later). There are over 100k data points held in an array that isn’t in chronological order. I am looking for the fastest way to query the array and find the data between two datetimes. As the code runs, every data point will need to be used only once, but it will have to be read several times because the data is not in order. I have several ideas of how to do this:
Just look at all of the time values in the array to check whether they are within the two datetimes given. This will force the code to run through the entire array for each new time point ~50k times.
Create a boolean in the array alongside my time data that becomes true once the value has been used. The code would then check whether the point has already been used before doing the datetime comparison, which should be faster.
Reorganize the array into chronological order. I am not sure how long sorting by datetime would take. It would greatly increase the time required to import the data in the first place; however, it could make the scaling query much faster. Any idea, even roughly, of the ratio between the time it would take to reorder the array and the time it would take to run it out of order?
Any other suggestions are welcome.
I will add some code if people feel it is necessary. Thanks in advance.
EDIT: A few examples as requested.
Here are the definitions of the arrays:
Dim ScaleDate(0) As Date
Dim ScaleData(0) As Double
I use ReDim Preserve as the data is added to them from an SQL query.
Here is an example of a datetime point copied from the array.
(0) = #2/12/2012 12:01:36 AM#
First, as Tim Schmelter recommended, I would use a List(Of T) instead of an array. It will likely be more efficient and will definitely be easier to work with. Second, I would recommend defining your own type which stores all the data for a single item rather than storing each property for the item in a separate list. Doing so will make it easier to modify in the future, but it will also be more efficient because you’ll only need to resize one list rather than two.
It’s hard to say which will be faster, sorting the list or searching through it. It all depends on how big it is, how often it’s changed, and how often you search it. So, I would recommend trying both options and seeing for yourself which works better in your scenario.
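As a rough illustration of the single-type, single-list layout described above, here is a minimal sketch. The property names Timestamp and Value are assumptions made for this example (they stand in for the ScaleDate and ScaleData arrays from the question), and the sample values are made up.

```vbnet
Imports System
Imports System.Collections.Generic

' Hypothetical item type: one timestamp together with its value.
' The property names are illustrative, not taken from the question.
Public Class MyItem
    Public Property Timestamp As Date
    Public Property Value As Double
End Class

Module SingleListExample
    Sub Main()
        ' One list replaces the two parallel arrays and grows on its own,
        ' so no ReDim Preserve is needed.
        Dim items As New List(Of MyItem)()

        ' In the real code each item would be built from a row of the SQL result.
        items.Add(New MyItem With {.Timestamp = #2/12/2012 12:03:40 AM#, .Value = 2.25})
        items.Add(New MyItem With {.Timestamp = #2/12/2012 12:01:36 AM#, .Value = 1.5})

        Console.WriteLine(items.Count)
    End Sub
End Module
```

Keeping the timestamp and the value in one object also means a sort or a filter can never leave the two out of step with each other.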
For sorting, if you have your own type, you could simply make it implement IComparable(Of T) and then call the Sort method on the list.
You’d want to sort the list only once each time it is modified. You wouldn’t want to sort it every time you search through the list. Also, sorting it, in and of itself, doesn’t make searching it front-to-back any faster, so you would want to implement a find method that takes advantage of the ordering to search through a sorted list quickly.
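One way to sketch the sort-then-search idea is shown below. It assumes the same hypothetical Timestamp and Value properties, uses a binary search as the "quick" find (one possible technique, not necessarily the one originally intended), and treats the range as half-open (start inclusive, end exclusive) so that consecutive five-minute windows never return the same point twice. LowerBound and FindInRange are names invented for this example.

```vbnet
Imports System
Imports System.Collections.Generic

' Hypothetical item type that knows how to order itself chronologically.
Public Class MyItem
    Implements IComparable(Of MyItem)

    Public Property Timestamp As Date
    Public Property Value As Double

    Public Function CompareTo(other As MyItem) As Integer _
        Implements IComparable(Of MyItem).CompareTo
        If other Is Nothing Then Return 1
        Return Timestamp.CompareTo(other.Timestamp)
    End Function
End Class

Module SortedSearchExample
    ' Binary search: index of the first item whose Timestamp is >= target,
    ' or items.Count if there is no such item. The list must be sorted.
    Private Function LowerBound(items As List(Of MyItem), target As Date) As Integer
        Dim low As Integer = 0
        Dim high As Integer = items.Count
        While low < high
            Dim middle As Integer = low + (high - low) \ 2
            If items(middle).Timestamp < target Then
                low = middle + 1
            Else
                high = middle
            End If
        End While
        Return low
    End Function

    ' Items with startDate <= Timestamp < endDate, found with two binary searches.
    Public Function FindInRange(items As List(Of MyItem),
                                startDate As Date,
                                endDate As Date) As List(Of MyItem)
        Dim first As Integer = LowerBound(items, startDate)
        Dim last As Integer = LowerBound(items, endDate)
        ' Math.Max guards against endDate being earlier than startDate.
        Return items.GetRange(first, Math.Max(0, last - first))
    End Function

    Sub Main()
        Dim items As New List(Of MyItem)()
        items.Add(New MyItem With {.Timestamp = #2/12/2012 12:06:10 AM#, .Value = 3.0})
        items.Add(New MyItem With {.Timestamp = #2/12/2012 12:01:36 AM#, .Value = 1.5})
        items.Add(New MyItem With {.Timestamp = #2/12/2012 12:03:40 AM#, .Value = 2.25})

        ' Sort once after loading, not before every search.
        items.Sort()

        Dim matches As List(Of MyItem) =
            FindInRange(items, #2/12/2012 12:00:00 AM#, #2/12/2012 12:05:00 AM#)
        For Each item As MyItem In matches
            Console.WriteLine("{0}: {1}", item.Timestamp, item.Value)
        Next
    End Sub
End Module
```

With the list sorted, each range lookup costs two binary searches plus the copy of the matching items, rather than a pass over all 100k points.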
For searching through an unsorted list, I would simply loop front-to-back through the whole list and build a new list of all the matching items.
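A minimal sketch of that front-to-back search follows, again assuming the hypothetical Timestamp and Value properties and a half-open range (start inclusive, end exclusive). FindInRange is a name invented for this example.

```vbnet
Imports System
Imports System.Collections.Generic

' Hypothetical item type; property names are illustrative.
Public Class MyItem
    Public Property Timestamp As Date
    Public Property Value As Double
End Class

Module LinearSearchExample
    ' Checks every item, so the cost is proportional to the size of the list
    ' on every call, regardless of how many items match.
    Public Function FindInRange(items As List(Of MyItem),
                                startDate As Date,
                                endDate As Date) As List(Of MyItem)
        Dim matches As New List(Of MyItem)()
        For Each item As MyItem In items
            If item.Timestamp >= startDate AndAlso item.Timestamp < endDate Then
                matches.Add(item)
            End If
        Next
        Return matches
    End Function

    Sub Main()
        Dim items As New List(Of MyItem)()
        items.Add(New MyItem With {.Timestamp = #2/12/2012 12:06:10 AM#, .Value = 3.0})
        items.Add(New MyItem With {.Timestamp = #2/12/2012 12:01:36 AM#, .Value = 1.5})

        Dim matches As List(Of MyItem) =
            FindInRange(items, #2/12/2012 12:00:00 AM#, #2/12/2012 12:05:00 AM#)
        Console.WriteLine(matches.Count)
    End Sub
End Module
```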
Alternatively, if the dates on these items are mostly sequential without giant gaps between them, it may be worth using a Dictionary(Of Date, List(Of MyItem)) object to store your list of items. This would contain separate lists of items for each date, all stored in a hash table. So, getting or setting the list of items for a particular day would be very fast, but to get a list of all the items in a date range, you’d have to loop through every day in the date range, get the list for that day from the dictionary, and combine them into one list of matches.
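The per-day dictionary could be sketched roughly as follows. GroupByDay and FindInDays are names invented for this example, and the Timestamp and Value properties are the same assumptions as before. Note that the lookup works at whole-day granularity, so a caller interested in a five-minute window would still need to filter the returned items by time of day.

```vbnet
Imports System
Imports System.Collections.Generic

' Hypothetical item type; property names are illustrative.
Public Class MyItem
    Public Property Timestamp As Date
    Public Property Value As Double
End Class

Module DictionaryExample
    ' Groups items into one list per calendar day (time of day stripped from the key).
    Public Function GroupByDay(items As IEnumerable(Of MyItem)) As Dictionary(Of Date, List(Of MyItem))
        Dim byDay As New Dictionary(Of Date, List(Of MyItem))()
        For Each item As MyItem In items
            Dim key As Date = item.Timestamp.Date
            Dim bucket As List(Of MyItem) = Nothing
            If Not byDay.TryGetValue(key, bucket) Then
                bucket = New List(Of MyItem)()
                byDay.Add(key, bucket)
            End If
            bucket.Add(item)
        Next
        Return byDay
    End Function

    ' Collects every item stored under the days from firstDay to lastDay, inclusive.
    Public Function FindInDays(byDay As Dictionary(Of Date, List(Of MyItem)),
                               firstDay As Date,
                               lastDay As Date) As List(Of MyItem)
        Dim matches As New List(Of MyItem)()
        Dim currentDay As Date = firstDay.Date
        While currentDay <= lastDay.Date
            Dim bucket As List(Of MyItem) = Nothing
            ' Days with no data simply have no entry in the dictionary.
            If byDay.TryGetValue(currentDay, bucket) Then
                matches.AddRange(bucket)
            End If
            currentDay = currentDay.AddDays(1)
        End While
        Return matches
    End Function

    Sub Main()
        Dim items As New List(Of MyItem)()
        items.Add(New MyItem With {.Timestamp = #2/13/2012 8:15:00 AM#, .Value = 3.0})
        items.Add(New MyItem With {.Timestamp = #2/12/2012 12:01:36 AM#, .Value = 1.5})

        Dim byDay As Dictionary(Of Date, List(Of MyItem)) = GroupByDay(items)
        Dim matches As List(Of MyItem) = FindInDays(byDay, #2/12/2012#, #2/12/2012#)
        Console.WriteLine(matches.Count)
    End Sub
End Module
```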