I’m writing an application in C# using .NET 4.5. My application reads from and writes to a binary file which is consumed by a hardware device. The file is exactly 5,000,000 bytes and consists of 100,000 50-byte “blocks”. Each 50-byte block will contain an ASCII-encoded string (possibly just an empty string). The file layout is required by the device so that any string can be accessed simply by determining the offset (index * 50 bytes) and then reading/writing the next 50 bytes.
My WinForms application needs to be able to:
- Load all 100,000 strings from the file and display them in the UI (listbox)
- Add, edit or delete strings
- Re-write the binary file in the proper format
Caveats:
- A fair amount of the data in the binary file will initially be empty strings that I don’t want to display to the user but will still need to be in my data structure so I can re-write the file properly.
- Before adding a new string, I need to make sure that there’s free space to allow it (i.e., at least one 50 byte block that’s empty). If not, then the user will need to “delete” an existing string to make room for the new string to be added.
- Adding, modifying and deleting strings are, behind the scenes, the same operation. Adding just changes 50 bytes of null data to 50 bytes of string data while deleting represents the reverse.
- With the exception of empty strings, duplicate strings shouldn’t be stored so I need some way to iterate through my data structure to ensure that a string doesn’t already exist before adding it or editing a string to match an existing string.
So I’m going to need a data structure to hold all of the data from the file and I’m struggling to decide between a dictionary, a list or an array. Given the caveats above, I don’t think directly binding any data structure to the UI is a viable solution. So I think that I’m going to need a lot of code between that data structure and the listbox to implement some kind of pseudo-binding. If that’s the case, which data structure (dictionary, list, array, something else) would be the most useful from a functional standpoint and offer the best compromise re: speed with a dataset of this size (100,000 strings of up to 50 ASCII characters each)?
Even with 100,000 entries, I don’t think you’re going to need to worry too much about performance. (EDIT: I mean performance when it comes to I/O or alteration of the data. You will likely have issues with your GUI binding.)
Program it the simplest way that makes it easiest for you to make changes to your business logic and GUI binding. Perhaps even consider creating your own class that simply wraps a `List` (or `Array` or whatever) with a standard public interface for your GUI that hides this implementation detail.
Once your GUI is running and all your I/O is running fine (and ideally, you have some nice suite of unit tests), then you can start benchmarking/profiling and finding where your bottlenecks are.
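As a rough illustration of such a wrapper, here is a minimal sketch. The class name `StringTable` and its members are hypothetical, not anything from your code. It assumes that unused bytes in a block are NUL (0x00), which matches your description of empty blocks as "null data" but is a guess for strings shorter than 50 characters.

```csharp
using System;
using System.Text;

// Hypothetical wrapper: callers work with slot indexes and strings,
// and never see the backing array or the byte layout.
public sealed class StringTable
{
    public const int BlockSize = 50;
    private readonly string[] _slots;

    private StringTable(string[] slots) { _slots = slots; }

    public int Count { get { return _slots.Length; } }

    public string this[int index] { get { return _slots[index]; } }

    // Parses a raw image of the file, e.g. the result of File.ReadAllBytes.
    public static StringTable FromBytes(byte[] data)
    {
        if (data.Length % BlockSize != 0)
            throw new ArgumentException("Length must be a multiple of the block size.", "data");

        var slots = new string[data.Length / BlockSize];
        for (int i = 0; i < slots.Length; i++)
        {
            // Assumption: padding bytes are NUL, so trimming them recovers the string.
            slots[i] = Encoding.ASCII.GetString(data, i * BlockSize, BlockSize).TrimEnd('\0');
        }
        return new StringTable(slots);
    }

    public void Set(int index, string value)
    {
        if (value == null) throw new ArgumentNullException("value");
        if (value.Length > BlockSize)
            throw new ArgumentException("Value does not fit in one block.", "value");
        _slots[index] = value;
    }

    // Produces the full file image; a new byte[] is zero-filled, so padding is NUL.
    // Note that Encoding.ASCII replaces any non-ASCII character with '?'.
    public byte[] ToBytes()
    {
        var data = new byte[_slots.Length * BlockSize];
        for (int i = 0; i < _slots.Length; i++)
        {
            Encoding.ASCII.GetBytes(_slots[i], 0, _slots[i].Length, data, i * BlockSize);
        }
        return data;
    }
}
```

With something like this, the GUI code only ever calls `Count`, the indexer and `Set`, so you can later swap the array for another structure without touching the form.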
EDIT: Considering your requirements, this custom/wrapped data structure might be ideal. It can proactively know certain aspects of your data. For example, as it is loaded/read, it can check if space is available for new strings, so when the user goes to add more, you already know if you can. It can maintain a `HashSet` of used unique strings so you have a nice O(1) lookup for duplicate strings, and so on.
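To make that bookkeeping concrete, here is one possible sketch. `UniqueStringSlots` and its method names are made up for illustration, and comparing strings with `StringComparer.Ordinal` (case-sensitive) is an assumption about what counts as a duplicate on your device.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: tracks which strings are in use and which slots are free.
public sealed class UniqueStringSlots
{
    private readonly string[] _slots;
    private readonly HashSet<string> _used = new HashSet<string>(StringComparer.Ordinal);
    private readonly Stack<int> _free = new Stack<int>();

    public UniqueStringSlots(string[] slots)
    {
        _slots = (string[])slots.Clone();
        // Walk backwards so that the lowest free index ends up on top of the stack.
        for (int i = _slots.Length - 1; i >= 0; i--)
        {
            if (string.IsNullOrEmpty(_slots[i])) { _slots[i] = ""; _free.Push(i); }
            else _used.Add(_slots[i]);
        }
    }

    public string this[int index] { get { return _slots[index]; } }
    public bool HasFreeSlot { get { return _free.Count > 0; } }
    public bool Contains(string value) { return _used.Contains(value); }

    // Returns the slot index used, or -1 if the value is empty,
    // there is no free slot, or the value is already stored.
    public int TryAdd(string value)
    {
        if (string.IsNullOrEmpty(value) || _free.Count == 0 || !_used.Add(value)) return -1;
        int index = _free.Pop();
        _slots[index] = value;
        return index;
    }

    // Replaces a non-empty slot; fails if the new value would duplicate another slot.
    public bool TryEdit(int index, string newValue)
    {
        string old = _slots[index];
        if (old.Length == 0 || string.IsNullOrEmpty(newValue)) return false;
        if (old == newValue) return true;
        if (!_used.Add(newValue)) return false;
        _used.Remove(old);
        _slots[index] = newValue;
        return true;
    }

    // Clears a slot and makes it the next one to be reused.
    public void Delete(int index)
    {
        string old = _slots[index];
        if (old.Length == 0) return;
        _used.Remove(old);
        _slots[index] = "";
        _free.Push(index);
    }
}
```

The set and the free-slot stack are built once while loading, so checking "is there room?" and "is this a duplicate?" afterwards never needs a scan over all 100,000 entries.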