I’m writing a small program to read in a CSV file with a variable number of lines, and I have a question about best practices.
Is the best way to store the data from each line to make an array of structs, one per line of the CSV?
Should the array size just be set to a large number (for example, more lines than the CSV would ever reasonably have)? I have seen this in many examples on the web.
Or is there a smarter way to tell how much space is needed, such as counting the lines beforehand, or adding space dynamically by using a linked list instead of a statically allocated array? Any best practices? I don’t think picking an arbitrary number is very slick…
Any thoughts would be greatly appreciated.
If you can process the data as you read it, rather than saving it all and processing it afterwards, the problem goes away entirely.
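Here is a minimal sketch of that streaming style in C. The thread doesn't name a language, so C is an assumption, as are the hypothetical `MAX_LINE` limit and the choice of "sum the first field" as the per-line work. Each line goes into one reusable buffer, gets used, and is then overwritten by the next one, so memory use doesn't depend on how many lines the file has.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical limit: assumes no CSV line is longer than this.
 * A longer line would be returned by fgets in several pieces. */
#define MAX_LINE 1024

/* Reads the CSV one line at a time and folds each line into a running
 * total, so no per-line storage is ever kept. For illustration it
 * assumes the first field of every line is a number. */
double sum_first_column(FILE *fp)
{
    char line[MAX_LINE];
    double total = 0.0;

    assert(fp != NULL);
    while (fgets(line, sizeof line, fp) != NULL) {
        /* strtod stops at the first comma, so only field 1 is parsed. */
        total += strtod(line, NULL);
    }
    return total;
}
```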
I avoid counting the lines first, because that means reading the entire file twice. If the file is small the extra pass is not a big deal, but if you know the file is small, you could just allocate a big enough array up front.
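For comparison, the first pass of the count-then-allocate approach might look like this sketch (`count_lines` is a hypothetical helper, not a library function). It reads every character once just to count newlines, then rewinds for the real read, which is exactly the double read described above. It also can't work on input that cannot be rewound, such as a pipe or stdin.

```c
#include <assert.h>
#include <stdio.h>

/* First pass of count-then-allocate: counts lines by counting '\n'
 * characters, treats a final line without a trailing newline as a
 * line too, then rewinds so the caller can make the second pass. */
size_t count_lines(FILE *fp)
{
    size_t lines = 0;
    int c;
    int prev = '\n';

    assert(fp != NULL);
    while ((c = fgetc(fp)) != EOF) {
        if (c == '\n')
            lines++;
        prev = c;
    }
    if (prev != '\n')   /* last line had no terminating newline */
        lines++;
    rewind(fp);
    return lines;
}
```

The caller would then allocate exactly `count_lines(fp)` structs with `malloc` and read the file again to fill them.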
So in general, if I can’t process the file a line at a time, my approach is to use a data structure that can grow, like a linked list, and allocate a new node for each line.

Depending on what you’re doing, you might instead use a dynamic array: allocate an amount of space that ought to be enough for the normal case. If you fill it, allocate a bigger block, copy the contents of the first block into the second, free the first, and continue working with the second; if you fill that one, repeat the process. This can mean a lot of data movement, but the array ends up using less memory than a linked list because there are no per-node pointers, and it is faster to traverse because you are not chasing pointers that may be scattered all over virtual memory.
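A minimal sketch of the dynamic-array approach in C, assuming a hypothetical `name,value` line format, a hypothetical `struct record`, and an arbitrary starting capacity of 16 records. `realloc` performs the allocate/copy/free steps in one call, and can sometimes grow the block in place without copying.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_LINE 1024   /* hypothetical maximum line length */

/* Hypothetical record layout: one "name,value" pair per CSV line. */
struct record {
    char name[32];
    int value;
};

/* Growable array: `count` slots in use out of `capacity` allocated. */
struct record_array {
    struct record *items;
    size_t count;
    size_t capacity;
};

/* Appends one record, doubling the capacity when the array is full.
 * realloc(NULL, n) behaves like malloc, so an empty array needs no
 * special setup. Returns 0 on success, or -1 if memory runs out, in
 * which case the records already stored are left untouched. */
int record_array_push(struct record_array *arr, const struct record *rec)
{
    if (arr->count == arr->capacity) {
        size_t new_cap = arr->capacity ? arr->capacity * 2 : 16;
        struct record *grown = realloc(arr->items, new_cap * sizeof *grown);
        if (grown == NULL)
            return -1;
        arr->items = grown;
        arr->capacity = new_cap;
    }
    arr->items[arr->count++] = *rec;
    return 0;
}

/* Reads every "name,value" line from fp into arr, skipping lines that
 * don't match the assumed format. Returns 0, or -1 on out-of-memory. */
int read_records(FILE *fp, struct record_array *arr)
{
    char line[MAX_LINE];
    struct record rec;

    assert(fp != NULL && arr != NULL);
    while (fgets(line, sizeof line, fp) != NULL) {
        if (sscanf(line, "%31[^,],%d", rec.name, &rec.value) != 2)
            continue;
        if (record_array_push(arr, &rec) != 0)
            return -1;
    }
    return 0;
}
```

Start with `struct record_array arr = {NULL, 0, 0};` and call `free(arr.items)` when you’re done. Doubling the capacity, rather than adding a fixed amount each time, keeps the total copying proportional to the number of records, which limits the data movement mentioned above.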