Say I have some data series generated by my plotting application, and I want to store them in a .csv file and recall them at will.
Each data set has four characteristics. A name, a set of x values (xvals), a set of y values (yvals) and a parent.
I’m currently imagining a .csv file of the kind that I would generate in Excel, and I’m thinking:
name, DataSet1
xvals, 1,2,3,4,5
yvals, 1,4,9,16,25
parent, None
<linebreak>
name, DataSet2
xvals, 1,2,3,4,5
yvals, 21,23,24,25,26
parent, None
<linebreak>
and so on. It doesn’t feel very natural, and the implementation is looking kind of ugly. Does anyone have any suggestions?
In my application, each DataSeries instance already contains all the data I need. If I could actually save the instance object itself (or a collection of them), that would work just as well for the meantime (although I eventually want to be able to export the data for use in Excel).
I want to tell Python:
- Read all the lines in the file. Every time you read a “blank” line, insert a separator. Consider each clump of lines a data set.
- Read the first item in each line of a clump. This names the type of information the rest of the cells in the row contain. Use that name to put the data from all the adjacent cells, as a list, into the corresponding object attribute.
I have a way to make this happen, but it involves a lot of awkward, specific calls to characters and positions, which reminds me of the “GOTO” statement. I want something more organic and Pythonic.
Current approach:
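    import csv

    class DataSet(object):
        def __init__(self, name, xvals, yvals, parent=None):
            self.name = name
            self.xvals = xvals
            self.yvals = yvals
            self.parent = parent

    container = []
    record = {}  # renamed from `dict` so the built-in is not shadowed
    with open('csv_data.csv', 'r', newline='') as f:
        for row in csv.reader(f, delimiter=','):
            if not row or row[0] == '':
                # A blank row closes the current data set.
                container.append(record)
                record = {}
            else:
                # The first cell names the attribute; drop empty padding cells.
                record[row[0]] = [cell for cell in row[1:] if cell]
    if record:
        # Keep the last data set when the file has no trailing blank row.
        container.append(record)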
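Make data set and parent into columns, so your data looks like this:

    name,x,y,parent
    DataSet1,1,1,None
    DataSet1,2,4,None
    DataSet1,3,9,None
    DataSet1,4,16,None
    DataSet1,5,25,None
    DataSet2,1,21,None
    DataSet2,2,23,None
    DataSet2,3,24,None
    DataSet2,4,25,None
    DataSet2,5,26,None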
In general, to put your data into a tabular format (like CSV), you need to restructure it as rows. If you have information that is associated not with a row but with some group of rows (e.g., the “data set name”), you should recast it as a column whose value is duplicated across the relevant rows. When you read in the data, you can easily filter on this column to get the relevant groups back.
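As a rough illustration of the one-row-per-point idea, here is a minimal sketch using only the standard library `csv` module. The column names (`name`, `x`, `y`, `parent`), the helper names `write_long` and `read_long`, and the use of plain dicts in place of your own class are all assumptions made for the example. It also assumes integer values and writes the string `None` to mean "no parent".

```python
import csv
import io
from collections import defaultdict

# Hypothetical in-memory data mirroring the two data sets in the question.
datasets = {
    "DataSet1": {"xvals": [1, 2, 3, 4, 5], "yvals": [1, 4, 9, 16, 25], "parent": None},
    "DataSet2": {"xvals": [1, 2, 3, 4, 5], "yvals": [21, 23, 24, 25, 26], "parent": None},
}


def write_long(datasets, fileobj):
    """Write one row per (x, y) point, repeating name and parent on every row."""
    writer = csv.writer(fileobj)
    writer.writerow(["name", "x", "y", "parent"])
    for name, ds in datasets.items():
        parent = "None" if ds["parent"] is None else ds["parent"]
        for x, y in zip(ds["xvals"], ds["yvals"]):
            writer.writerow([name, x, y, parent])


def read_long(fileobj):
    """Group rows on the name column to rebuild each data set."""
    grouped = defaultdict(lambda: {"xvals": [], "yvals": [], "parent": None})
    for row in csv.DictReader(fileobj):
        ds = grouped[row["name"]]
        # Assumption: values are integers; use float() for real-valued data.
        ds["xvals"].append(int(row["x"]))
        ds["yvals"].append(int(row["y"]))
        # The string "None" (or an empty cell) stands for "no parent".
        ds["parent"] = None if row["parent"] in ("", "None") else row["parent"]
    return dict(grouped)


# Round trip through an in-memory buffer instead of a real file.
buffer = io.StringIO()
write_long(datasets, buffer)
buffer.seek(0)
restored = read_long(buffer)
```

With a real file you would pass `open('csv_data.csv', 'w', newline='')` and `open('csv_data.csv', newline='')` instead of the buffer. A file in this shape also opens directly in Excel as an ordinary table.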
Incidentally, you might want to look at pandas, a library that provides useful tools for dealing with tabular data (including reading and writing CSVs and grouping on column values in the way I described).
Edit: Based on your comments, it appears you’re not asking how to use CSV to store your data. You’re asking how to parse your ad hoc format. The answer to that is “write a parser yourself”; you could have a look at pyparsing. CSV libraries won’t parse it for you, because your format isn’t really CSV. Spreadsheets won’t work well with your format because it’s not tabular. If you want to use premade tools to handle your data, you need to change your data to use a preexisting format. This will lead to easier processing in the long run, and changing your data into the right format isn’t that difficult.
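If you do decide to keep the ad hoc layout, a hand-written parser does not have to be long. The following is only a sketch under some assumptions: blocks are separated by a genuinely empty line (or a row of empty cells, as Excel tends to write), the first cell of each row is the attribute name, and the names `SAMPLE`, `is_blank`, and `parse_blocks` are made up for the example. It uses `itertools.groupby` to split the rows into clumps rather than pyparsing.

```python
import csv
import io
from itertools import groupby

# Sample text in the question's layout; an empty line separates the blocks.
SAMPLE = """\
name, DataSet1
xvals, 1,2,3,4,5
yvals, 1,4,9,16,25
parent, None

name, DataSet2
xvals, 1,2,3,4,5
yvals, 21,23,24,25,26
parent, None
"""


def is_blank(row):
    """True for an empty row or a row whose cells are all empty (',,,')."""
    return not any(cell.strip() for cell in row)


def parse_blocks(fileobj):
    """Return one dict per block, mapping each row's first cell to the remaining cells."""
    rows = csv.reader(fileobj, skipinitialspace=True)
    blocks = []
    # groupby yields alternating runs of blank and non-blank rows.
    for blank, group in groupby(rows, key=is_blank):
        if blank:
            continue
        # Drop empty padding cells; values stay as strings in this sketch.
        blocks.append({row[0]: [c for c in row[1:] if c] for row in group})
    return blocks


blocks = parse_blocks(io.StringIO(SAMPLE))
```

Each entry of `blocks` is a dict of string lists, so you would still need to convert the numbers and unwrap single values such as the name yourself. That extra bookkeeping is part of why a plain tabular layout tends to be less work.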