I want to create a Pandas DataFrame with a MultiIndex on both the rows and the columns, and to be able to write it to and read it from an ASCII text file. My data looks like:
import numpy as np
from pandas import DataFrame, MultiIndex
col_indx = MultiIndex.from_tuples([('A', 'B', 'C'), ('A', 'B', 'C2'), ('A', 'B', 'C3'),
('A', 'B2', 'C'), ('A', 'B2', 'C2'), ('A', 'B2', 'C3'),
('A', 'B3', 'C'), ('A', 'B3', 'C2'), ('A', 'B3', 'C3'),
('A2', 'B', 'C'), ('A2', 'B', 'C2'), ('A2', 'B', 'C3'),
('A2', 'B2', 'C'), ('A2', 'B2', 'C2'), ('A2', 'B2', 'C3'),
('A2', 'B3', 'C'), ('A2', 'B3', 'C2'), ('A2', 'B3', 'C3')],
names=['one','two','three'])
row_indx = MultiIndex.from_tuples([(0, 'North', 'M'),
(1, 'East', 'F'),
(2, 'West', 'M'),
(3, 'South', 'M'),
(4, 'South', 'F'),
(5, 'West', 'F'),
(6, 'North', 'M'),
(7, 'North', 'M'),
(8, 'East', 'F'),
(9, 'South', 'M')],
names=['n', 'location', 'sex'])
size=len(row_indx), len(col_indx)
data = np.random.randint(0,10, size)
df = DataFrame(data, index=row_indx, columns=col_indx)
print(df)
I’ve tried df.to_csv() and read_csv() but they don’t keep the index.
I was thinking of maybe creating a new format using extra delimiters. For example, using a row of ---------------- to mark the end of the column indexes and a | to mark the end of a row index. So it would look like this:
one | A A A A A A A A A A2 A2 A2 A2 A2 A2 A2 A2 A2
two | B B B B2 B2 B2 B3 B3 B3 B B B B2 B2 B2 B3 B3 B3
three | C C2 C3 C C2 C3 C C2 C3 C C2 C3 C C2 C3 C C2 C3
--------------------------------------------------------------------------------------
n location sex :
0 North M | 2 3 9 1 0 6 5 9 5 9 4 4 0 9 6 2 6 1
1 East F | 6 2 9 2 7 0 0 3 7 4 8 1 3 2 1 7 7 5
2 West M | 5 8 9 7 6 0 3 0 2 5 0 3 9 6 7 3 4 9
3 South M | 6 2 3 6 4 0 4 0 1 9 3 6 2 1 0 6 9 3
4 South F | 9 6 0 0 6 1 7 0 8 1 7 6 2 0 8 1 5 3
5 West F | 7 9 7 8 2 0 4 3 8 9 0 3 4 9 2 5 1 7
6 North M | 3 3 5 7 9 4 2 6 3 2 7 5 5 5 6 4 2 9
7 North M | 7 4 8 6 8 4 5 7 9 0 2 9 1 9 7 9 5 6
8 East F | 1 6 5 3 6 4 6 9 6 9 2 4 2 9 8 4 2 4
9 South M | 9 6 6 1 3 1 3 5 7 4 8 6 7 7 8 9 2 3
Does Pandas have a way to write/read DataFrames to/from ASCII files with MultiIndexes?
Not sure which version of pandas you are using, but with 0.7.3 you can export your DataFrame to a TSV file and retain the indices by writing it with a tab separator, e.g. df.to_csv('mydf.tsv', sep='\t').
The reason you need to export to TSV rather than CSV is that the column headers have , characters in them. This should solve the first part of your question.
The second part gets a bit more tricky since, as far as I can tell, you need to know beforehand what you want your DataFrame to contain. In particular, you need to know:
- which columns of the TSV file represent the row MultiIndex
- that the remaining header labels should also be converted to a MultiIndex
To illustrate this, let's read the TSV file we saved above back into a new DataFrame, using its first three columns as the row index. That reads mydf.tsv into a DataFrame that has the same row index as the original df. But its columns do not match those of df.
The reason is that pandas (as far as I can tell) has no way of parsing the header row correctly into a MultiIndex. As I mentioned above, if you know beforehand that your TSV file header represents a MultiIndex, you can fix this by rebuilding the columns as a MultiIndex yourself.
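Here is a minimal sketch of that round trip, under some assumptions of mine. The header cells are taken to be the string form of each column tuple, such as "('A', 'B', 'C')", which would explain the commas. Using ast.literal_eval to parse them is my choice, an in-memory buffer stands in for mydf.tsv, and the small DataFrame is a cut-down version of the one in the question. Your pandas version may write the header differently, so treat this as an illustration rather than the definitive fix.

```python
from ast import literal_eval
from io import StringIO

import numpy as np
import pandas as pd

# Cut-down stand-in for the question's DataFrame (same level names).
col_indx = pd.MultiIndex.from_tuples(
    [('A', 'B', 'C'), ('A', 'B', 'C2'), ('A2', 'B2', 'C')],
    names=['one', 'two', 'three'])
row_indx = pd.MultiIndex.from_tuples(
    [(0, 'North', 'M'), (1, 'East', 'F'), (2, 'West', 'M')],
    names=['n', 'location', 'sex'])
df = pd.DataFrame(np.arange(9).reshape(3, 3),
                  index=row_indx, columns=col_indx)

# Assumption: each header cell is the string form of a column tuple.
# Flatten the columns to reproduce that layout before writing.
flat = df.copy()
flat.columns = [str(t) for t in df.columns]

buf = StringIO()  # in-memory stand-in for mydf.tsv
flat.to_csv(buf, sep='\t')
buf.seek(0)

# The first three columns of the file hold the row MultiIndex.
t_df = pd.read_csv(buf, sep='\t', index_col=[0, 1, 2])
rows_match = t_df.index.equals(df.index)

# The header comes back as plain strings, so the columns differ at first.
columns_before = t_df.columns.equals(df.columns)

# Parse each label back into a tuple and rebuild the column MultiIndex.
t_df.columns = pd.MultiIndex.from_tuples(
    [literal_eval(label) for label in t_df.columns],
    names=['one', 'two', 'three'])
columns_after = t_df.columns.equals(df.columns)

print(rows_match, columns_before, columns_after)
```

Recent pandas versions write one header row per column level instead of tuple strings. There, passing a list to read_csv's header argument (for example header=[0, 1, 2]) together with index_col=[0, 1, 2] may make the manual step unnecessary.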