I have input data in a flattened file. I want to normalize this data by splitting it into tables. Can I do that neatly with pandas – that is, by reading the flattened data into a DataFrame instance and then applying some functions to obtain the resulting DataFrame instances?
Example:
Data is given to me on disk in the form of a CSV file like this:
ItemId ClientId PriceQuoted ItemDescription
1 1 10 scroll of Sneak
1 2 12 scroll of Sneak
1 3 13 scroll of Sneak
2 2 2500 scroll of Invisible
2 4 2200 scroll of Invisible
I want to create two DataFrames:
ItemId ItemDescription
1 scroll of Sneak
2 scroll of Invisible
and
ItemId ClientId PriceQuoted
1 1 10
1 2 12
1 3 13
2 2 2500
2 4 2200
If pandas only has a good solution for the simplest case (normalization results in two tables with a many-to-one relationship – just like in the above example), it might be enough for my current needs. I may need a more general solution in the future, however.
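To make the desired split concrete, here is a minimal sketch of the simple many-to-one case. It assumes the file is comma-separated (the sample above is shown with whitespace) and reads it from an in-memory string rather than from disk. The variable names `flat`, `items`, `quotes` and `restored` are mine, chosen only for illustration.

```python
import io

import pandas as pd

# Assumption: the real file is comma-separated; inlined here so the sketch is self-contained.
raw = """ItemId,ClientId,PriceQuoted,ItemDescription
1,1,10,scroll of Sneak
1,2,12,scroll of Sneak
1,3,13,scroll of Sneak
2,2,2500,scroll of Invisible
2,4,2200,scroll of Invisible
"""
flat = pd.read_csv(io.StringIO(raw))

# "One" side: keep one row per distinct (ItemId, ItemDescription) pair.
items = (
    flat[["ItemId", "ItemDescription"]]
    .drop_duplicates()
    .reset_index(drop=True)
)

# "Many" side: keep the key plus the columns that vary per quote.
quotes = flat[["ItemId", "ClientId", "PriceQuoted"]].copy()

# Sanity check: if an ItemId had two different descriptions, this would fail.
assert items["ItemId"].is_unique

# Joining the two tables back together should reproduce the flattened data.
restored = quotes.merge(items, on="ItemId", how="left", validate="many_to_one")
```

The `is_unique` check is what tells me the split is lossless: if the same `ItemId` appeared with two different descriptions, `drop_duplicates` would keep both rows and the data would not actually be in the shape I assumed.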