I have input data in a flattened file. I want to normalize this data,

Question

0

Asked: June 10, 20262026-06-10T09:34:01+00:00 2026-06-10T09:34:01+00:00

I have input data in a flattened file. I want to normalize this data,

0

I have input data in a flattened file. I want to normalize this data, by splitting it into tables. Can I do that neatly with pandas – that is, by reading the flattened data into a DataFrame instance, and then applying some functions to obtain the resulting DataFrame instances?

Example:

Data is given to me on disk in the form of a CSV file like this:

ItemId   ClientId   PriceQuoted  ItemDescription
1        1          10           scroll of Sneak
1        2          12           scroll of Sneak
1        3          13           scroll of Sneak
2        2          2500         scroll of Invisible
2        4          2200         scroll of Invisible

I want to create two DataFrames:

ItemId   ItemDescription
1        scroll of Sneak
2        scroll of Invisibile

and

ItemId   ClientId   PriceQuoted
1        1          10
1        2          12
1        3          13
2        2          2500
2        4          2200

If pandas only has a good solution for the simplest case (normalization results in 2 tables with many-to-one relationship – just like in the above example), it might be enough for my current needs. I may need a more general solution in the future, however.

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-06-10T09:34:02+00:00

In [30]: df = pandas.read_csv('foo1.csv', sep='[\s]{2,}')

In [30]: df
Out[30]:
   ItemId  ClientId  PriceQuoted      ItemDescription
0       1         1           10      scroll of Sneak
1       1         2           12      scroll of Sneak
2       1         3           13      scroll of Sneak
3       2         2         2500  scroll of Invisible
4       2         4         2200  scroll of Invisible

In [31]: df1 = df[['ItemId', 'ItemDescription']].drop_duplicates().set_index('ItemId')

In [32]: df1
Out[32]:
            ItemDescription
ItemId
1           scroll of Sneak
2       scroll of Invisible

In [33]: df2 = df[['ItemId', 'ClientId', 'PriceQuoted']]

In [34]: df2
Out[34]:
   ItemId  ClientId  PriceQuoted
0       1         1           10
1       1         2           12
2       1         3           13
3       2         2         2500
4       2         4         2200

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I have input data in a flattened file. I want to normalize this data,

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply