I’ve currently switched my focus from R to Python. I work with data.table in R a lot, and I find it sometimes quite difficult to find an equivalent for some functions in Python.
I have a pandas data frame that looks like this:
df = pd.DataFrame({‘A’:[‘abc’,’def’, ‘def’, ‘abc’, ‘def’, ‘def’,’abc’],’B’:[13123,45,1231,463,142131,4839, 4341]})
A B 0 abc 13123 1 def 45 2 def 1231 3 abc 463 4 def 142131 5 def 4839 6 abc 4341
I need to create a column that increments from 1 based on A and B, so that it indicates the increasing order of B. So I first create the sorted data frame, and the column I’m interested in creating is C as below:
A B C 1 abc 463 1 6 abc 4341 2 0 abc 13123 3 3 def 45 1 2 def 1231 2 5 def 4839 3 4 def 142131 4
In R, using the library(data.table), this can be easily done in one line and creates a column within the original data table:
df[, C := 1:.N, by=A]
I’ve looked around and I think I might be able to make use of something like this:
df.groupby(‘A’).size()
or
df[‘B’].argsort()
but not sure how to proceed from here, and how to join the new column back to the original data frame. It would be very helpful if anyone could give me any pointer.
Many thanks!
1 Answer