I’ve got this dataframe: Name Country Gender Age 1 John GB M 25 2


Asked: June 15, 2026 (2026-06-15T13:21:05+00:00)

I’ve got this dataframe:

    Name    Country Gender  Age
1   John      GB      M     25
2   Mark      US      M     35
3   Jane      0       0      0
4   Jane      US      F     30
5   Jane      US      F      0
6   Kate      GB      F     18

As you can see, the value “Jane” appears 3 times. What I want to do is deduplicate the list based on the variable “Name”, but because the rest of the columns are important to me, I want to keep the rows that have the most information in them. For example, if I were to deduplicate the above file in Excel, it would keep the first value of “Jane” and delete all the other ones. But the first value of “Jane” (row 3) has missing information in the other columns.

So in other words, I want to deduplicate the list by “Name” but add a criterion to keep the rows that have a value other than “0” in the column “Age”. The result I would get is this:

    Name    Country Gender  Age
1   John       GB     M     25
2   Mark       US     M     35
3   Jane       US     F     30
4   Kate       GB     F     18

I have tried this

file3 <- file1[!duplicated(file1$Name),]

But like Excel, it keeps the “Jane” row that has no usable information in the other columns.

How do I sort the rows by the column “Age” in Z-A (descending) order so that anything with “0” ends up at the bottom and is removed when I deduplicate the list?

Cheers

David

1 Answer

Editorial Team · Answer 1 · 2026-06-15T13:21:07+00:00

Try this trick

# TRUE for rows where Country, Gender and Age all hold usable values
ind <- with(DF,
        Country != 0 &
        Gender %in% c('F', 'M') &
        Age != 0)

DF[ind, ]
  Name Country Gender Age
1 John      GB      M  25
2 Mark      US      M  35
4 Jane      US      F  30
6 Kate      GB      F  18

This produces your desired output for the sample data.
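The filter above drops incomplete rows but does not deduplicate by Name: two complete rows with the same name would both be kept. If the goal is to keep the most complete row per name, one option is to score each row by how many fields are filled in. This is only a sketch, assuming that "0" (or 0 for Age) always marks a missing value; the names filled, ranked and best are illustrative and not part of the original code.

```r
# Rebuild the sample data from the question (row names 1..6)
DF <- data.frame(
  Name    = c("John", "Mark", "Jane", "Jane", "Jane", "Kate"),
  Country = c("GB", "US", "0", "US", "US", "GB"),
  Gender  = c("M", "M", "0", "F", "F", "F"),
  Age     = c(25, 35, 0, 30, 0, 18),
  stringsAsFactors = FALSE
)

# Count, per row, how many of the three fields hold a real value
# (assumption: "0" / 0 means "missing")
filled <- (DF$Country != "0") + (DF$Gender != "0") + (DF$Age != 0)

# Put the most complete rows first; ties keep their original order
ranked <- DF[order(filled, decreasing = TRUE), ]

# Keep the first (most complete) row for each Name
best <- ranked[!duplicated(ranked$Name), ]

# Restore the original row order
best <- best[order(as.integer(rownames(best))), ]
best
```

On the sample data this keeps rows 1, 2, 4 and 6, the same rows as the filter, but it would still return exactly one row per name if several complete rows shared a name.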

EDIT

library(doBy)
orderBy(~ -Age + Name, DF) # Sort by decreasing Age, then by Name

  Name Country Gender Age
2 Mark      US      M  35
4 Jane      US      F  30
1 John      GB      M  25
6 Kate      GB      F  18
3 Jane       0      0   0
5 Jane      US      F   0

Or simply using base functions:

DF[order(DF$Age, DF$Name, decreasing = TRUE), ]
  Name Country Gender Age
2 Mark      US      M  35
4 Jane      US      F  30
1 John      GB      M  25
6 Kate      GB      F  18
3 Jane       0      0   0
5 Jane      US      F   0

Now you can select the rows that meet your conditions by indexing. I really think the first approach is better than these last two.
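For the question as it was asked (sort so the zero-Age rows fall to the bottom, then deduplicate), the sort can be combined with the `!duplicated()` call from the question. A minimal sketch, assuming the data frame is called DF as above and that an Age of 0 marks a missing value; the names sorted and dedup are only illustrative.

```r
# Rebuild the sample data from the question (row names 1..6)
DF <- data.frame(
  Name    = c("John", "Mark", "Jane", "Jane", "Jane", "Kate"),
  Country = c("GB", "US", "0", "US", "US", "GB"),
  Gender  = c("M", "M", "0", "F", "F", "F"),
  Age     = c(25, 35, 0, 30, 0, 18),
  stringsAsFactors = FALSE
)

# Sort by Age, largest first, so rows with Age == 0 end up last
sorted <- DF[order(DF$Age, decreasing = TRUE), ]

# duplicated() flags every repeat after the first occurrence,
# so for each Name the row with the highest Age survives
dedup <- sorted[!duplicated(sorted$Name), ]

# Optional: put the surviving rows back in their original order
dedup <- dedup[order(as.integer(rownames(dedup))), ]
dedup
```

On the sample data this keeps rows 1, 2, 4 and 6. Note that it picks the row with the largest Age per name, which matches "most information" only as long as 0 is the sole marker of an incomplete row.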
