Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 9223697
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: June 18, 20262026-06-18T04:08:01+00:00 2026-06-18T04:08:01+00:00

I have a list of items that likely has some export issues. I would

  • 0

I have a list of items that likely has some export issues. I would like to get a list of the duplicate items so I can manually compare them. When I try to use pandas duplicated method, it only returns the first duplicate. Is there a a way to get all of the duplicates and not just the first one?

A small subsection of my dataset looks like this:

ID,ENROLLMENT_DATE,TRAINER_MANAGING,TRAINER_OPERATOR,FIRST_VISIT_DATE
1536D,12-Feb-12,"06DA1B3-Lebanon NH",,15-Feb-12
F15D,18-May-12,"06405B2-Lebanon NH",,25-Jul-12
8096,8-Aug-12,"0643D38-Hanover NH","0643D38-Hanover NH",25-Jun-12
A036,1-Apr-12,"06CB8CF-Hanover NH","06CB8CF-Hanover NH",9-Aug-12
8944,19-Feb-12,"06D26AD-Hanover NH",,4-Feb-12
1004E,8-Jun-12,"06388B2-Lebanon NH",,24-Dec-11
11795,3-Jul-12,"0649597-White River VT","0649597-White River VT",30-Mar-12
30D7,11-Nov-12,"06D95A3-Hanover NH","06D95A3-Hanover NH",30-Nov-11
3AE2,21-Feb-12,"06405B2-Lebanon NH",,26-Oct-12
B0FE,17-Feb-12,"06D1B9D-Hartland VT",,16-Feb-12
127A1,11-Dec-11,"064456E-Hanover NH","064456E-Hanover NH",11-Nov-12
161FF,20-Feb-12,"0643D38-Hanover NH","0643D38-Hanover NH",3-Jul-12
A036,30-Nov-11,"063B208-Randolph VT","063B208-Randolph VT",
475B,25-Sep-12,"06D26AD-Hanover NH",,5-Nov-12
151A3,7-Mar-12,"06388B2-Lebanon NH",,16-Nov-12
CA62,3-Jan-12,,,
D31B,18-Dec-11,"06405B2-Lebanon NH",,9-Jan-12
20F5,8-Jul-12,"0669C50-Randolph VT",,3-Feb-12
8096,19-Dec-11,"0649597-White River VT","0649597-White River VT",9-Apr-12
14E48,1-Aug-12,"06D3206-Hanover NH",,
177F8,20-Aug-12,"063B208-Randolph VT","063B208-Randolph VT",5-May-12
553E,11-Oct-12,"06D95A3-Hanover NH","06D95A3-Hanover NH",8-Mar-12
12D5F,18-Jul-12,"0649597-White River VT","0649597-White River VT",2-Nov-12
C6DC,13-Apr-12,"06388B2-Lebanon NH",,
11795,27-Feb-12,"0643D38-Hanover NH","0643D38-Hanover NH",19-Jun-12
17B43,11-Aug-12,,,22-Oct-12
A036,11-Aug-12,"06D3206-Hanover NH",,19-Jun-12

My code looks like this currently:

df_bigdata_duplicates = df_bigdata[df_bigdata.duplicated(cols='ID')]

There area a couple duplicate items. But, when I use the above code, I only get the first item. In the API reference, I see how I can get the last item, but I would like to have all of them so I can visually inspect them to see why I am getting the discrepancy. So, in this example I would like to get all three A036 entries and both 11795 entries and any other duplicated entries, instead of the just first one. Any help is most appreciated.

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-06-18T04:08:02+00:00Added an answer on June 18, 2026 at 4:08 am

    Method #1: print all rows where the ID is one of the IDs in duplicated:

    >>> import pandas as pd
    >>> df = pd.read_csv("dup.csv")
    >>> ids = df["ID"]
    >>> df[ids.isin(ids[ids.duplicated()])].sort_values("ID")
           ID ENROLLMENT_DATE        TRAINER_MANAGING        TRAINER_OPERATOR FIRST_VISIT_DATE
    24  11795       27-Feb-12      0643D38-Hanover NH      0643D38-Hanover NH        19-Jun-12
    6   11795        3-Jul-12  0649597-White River VT  0649597-White River VT        30-Mar-12
    18   8096       19-Dec-11  0649597-White River VT  0649597-White River VT         9-Apr-12
    2    8096        8-Aug-12      0643D38-Hanover NH      0643D38-Hanover NH        25-Jun-12
    12   A036       30-Nov-11     063B208-Randolph VT     063B208-Randolph VT              NaN
    3    A036        1-Apr-12      06CB8CF-Hanover NH      06CB8CF-Hanover NH         9-Aug-12
    26   A036       11-Aug-12      06D3206-Hanover NH                     NaN        19-Jun-12
    

    but I couldn’t think of a nice way to prevent repeating ids so many times. I prefer method #2: groupby on the ID.

    >>> pd.concat(g for _, g in df.groupby("ID") if len(g) > 1)
           ID ENROLLMENT_DATE        TRAINER_MANAGING        TRAINER_OPERATOR FIRST_VISIT_DATE
    6   11795        3-Jul-12  0649597-White River VT  0649597-White River VT        30-Mar-12
    24  11795       27-Feb-12      0643D38-Hanover NH      0643D38-Hanover NH        19-Jun-12
    2    8096        8-Aug-12      0643D38-Hanover NH      0643D38-Hanover NH        25-Jun-12
    18   8096       19-Dec-11  0649597-White River VT  0649597-White River VT         9-Apr-12
    3    A036        1-Apr-12      06CB8CF-Hanover NH      06CB8CF-Hanover NH         9-Aug-12
    12   A036       30-Nov-11     063B208-Randolph VT     063B208-Randolph VT              NaN
    26   A036       11-Aug-12      06D3206-Hanover NH                     NaN        19-Jun-12
    
    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

I have a list of items that a user can add to by clicking
I am creating an application that will have a list of items that can
I have a list of items on my web page that the user can
I have a list of items that I get from the database and display
How can I query the DB to return a list of items that have
I have a list of items that I need to validate. The list can
I have a list of items that I need to update on different intervals.
I have a list of items that are displayed using a ListView from a
I have a list of items that are created from the results of website
I have a dynamic list of items that will be used to POST information

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.