Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 4034754
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 20, 20262026-05-20T12:00:44+00:00 2026-05-20T12:00:44+00:00

My question looks like a classic one, but I cannot find the exact same

  • 0

My question looks like a classic one, but I cannot find the exact same question in stackoverflow. I hope mine is not a duplicate question.

I have a large file. The file has many rows and fixed columns. I am interested in columns A and B among all columns. The goal is that I would like to get rows, where (1) the value in Column A in the row appears in other rows as well, and (2) there is more than one row that has the same value of Column A but a different value of Column B.

Consider the following table. I am interested in rows 1,3, and 5 because “a” appears in 3 rows, and the values in Column B are different. In contrast, I am not interested in rows 2 and 4 because “b” appears twice, but its corresponding value in Column B is always “1”. Similarly, I am not interested in row 6 because “c” appears only once.

# A B C D
=========
1 a 0 x x
2 b 1 x x
3 a 2 x x
4 b 1 x x
5 a 3 x x
6 c 1 x x

To find such columns, I read all lines in the file, convert each line with an object, create list for the objects, and find interesting columns with the following algorithm. The algorithm works, but takes time for my dataset. Do you have any suggestions to make the algorithm efficient?

def getDuplicateList(oldlist):
    # find duplicate elements
    duplicate = set()
    a_to_b = {}
    for elements in oldlist:
        a = elements.getA()
        b = elements.getB()
        if a in a_to_b:
            if b != a_to_b[a]:
                duplicate.add(a)
        a_to_b[a] = b 

    # get duplicate list
    newlist = []
    for elements in oldlist:
        a = elements.getA()
        if a in duplicate:
            newlist.append(a)

    return newlist

p.s. I add some constraints to clarify.

  1. I am using Python 2.7
  2. I need “all interesting rows”: duplicate has “some” interesting “a”s.
  3. Order is important
  4. In fact, the data is memory accesses of a program execution. Column A has memory accesses, and Column B has some conditions that I am interested in. If a memory access has several conditions in runtime, then I would like to investigate the sequence of the memory access.
  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-20T12:00:44+00:00Added an answer on May 20, 2026 at 12:00 pm

    Well, the two iterations over the elements in oldlist can be replaced by a single iteration. I believe this would, in most cases, improve the efficiency of your algorithm, particularly for long lists.

    If the order of newlist is of no concern to you, I propose a single-loop replacement that has the same result as your algorithm. I have tested it against randomly generated million-element lists and it always runs in approximately half the time:

    def new_getDuplicateList(oldlist):
        # find duplicate elements
        newlist = []
        duplicate = set()
        a_to_b = {}
        for elements in oldlist:
            a = elements[0]
            b = elements[1]
            if a in duplicate:
                newlist.append(a)
            else:
                if a in a_to_b.keys():
                    if not b in a_to_b[a]:
                        a_to_b[a].append(b)
                        duplicate.add(a)
                        extension = [a for i in a_to_b[a]]
                        newlist.extend(extension)
                    else:
                        a_to_b[a].append(b)
                else:
                    a_to_b[a] = [b]
    
        return newlist
    

    (Probably the conditionals could be made prettier.) It would be very easy to modify it to output the entire rows instead of just the a values, just replacing a by (a, b) wherever necessary. Also note that it consumes a bit more of memory than the first algorithm, due to the a_to_b dicts, which now hold lists.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

My question looks like a dozen ones about releasing properties, but I can't find
Edit: This question looks like it might be the same problem, but has no
It looks like this question has been asked dozens of times, but none of
Looks like this question has been asked thousand times already, but each person's configuration
Looks like a dumb question, but I tried the following (where Me is a
Apologies, It looks like my original question was not able to correctly explain what
I know this question looks like a dupe: I checked and it's not In
Sorry for constantly re-editing my question but looks like this is the only way
My question might looks like silly, but i struck with it. I have a
Okay guess this question looks a lot like: What is the best way to

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.