The Archive Base

Editorial Team
Asked: May 15, 2026

What would anyone consider the most efficient way to merge two datasets using Python?

A little background – this code will take 100K+ records in the following format:

{user: aUser, transaction: UsersTransactionNumber}, ...

and using the following data

{transaction: aTransactionNumber, activationNumber: associatedActivationNumber}, ...

to create

{user: aUser, activationNumber: associatedActivationNumber}, ...

N.B. These are not Python dictionaries; this notation is just the cleanest way I could find to show the record format.

So in theory, all I am trying to do is create a view of two lists (or tables) joining on a common key – at first this points me towards sets (unions etc), but before I start learning these in depth, are they the way to go? So far I felt this could be implemented as:

  1. Create a list of dictionaries and iterate over the list comparing the key each time, however, worst case scenario this could run up to len(inputDict)*len(outputDict) <- Not sure?

  2. Manipulate the data as an in-memory SQLite table? Preferably not: although there is no strict requirement for Python 2.4, staying compatible with it would make life easier.

  3. Some kind of Set based magic?
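
For comparison, option 2 could be sketched roughly as below with the standard-library sqlite3 module (which arrived in Python 2.5, so it is not available on 2.4). The table names, column names, and sample rows are made up for illustration and only follow the record format shown above.

```python
import sqlite3

# Hypothetical sample rows in the two record formats described above.
users = [("alice", "T1"), ("bob", "T2"), ("carol", "T3")]   # (user, transaction)
activations = [("T1", "A-100"), ("T2", "A-200")]            # (transaction, activation)

conn = sqlite3.connect(":memory:")  # in-memory database, nothing is written to disk
conn.execute("CREATE TABLE users (user_name TEXT, txn TEXT)")
conn.execute("CREATE TABLE activations (txn TEXT PRIMARY KEY, activation TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", users)
conn.executemany("INSERT INTO activations VALUES (?, ?)", activations)

# Inner join on the shared transaction number; users with no match are dropped.
joined = conn.execute(
    "SELECT u.user_name, a.activation "
    "FROM users u JOIN activations a ON u.txn = a.txn "
    "ORDER BY u.user_name"
).fetchall()
conn.close()
print(joined)
```

In this sample, carol has no activation record, so she does not appear in the result.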

Clarification

The whole purpose of this script is to summarise; the actual data sets come from two different sources. The user and transaction numbers arrive as a CSV produced by a performance test that measures email activation code throughput. The second dataset comes from parsing the test mailboxes, which contain the transaction ID and activation code. The output of this script is then a CSV that gets fed into stage 2 of the performance test, which activates user accounts using the activation codes that were paired up.

Apologies if my notation for the records was misleading, I have updated them accordingly.

Thanks for the replies, I am going to give two ideas a try:

  • Sorting the lists first (I don’t know how expensive this is)
  • Creating a dictionary with the transactionCodes as the key, then storing the user and activation code in a list as the value
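
The first idea (sort, then walk both lists in step) might look something like this sketch. The sample tuples and the function name sort_merge_join are hypothetical, and it assumes each transaction number appears at most once in each list. Python's built-in sort is O(n log n), so sorting dominates the cost; the merge pass itself is linear.

```python
# Hypothetical sample data; real input would come from the two CSV sources.
users = [("bob", "T2"), ("alice", "T1"), ("carol", "T3")]   # (user, transaction)
activations = [("T3", "A-300"), ("T1", "A-100")]            # (transaction, activation)

def sort_merge_join(users, activations):
    """Join on transaction by sorting both lists and walking them in step.

    Assumes each transaction number appears at most once in each list.
    """
    left = sorted(users, key=lambda rec: rec[1])
    right = sorted(activations, key=lambda rec: rec[0])
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        user, txn = left[i]
        act_txn, activation = right[j]
        if txn == act_txn:
            result.append((user, activation))
            i += 1
            j += 1
        elif txn < act_txn:
            i += 1   # this user has no matching activation
        else:
            j += 1   # this activation has no matching user
    return result

pairs = sort_merge_join(users, activations)
print(pairs)
```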

Performance isn’t paramount for me; I just want to get into good habits with my Python programming.

1 Answer

  1. Editorial Team
    Added an answer on May 15, 2026 at 10:04 pm

    Here’s a radical approach.

    Don’t.

    You have two CSV files; one (users) is clearly the driver. Leave this alone.
    The other, which maps transaction numbers to activation codes, can be turned into a simple dictionary.

    Don’t “combine” or “join” anything except when absolutely necessary. Certainly don’t “merge” or “pre-join”.

    Write your application to simply do lookups in the other collection.

    "Create a list of dictionaries and iterate over the list comparing the key each time,"

    Close. It looks like this. Note: No Sort.

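    import csv

    # Column names follow the record format in the question; adjust them to
    # the real CSV headers. Written for Python 3 (on Python 2, open the files
    # with 'rb' / 'wb' instead of newline='').

    # Build the lookup table: transaction number -> activation number.
    with open('activations.csv', newline='') as act_data:
        rdr = csv.DictReader(act_data)
        activations = dict((row['transaction'], row['activationNumber']) for row in rdr)

    # Drive from the users file and look up each user's activation number.
    with open('users.csv', newline='') as user_data:
        rdr = csv.DictReader(user_data)
        with open('users_2.csv', 'w', newline='') as updated_data:
            wtr = csv.DictWriter(updated_data, ['user', 'activationNumber'])
            wtr.writeheader()
            for row in rdr:
                # Raises KeyError if a transaction has no activation record.
                wtr.writerow({'user': row['user'],
                              'activationNumber': activations[row['transaction']]})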
    

    This is fast and simple. If you need the dictionaries again in a later run, save them (use shelve or pickle).
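
Saving the lookup dictionary with pickle could look roughly like this. The file name and sample contents are assumptions for illustration; a temporary directory is used only to keep the sketch self-contained.

```python
import os
import pickle
import tempfile

# Hypothetical lookup table: transaction number -> activation number.
activations = {"T1": "A-100", "T2": "A-200"}

# The file name is an assumption; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), "activations.pickle")

# Save the dictionary once...
with open(path, "wb") as fh:
    pickle.dump(activations, fh)

# ...and reload it in a later run instead of re-parsing the source data.
with open(path, "rb") as fh:
    restored = pickle.load(fh)

print(restored == activations)
```

shelve is the alternative when you would rather have a dictionary-like object backed by a file than load the whole table into memory at once.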

    "however, worst case scenario this could run up to len(inputDict)*len(outputDict) <- Not sure?"

    False.

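    One list is the “driving” list; the other is the lookup list. You drive by iterating through the users and look up the activation value for each user's transaction. That is O(n) in the number of users, because each dictionary lookup is O(1) on average (dictionaries are hash tables). Building the dictionary takes one more pass over the m activation records, so the whole job is O(n + m), not O(n * m).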
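
To make the difference concrete, here is a small sketch that counts the work done by a nested-loop join versus a dictionary join. The data is generated purely for illustration, and the variable names are hypothetical.

```python
# Hypothetical generated data: 1000 users, each with exactly one activation.
users = [("user%d" % i, "T%d" % i) for i in range(1000)]        # (user, transaction)
activations = [("T%d" % i, "A-%d" % i) for i in range(1000)]    # (transaction, activation)

# Nested-loop join: scan the activations for every user until a match is found.
comparisons = 0
nested = []
for user, txn in users:
    for act_txn, activation in activations:
        comparisons += 1
        if txn == act_txn:
            nested.append((user, activation))
            break

# Dictionary join: one pass to build the lookup, one pass over the users.
lookup = dict(activations)
lookups = 0
joined = []
for user, txn in users:
    lookups += 1
    activation = lookup.get(txn)   # None when the transaction has no activation
    if activation is not None:
        joined.append((user, activation))

print(comparisons, lookups, nested == joined)   # prints: 500500 1000 True
```

Using lookup.get(txn) rather than lookup[txn] is one way to skip users whose activation email never arrived instead of stopping with a KeyError; which behaviour is preferable depends on the test.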

© 2021 The Archive Base. All Rights Reserved