The Archive Base

Editorial Team
Asked: June 6, 2026

I am looking to compare multiple CSV files with Python, and output a report.

I am looking to compare multiple CSV files with Python, and output a report. The number of CSV files to compare will vary, so I am having it pull a list from a directory. Each CSV has 2 columns: the first being an area code and exchange, the second being a price.
e.g.

1201007,0.006
1201032,0.0119
1201040,0.0106
1201200,0.0052
1201201,0.0345

The files will not all contain the same area codes and exchanges, so rather than a line-by-line comparison, I need to use the first field as the key. I then need to generate a report that says, for example: file1 had 200 mismatches to file2, 371 lower prices than file2, and 562 higher prices than file2. I need to compare each file against every other, so this step would be repeated against file3, file4, and so on, and then file2 against file3, etc. I would consider myself a relative newcomer to Python. Below is the code I have so far, which just grabs the files in the directory and prints the prices from all files along with a total tally.

import csv
import os

count = 0
# directory containing the CSV files
csvdir = "tariff_compare"
dirList = os.listdir(csvdir)
# index all files for later use
for idx, fname in enumerate(dirList):
    print(fname)
    # os.listdir returns bare names, so join each one with the directory
    with open(os.path.join(csvdir, fname), newline="") as f:
        for row in csv.reader(f):
            key = row[0]
            price = row[1]
            print(price)
            count += 1
print(count)
1 Answer

  1. Editorial Team
    Added an answer on June 6, 2026 at 8:57 am

    This assumes that all your data can fit in memory; if not, you will have to try loading only some sets of files at a time, or even just two files at a time.

    It does the comparison and writes the output to a summary.csv file, one row per pair of files.

    import csv
    import glob
    import os
    import itertools

    def get_data(fname):
        """
        Load a .csv file
        Returns a dict of {'exchange':float(price)}
        """
        with open(fname, newline='') as inf:
            # csv.reader already splits each line into a list of fields
            return {row[0]: float(row[1]) for row in csv.reader(inf) if row}

    def do_compare(a_name, a_data, b_name, b_data):
        """
        Compare two data files of {'key': float(value)}

        Returns a list of
          - the name of the first file
          - the name of the second file
          - the number of keys in A which are not in B
          - the number of keys in B which are not in A
          - the number of values in A less than the corresponding value in B
          - the number of values in A equal to the corresponding value in B
          - the number of values in A greater than the corresponding value in B
        """
        a_keys = set(a_data)
        b_keys = set(b_data)

        unique_to_a = len(a_keys - b_keys)
        unique_to_b = len(b_keys - a_keys)

        # compare prices only for exchanges present in both files
        lt, eq, gt = 0, 0, 0
        pairs = ((a_data[key], b_data[key]) for key in a_keys & b_keys)
        for ai, bi in pairs:
            if ai < bi:
                lt += 1
            elif ai == bi:
                eq += 1
            else:
                gt += 1

        return [a_name, b_name, unique_to_a, unique_to_b, lt, eq, gt]

    def main():
        os.chdir('d:/tariff_compare')

        # load data from csv files, skipping the report left by an earlier run
        data = {}
        for fname in glob.glob("*.csv"):
            if fname != 'summary.csv':
                data[fname] = get_data(fname)

        # do comparison, one output row per pair of files
        files = sorted(data)
        with open('summary.csv', 'w', newline='') as outf:
            outcsv = csv.writer(outf)
            outcsv.writerow(["File A", "File B", "Unique to A", "Unique to B", "A<B", "A==B", "A>B"])
            for a, b in itertools.combinations(files, 2):
                outcsv.writerow(do_compare(a, data[a], b, data[b]))

    if __name__ == "__main__":
        main()
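
    If the data does not all fit in memory, as cautioned at the top of this answer, one option is to hold only two files at a time. The following is a minimal sketch of that idea for Python 3, not a tested replacement for the script above. The names `load_prices`, `compare_prices`, and `compare_pairwise` are made up for illustration, and it assumes every non-empty row has exactly an exchange and a numeric price.

```python
import csv


def load_prices(path):
    """Read one CSV of exchange,price rows into {exchange: price}."""
    with open(path, newline="") as inf:
        return {row[0]: float(row[1]) for row in csv.reader(inf) if row}


def compare_prices(a, b):
    """Return (unique_to_a, unique_to_b, lower, equal, higher) for two dicts."""
    lower = equal = higher = 0
    # Only exchanges present in both files can be compared on price.
    for key in a.keys() & b.keys():
        if a[key] < b[key]:
            lower += 1
        elif a[key] == b[key]:
            equal += 1
        else:
            higher += 1
    return (len(a.keys() - b.keys()), len(b.keys() - a.keys()),
            lower, equal, higher)


def compare_pairwise(paths):
    """Yield one summary tuple per pair of files, two files in memory at most."""
    paths = sorted(paths)
    for i, a_path in enumerate(paths):
        a = load_prices(a_path)  # loaded once per outer file
        for b_path in paths[i + 1:]:
            b = load_prices(b_path)  # replaced on each inner step
            yield (a_path, b_path) + compare_prices(a, b)
```

    The trade-off is extra disk reads: each file is loaded once as "A" and again for every earlier file it is compared against, in exchange for never holding more than two dicts at once.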
    

    Edit: user1277476 makes a good point; if you pre-sort your files by exchange (or if they are already in sorted order), you could iterate simultaneously through all your files, keeping nothing but the current line for each in memory.

    This would let you do a more in-depth comparison for each exchange entry – number of files containing a value, or top or bottom N values, etc.
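
    The pre-sorted approach could be sketched roughly as below, again for Python 3. This is only an illustration under stated assumptions: every file is already sorted by exchange as a string (which matches numeric order when all codes have the same number of digits), each exchange appears at most once per file, and the helper names `read_rows` and `per_exchange_stats` are hypothetical. It uses `heapq.merge` to walk all files in step and `itertools.groupby` to collect the rows for each exchange.

```python
import csv
import heapq
import itertools


def read_rows(path):
    """Lazily yield (exchange, price, path) from one CSV sorted by exchange."""
    with open(path, newline="") as inf:
        for row in csv.reader(inf):
            if row:
                yield row[0], float(row[1]), path


def per_exchange_stats(paths):
    """Yield (exchange, file_count, lowest, highest) across pre-sorted files."""
    # heapq.merge pulls one row at a time from each file, so only the
    # current line of every file is held in memory.
    merged = heapq.merge(*(read_rows(p) for p in paths), key=lambda r: r[0])
    for exchange, group in itertools.groupby(merged, key=lambda r: r[0]):
        prices = [price for _, price, _ in group]
        yield exchange, len(prices), min(prices), max(prices)
```

    Each yielded tuple covers one exchange, so counts such as "number of files containing a value" fall out directly; top or bottom N values would need the `prices` list sorted instead of just taking `min` and `max`.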
