I am looking to compare multiple CSV files with Python, and output a report.

Asked: June 6, 2026 (2026-06-06T08:57:37+00:00)

I am looking to compare multiple CSV files with Python, and output a report. The number of CSV files to compare will vary, so I am having it pull a list from a directory. Each CSV has 2 columns: the first being an area code and exchange, the second being a price.
e.g.

1201007,0.006
1201032,0.0119
1201040,0.0106
1201200,0.0052
1201201,0.0345

The files will not all contain the same area codes and exchanges, so rather than a line-by-line comparison, I need to use the first field as the key. I then need to generate a report that says, for example: file1 had 200 mismatches to file2, 371 lower prices than file2, and 562 higher prices than file2. I need to compare each file against every other file, so this step would be repeated for file1 against file3, file4, and so on, and then for file2 against file3, etc. I would consider myself a relative noob to Python. Below is the code I have so far, which just grabs the files in the directory and prints the prices from all files with a total tally.

import csv
import os

count = 0
# directory containing the CSV files
csvdir = "tariff_compare"
dirList = os.listdir(csvdir)
# index all files for later use
for idx, fname in enumerate(dirList):
    print(fname)
    # os.listdir returns bare names, so join them with the directory
    with open(os.path.join(csvdir, fname), newline='') as f:
        for row in csv.reader(f):
            key = row[0]
            price = row[1]
            print(price)
            count += 1
print(count)

1 Answer

Editorial Team · Answer 1 · 2026-06-06T08:57:40+00:00

The script below assumes that all your data fits in memory; if not, you will have to load only a subset of the files at a time, or even just two files at a time.

It compares every pair of files and writes the output to summary.csv, one row per pair.

import csv
import glob
import os
import itertools

def get_data(fname):
    """
    Load a .csv file
    Returns a dict of {'exchange':float(price)}
    """
    # csv.reader already splits each line into a list of fields
    with open(fname, newline='') as inf:
        return {row[0]: float(row[1]) for row in csv.reader(inf) if row}

def do_compare(a_name, a_data, b_name, b_data):
    """
    Compare two data files of {'key': float(value)}

    Returns a list of
      - the name of the first file
      - the name of the second file
      - the number of keys in A which are not in B
      - the number of keys in B which are not in A
      - the number of values in A less than the corresponding value in B
      - the number of values in A equal to the corresponding value in B
      - the number of values in A greater than the corresponding value in B
    """
    a_keys = set(a_data)
    b_keys = set(b_data)

    unique_to_a = len(a_keys - b_keys)
    unique_to_b = len(b_keys - a_keys)

    lt,eq,gt = 0,0,0
    pairs = ((a_data[key], b_data[key]) for key in a_keys & b_keys)
    for ai,bi in pairs:
        if ai < bi:
            lt += 1
        elif ai == bi:
            eq += 1
        else:
            gt += 1

    return [a_name, b_name, unique_to_a, unique_to_b, lt, eq, gt]

def main():
    os.chdir('d:/tariff_compare')

    # load data from csv files
    data = {}
    for fname in glob.glob("*.csv"):
        if fname == 'summary.csv':
            continue  # skip the report left by a previous run
        data[fname] = get_data(fname)

    # do comparison
    files = sorted(data)
    with open('summary.csv', 'w', newline='') as outf:
        outcsv = csv.writer(outf)
        outcsv.writerow(["File A", "File B", "Unique to A", "Unique to B", "A<B", "A==B", "A>B"])
        for a,b in itertools.combinations(files, 2):
            outcsv.writerow(do_compare(a, data[a], b, data[b]))

if __name__=="__main__":
    main()
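
If the files are too large to hold all at once, the two-at-a-time fallback mentioned above could look roughly like this. It is only a sketch under assumptions: `load_prices` and `compare_pairwise` are hypothetical names (not part of the script above), Python 3 is assumed, and the counts mirror what `do_compare` returns. At most two dicts are in memory at any moment, at the cost of re-reading a file for each pair it takes part in.

```python
import csv

def load_prices(fname):
    """Read one two-column CSV into {exchange: price} (hypothetical helper)."""
    with open(fname, newline='') as inf:
        return {row[0]: float(row[1]) for row in csv.reader(inf) if row}

def compare_pairwise(fnames):
    """Yield one summary row per pair, holding at most two files in memory."""
    fnames = sorted(fnames)
    for i, a_name in enumerate(fnames):
        a_data = load_prices(a_name)          # loaded once per outer file
        for b_name in fnames[i + 1:]:
            b_data = load_prices(b_name)      # re-read for every pair
            shared = a_data.keys() & b_data.keys()
            lt = sum(1 for k in shared if a_data[k] < b_data[k])
            gt = sum(1 for k in shared if a_data[k] > b_data[k])
            eq = len(shared) - lt - gt
            # same column order as do_compare: names, unique to A, unique to B, <, ==, >
            yield [a_name, b_name,
                   len(a_data) - len(shared), len(b_data) - len(shared),
                   lt, eq, gt]
```

Each yielded row can be passed straight to `csv.writer.writerow`, so the output loop in `main()` would stay almost the same.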

Edit: user1277476 makes a good point; if you pre-sort your files by exchange (or if they are already in sorted order), you could iterate simultaneously through all your files, keeping nothing but the current line for each in memory.

This would let you do a more in-depth comparison for each exchange entry – number of files containing a value, or top or bottom N values, etc.
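
A rough sketch of that streaming idea, assuming Python 3.5+, that every file is already sorted by exchange as text (the sample keys are all seven digits, so text order matches numeric order), and that each exchange appears at most once per file. `read_sorted` and `per_exchange_stats` are hypothetical names. `heapq.merge` pulls one record at a time from each file, and `itertools.groupby` collects the records that share an exchange.

```python
import csv
import heapq
import itertools

def read_sorted(fname):
    """Lazily yield (exchange, price, fname) from a CSV pre-sorted by exchange."""
    with open(fname, newline='') as inf:
        for row in csv.reader(inf):
            if row:
                yield row[0], float(row[1]), fname

def per_exchange_stats(fnames):
    """Yield (exchange, file_count, cheapest_file, lowest_price) in key order."""
    streams = [read_sorted(f) for f in fnames]
    # merge the sorted streams on the exchange key only
    merged = heapq.merge(*streams, key=lambda rec: rec[0])
    for exchange, group in itertools.groupby(merged, key=lambda rec: rec[0]):
        records = list(group)                 # at most one record per file
        _, price, cheapest = min(records, key=lambda rec: rec[1])
        yield exchange, len(records), cheapest, price
```

Swapping `min` for `heapq.nsmallest(n, records, key=...)` would give the bottom N prices per exchange instead of just the lowest one.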
