Python noob… please be gentle. In my current program, I have a list of

Asked: May 26, 2026

Python noob… please be gentle. In my current program, I have a list of 3 files which may or may not reside in my current directory. If they do reside in my directory, I want to be able to assign them values to be used later in other functions. If a file does not reside in the directory, it should not be assigned values, as the file does not exist anyway. The code I have so far is below:

import os, csv

def chkifexists():
    files = ['A.csv', 'B.csv', 'C.csv']
    for fname in files:
        if os.path.isfile(fname):
            if fname == "A.csv":
                hashcolumn = 7
                filepathNum = 5
            elif fname == "B.csv":
                hashcolumn = 15
                filepathNum = 5
            elif fname == "C.csv":
                hashcolumn = 1
                filepathNum = 0
        return fname, hashcolumn, filepathNum


def removedupes(infile, outfile, hashcolumn):
    fname, hashcolumn, filepathNum = chkifexists()
    r1 = file(infile, 'rb')
    r2 = csv.reader(r1)
    w1 = file(outfile, 'wb')
    w2 = csv.writer(w1)
    hashes = set()
    for row in r2:
        if row[hashcolumn] =="": 
            w2.writerow(row)       
            hashes.add(row[hashcolumn])  
        if row[hashcolumn] not in hashes:
            w2.writerow(row)
            hashes.add(row[hashcolumn])
    w1.close()
    r1.close()


def bakcount(origfile1, origfile2):
    '''This function creates a .bak file of the original and does a row count to determine
    the number of rows removed'''
    os.rename(origfile1, origfile1+".bak")
    count1 = len(open(origfile1+".bak").readlines())
    #print count1

    os.rename(origfile2, origfile1)
    count2 = len(open(origfile1).readlines())
    #print count2

    print str(count1 - count2) + " duplicate rows removed from " + str(origfile1) +"!"


def CleanAndPrettify():
    print "Removing duplicate rows from input files..."
    fname, hashcolumn, filepathNum = chkifexists()
    removedupes(fname, os.path.splitext(fname)[0] + "2.csv", hashcolumn)
    bakcount (fname, os.path.splitext(fname)[0] + "2.csv")


CleanAndPrettify()

The problem I am running into is that the code runs through the list and stops at the first valid file it finds.

I’m not sure if I’m thinking about it in completely the wrong way, but I thought I was doing it right.

Current output of this program with A.csv, B.csv, and C.csv present in the same directory:

Removing duplicate rows from input files...
2 duplicate rows removed from A.csv!

The desired output should be:

Removing duplicate rows from input files...
2 duplicate rows removed from A.csv!
5 duplicate rows removed from B.csv!
8 duplicate rows removed from C.csv!

…and then continue on with the next portion of creating the .bak files.

The output of this program without any CSV files in the same directory:

UnboundLocalError: local variable 'hashcolumn' referenced before assignment

Editorial Team · Answer 1 · 2026-05-26T09:40:59+00:00

You have a couple of problems in your code.

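First, the return in chkifexists sits inside the for loop, so the function returns during the first pass and never checks the remaining names; that is why only A.csv gets processed. Also, if that first file does not exist (for example, when no files are present at all), hashcolumn and filepathNum are never set, giving you the UnboundLocalError.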
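To see what the position of the return does, here is a small Python 3 sketch. The names first_pass_only and all_existing are hypothetical, and value = 1 stands in for the hashcolumn/filepathNum assignments; the temporary directory is only there so the demo runs anywhere.

```python
import os
import tempfile

def first_pass_only(names):
    # Same shape as the question's chkifexists: the return is inside the
    # for loop, so the function always returns during the first iteration.
    for name in names:
        if os.path.isfile(name):
            value = 1
        return name, value  # UnboundLocalError if the first file is missing

def all_existing(names):
    # Fixed shape: finish the loop first, then return everything found.
    found = []
    for name in names:
        if os.path.isfile(name):
            found.append(name)
    return found

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        names = [os.path.join(d, n) for n in ("A.csv", "B.csv", "C.csv")]
        for path in names[1:]:  # create only B.csv and C.csv
            open(path, "w").close()
        print([os.path.basename(p) for p in all_existing(names)])  # ['B.csv', 'C.csv']
        try:
            first_pass_only(names)
        except UnboundLocalError:
            print("first_pass_only raised UnboundLocalError")
```

With all three files present, first_pass_only happens to "work" but only ever reports A.csv, which matches the output in the question.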

Second, you are calling chkifexists in two places: from removedupes and from CleanAndPrettify. removedupes ignores the hashcolumn it is handed and looks the values up again, so once chkifexists handles every file, removedupes would run for every existing file, once for each existing file: not what you want! In fact, since CleanAndPrettify has just verified that the file exists, removedupes should simply go with whatever is handed to it.
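Here is a minimal Python 3 sketch of a removedupes that works only from the arguments it is given and never calls chkifexists. It keeps the question's three-parameter signature and the same rule (rows with an empty hash column are always kept). It uses with blocks and open(..., newline="") because Python 3 has no file() builtin and the csv module documentation asks for files opened with newline=""; the demo at the bottom is only there to make it runnable.

```python
import csv
import os
import tempfile

def removedupes(infile, outfile, hashcolumn):
    """Copy infile to outfile, skipping rows whose hash column was already seen.

    Works only from its arguments: it does not call chkifexists itself.
    Rows with an empty hash column are always kept, as in the original code.
    """
    seen = set()
    with open(infile, newline="") as src, open(outfile, "w", newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            key = row[hashcolumn]
            if key == "":
                writer.writerow(row)  # empty hash: always keep
            elif key not in seen:
                writer.writerow(row)  # first occurrence of this hash
                seen.add(key)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "A.csv")
        dst = os.path.join(d, "A2.csv")
        with open(src, "w", newline="") as f:
            csv.writer(f).writerows([["x", "h1"], ["y", "h1"], ["z", ""]])
        removedupes(src, dst, 1)
        with open(dst, newline="") as f:
            print(list(csv.reader(f)))  # [['x', 'h1'], ['z', '']]
```

Using elif instead of two separate if statements makes it explicit that an empty-hash row is written exactly once.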

There are at least three ways to handle the case where no files are found: have chkifexists raise an exception; have a flag in CleanAndPrettify that tracks if files were found; or turn the results of chkifexists into a list which you can then check for emptiness.

In the modified code I moved the files into a dictionary with the name as the key and the value as a tuple of hashcolumn and filepathNum. chkifexists now accepts the filenames to look for as a dictionary, and yields the values when a file is found; if no files were found, a NoFilesFound exception will be raised.
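One caveat with the dictionary: in Python 2 a dict iterates in an arbitrary order, so B.csv may be processed before A.csv (Python 3.7+ keeps insertion order). Iterating over sorted(somefiles.items()) gives A, B, C on every version. The Python 3 sketch below does that; the exists parameter is a hypothetical addition so the generator can be tried without real files on disk. Note also that NoFilesFound is only raised once the generator has been exhausted, which is why the whole for loop has to sit inside the try.

```python
import os

# format is 'filename': (hashcolumn, filepathNum); listed out of order on
# purpose to show that sorted() fixes the processing order
files = {
    'C.csv': (1, 0),
    'A.csv': (7, 5),
    'B.csv': (15, 5),
}

class NoFilesFound(Exception):
    "No .csv files were found to clean up"

def chkifexists(somefiles, exists=os.path.isfile):
    # exists is a hypothetical hook for testing; it defaults to os.path.isfile
    filesfound = False
    for fname, (hashcolumn, filepathNum) in sorted(somefiles.items()):
        if exists(fname):
            filesfound = True
            yield fname, hashcolumn, filepathNum
    # reached only after the caller has consumed every yielded item
    if not filesfound:
        raise NoFilesFound

if __name__ == "__main__":
    try:
        for fname, hashcolumn, filepathNum in chkifexists(files):
            print(fname, hashcolumn, filepathNum)
    except NoFilesFound:
        print("no files to clean up")
```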

Here’s the code:

import os, csv

# store file attributes for easy modifications
# format is 'filename': (hashcolumn, filepathNum)
files = {
        'A.csv': (7, 5),
        'B.csv': (15, 5),
        'C.csv': (1, 0),
        }

class NoFilesFound(Exception):
    "No .csv files were found to clean up"

def chkifexists(somefiles):
    # load all three at once, but only yield them if filename
    # is found
    filesfound = False
    for fname, (hashcolumn, filepathNum) in somefiles.items():
        if os.path.isfile(fname):
            filesfound = True
            yield fname, hashcolumn, filepathNum
    if not filesfound:
        raise NoFilesFound

def removedupes(infile, outfile, hashcolumn, filepathNum):
    # this is now a single-run function
    r1 = file(infile, 'rb')
    r2 = csv.reader(r1)
    w1 = file(outfile, 'wb')
    w2 = csv.writer(w1)
    hashes = set()
    for row in r2:
        if row[hashcolumn] =="": 
            w2.writerow(row)       
            hashes.add(row[hashcolumn])  
        if row[hashcolumn] not in hashes:
            w2.writerow(row)
            hashes.add(row[hashcolumn])
    w1.close()
    r1.close()


def bakcount(origfile1, origfile2):
    '''This function creates a .bak file of the original and does a row count
    to determine the number of rows removed'''
    os.rename(origfile1, origfile1+".bak")
    count1 = len(open(origfile1+".bak").readlines())
    #print count1

    os.rename(origfile2, origfile1)
    count2 = len(open(origfile1).readlines())
    #print count2

    print str(count1 - count2) + " duplicate rows removed from " \
        + str(origfile1) +"!"


def CleanAndPrettify():
    print "Removing duplicate rows from input files..."
    try:
        for fname, hashcolumn, filepathNum in chkifexists(files):
            removedupes(
                   fname,
                   os.path.splitext(fname)[0] + "2.csv",
                   hashcolumn,
                   filepathNum,
                   )
            bakcount (fname, os.path.splitext(fname)[0] + "2.csv")
    except NoFilesFound:
        print "no files to clean up"

CleanAndPrettify()

Unable to test as I don’t have the A, B, and C .csv files, but hopefully this will get you pointed in the right direction. As you can see, the NoFilesFound version uses the flag method to detect that no files were found; here is the list method:

def chkifexists(somefiles):
    # check every entry, but only yield those whose file is found
    for fname, (hashcolumn, filepathNum) in somefiles.items():
        if os.path.isfile(fname):
            yield fname, hashcolumn, filepathNum

def CleanAndPrettify():
    print "Removing duplicate rows from input files..."
    found_files = list(chkifexists(files))
    if not found_files:
        print "no files to clean up"
    else:
        for fname, hashcolumn, filepathNum in found_files:
            outfile = os.path.splitext(fname)[0] + "2.csv"
            removedupes(fname, outfile, hashcolumn, filepathNum)
            bakcount(fname, outfile)
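bakcount also leaves its files open after counting, and os.rename raises on Windows if the destination already exists, so a second run would fail once A.csv.bak is there. A minimal Python 3 sketch of bakcount that closes each file promptly and uses os.replace (Python 3.3+, overwrites the destination) might look like this; count_lines is a hypothetical helper, and returning the count is an addition so callers can check it.

```python
import os
import tempfile

def count_lines(path):
    # Counts physical lines, like readlines() in the original; a quoted CSV
    # field containing a newline would be counted more than once.
    with open(path) as f:
        return sum(1 for _ in f)

def bakcount(origfile1, origfile2):
    """Back up origfile1 as origfile1 + '.bak', move origfile2 into its
    place, and report how many rows were removed."""
    bakfile = origfile1 + ".bak"
    # os.replace overwrites an existing .bak left over from an earlier run
    os.replace(origfile1, bakfile)
    os.replace(origfile2, origfile1)
    removed = count_lines(bakfile) - count_lines(origfile1)
    print(str(removed) + " duplicate rows removed from " + origfile1 + "!")
    return removed

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        orig = os.path.join(d, "A.csv")
        cleaned = os.path.join(d, "A2.csv")
        with open(orig, "w") as f:
            f.write("r1\nr2\nr2\nr3\n")
        with open(cleaned, "w") as f:
            f.write("r1\nr2\nr3\n")
        bakcount(orig, cleaned)
```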
