Editorial Team
Asked: May 26, 2026

Python noob… please be gentle. In my current program, I have a list of 3 files which may or may not reside in my current directory. If they do reside in my directory, I want to be able to assign them values to be later used in other functions. If the file does not reside in the directory, it should not be assigned values as the file does not exist anyway. The code I have so far is below:

import os, csv

def chkifexists():
    files = ['A.csv', 'B.csv', 'C.csv']
    for fname in files:
        if os.path.isfile(fname):
            if fname == "A.csv":
                hashcolumn = 7
                filepathNum = 5
            elif fname == "B.csv":
                hashcolumn = 15
                filepathNum = 5
            elif fname == "C.csv":
                hashcolumn = 1
                filepathNum = 0
        return fname, hashcolumn, filepathNum


def removedupes(infile, outfile, hashcolumn):
    fname, hashcolumn, filepathNum = chkifexists()
    r1 = file(infile, 'rb')
    r2 = csv.reader(r1)
    w1 = file(outfile, 'wb')
    w2 = csv.writer(w1)
    hashes = set()
    for row in r2:
        if row[hashcolumn] =="": 
            w2.writerow(row)       
            hashes.add(row[hashcolumn])  
        if row[hashcolumn] not in hashes:
            w2.writerow(row)
            hashes.add(row[hashcolumn])
    w1.close()
    r1.close()


def bakcount(origfile1, origfile2):
    '''This function creates a .bak file of the original and does a row count to determine
    the number of rows removed'''
    os.rename(origfile1, origfile1+".bak")
    count1 = len(open(origfile1+".bak").readlines())
    #print count1

    os.rename(origfile2, origfile1)
    count2 = len(open(origfile1).readlines())
    #print count2

    print str(count1 - count2) + " duplicate rows removed from " + str(origfile1) +"!"


def CleanAndPrettify():
    print "Removing duplicate rows from input files..."
    fname, hashcolumn, filepathNum = chkifexists()
    removedupes(fname, os.path.splitext(fname)[0] + "2.csv", hashcolumn)
    bakcount (fname, os.path.splitext(fname)[0] + "2.csv")


CleanAndPrettify()

The problem I am running into is that the code runs through the list and stops at the first valid file it finds.

I’m not sure if I’m completely thinking of it in the wrong way but I thought I was doing it right.

Current output of this program with A.csv, B.csv, and C.csv present in the same directory:

Removing duplicate rows from input files...
2 duplicate rows removed from A.csv!

The desired output is:

Removing duplicate rows from input files...
2 duplicate rows removed from A.csv!
5 duplicate rows removed from B.csv!
8 duplicate rows removed from C.csv!

…and then continue on with the next portion of creating the .bak files.

The output of this program without any CSV files in the same directory:

UnboundLocalError: local variable 'hashcolumn' referenced before assignment

1 Answer

  1. Editorial Team
    Added an answer on May 26, 2026 at 9:40 am

    You have a couple of problems in your code.

    First, the return in chkifexists sits inside the for loop, so the function returns at the end of the first iteration and never checks the remaining names. If that first file does not exist, hashcolumn and filepathNum are never set, which gives you the UnboundLocalError.
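Both symptoms can be reproduced without any files on disk. The sketch below is written for Python 3 (the question's code is Python 2), and the names `first_match`, `all_matches`, and the `present` set are hypothetical stand-ins for chkifexists and os.path.isfile:

```python
def first_match(names, present):
    """Mimics the original layout: `return` is part of the loop body."""
    for name in names:
        if name in present:
            column = 7  # placeholder for the per-file settings
        # Reached on the first pass whether or not the name was found;
        # if it was not found, `column` has never been assigned.
        return name, column


def all_matches(names, present):
    """Visits every name and returns only after the loop has finished."""
    found = []
    for name in names:
        if name in present:
            found.append(name)
    return found
```

Calling `first_match` with a first name that is not in `present` raises UnboundLocalError, while `all_matches` returns every name that is present.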

    Second, you are calling chkifexists in two places: from removedupes and from CleanAndPrettify. Once chkifexists reports every existing file, that lookup would run again for every existing file each time removedupes is called, which is not what you want. In fact, since CleanAndPrettify has just verified that the file exists, removedupes should simply work with whatever is handed to it.

    There are at least three ways to handle the case where no files are found: have chkifexists raise an exception; keep a flag in CleanAndPrettify that tracks whether any files were found; or turn the results of chkifexists into a list, which you can then check for emptiness.

    In the modified code I moved the files into a dictionary with the name as the key and the value as a tuple of hashcolumn and filepathNum. chkifexists now accepts the filenames to look for as a dictionary, and yields the values when a file is found; if no files were found, a NoFilesFound exception will be raised.

    Here’s the code:

    import os, csv
    
    # store file attributes for easy modifications
    # format is 'filename': (hashcolumn, filepathNum)
    files = {
            'A.csv': (7, 5),
            'B.csv': (15, 5),
            'C.csv': (1, 0),
            }
    
    class NoFilesFound(Exception):
        "No .csv files were found to clean up"
    
    def chkifexists(somefiles):
        # load all three at once, but only yield them if filename
        # is found
        filesfound = False
        for fname, (hashcolumn, filepathNum) in somefiles.items():
            if os.path.isfile(fname):
                filesfound = True
                yield fname, hashcolumn, filepathNum
        if not filesfound:
            raise NoFilesFound
    
    def removedupes(infile, outfile, hashcolumn, filepathNum):
        # this is now a single-run function
        r1 = file(infile, 'rb')
        r2 = csv.reader(r1)
        w1 = file(outfile, 'wb')
        w2 = csv.writer(w1)
        hashes = set()
        for row in r2:
            if row[hashcolumn] =="": 
                w2.writerow(row)       
                hashes.add(row[hashcolumn])  
            if row[hashcolumn] not in hashes:
                w2.writerow(row)
                hashes.add(row[hashcolumn])
        w1.close()
        r1.close()
    
    
    def bakcount(origfile1, origfile2):
        '''This function creates a .bak file of the original and does a row count
        to determine the number of rows removed'''
        os.rename(origfile1, origfile1+".bak")
        count1 = len(open(origfile1+".bak").readlines())
        #print count1
    
        os.rename(origfile2, origfile1)
        count2 = len(open(origfile1).readlines())
        #print count2
    
        print str(count1 - count2) + " duplicate rows removed from " \
            + str(origfile1) +"!"
    
    
    def CleanAndPrettify():
        print "Removing duplicate rows from input files..."
        try:
            for fname, hashcolumn, filepathNum in chkifexists(files):
                removedupes(
                       fname,
                       os.path.splitext(fname)[0] + "2.csv",
                       hashcolumn,
                       filepathNum,
                       )
                bakcount (fname, os.path.splitext(fname)[0] + "2.csv")
        except NoFilesFound:
            print "no files to clean up"
    
    CleanAndPrettify()
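One property of the generator version is worth seeing in isolation: the body of a generator function does not run until it is iterated, so NoFilesFound surfaces inside the caller's for loop, which is why the try has to wrap the loop. A minimal Python 3 sketch, where `existing`, `run`, and the `present` set are hypothetical names standing in for chkifexists, CleanAndPrettify, and os.path.isfile:

```python
class NoFilesFound(Exception):
    """Raised when none of the expected names are present."""


def existing(names, present):
    # Nothing in this body runs until the caller starts iterating.
    found = False
    for name in names:
        if name in present:
            found = True
            yield name
    # Reached only after every name has been checked.
    if not found:
        raise NoFilesFound


def run(names, present):
    handled = []
    try:
        for name in existing(names, present):
            handled.append(name)
    except NoFilesFound:
        handled.append("no files to clean up")
    return handled
```

Creating the generator with `existing([...], set())` raises nothing by itself; the exception appears on the first attempt to fetch a value.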
    

    I could not test this, as I don’t have the A, B, and C .csv files, but hopefully it will get you pointed in the right direction. Note that the raise NoFilesFound version keeps a flag inside chkifexists to track whether any file was found; here is the list method:

    def chkifexists(somefiles):
        # load all three at once, but only yield them if filename
        # is found
        for fname, (hashcolumn, filepathNum) in somefiles.items():
            if os.path.isfile(fname):
                yield fname, hashcolumn, filepathNum
    
    def CleanAndPrettify():
        print "Removing duplicate rows from input files..."
        found_files = list(chkifexists(files))
        if not found_files:
            print "no files to clean up"
        else:
            for fname, hashcolumn, filepathNum in found_files:
                removedupes(...)
                bakcount(...)
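The code above is Python 2; the `file()` builtin and the print statement are not available in Python 3. As a rough sketch only, the deduplication step could look like this on Python 3. The name `remove_dupes` and its return value (the number of rows dropped) are assumptions made for illustration, not part of the original code:

```python
import csv


def remove_dupes(infile, outfile, hashcolumn):
    """Copy infile to outfile, keeping the first row seen for each hash value.

    Rows whose hash column is empty are always kept, as in the original.
    Returns the number of rows that were dropped (an assumed convenience).
    """
    seen = set()
    dropped = 0
    # newline="" lets the csv module handle line endings itself.
    with open(infile, newline="") as src, open(outfile, "w", newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            key = row[hashcolumn]
            if key == "" or key not in seen:
                writer.writerow(row)
                seen.add(key)
            else:
                dropped += 1
    return dropped
```

Returning the count would let the caller report the number of removed rows without re-reading both files, though the bakcount approach from the answer works just as well.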
    