The Archive Base


Editorial Team
Asked: May 28, 2026


I am filtering huge text files using the multiprocessing module. The code basically opens a text file, works on it, then closes it.

The thing is, I'd like to be able to run it successively on multiple text files. So I tried to add a loop, but for some reason it doesn't work (although the code works on each file individually). I believe this is an issue with:

    if __name__ == '__main__':    
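A common way to run the same parallel work on several files one after another is to move what the script does at module level into a function and put the loop over files inside the `if __name__ == '__main__':` guard. Start methods that launch a fresh interpreter (`spawn`, the default on Windows and macOS) re-import the main script in every worker, so code that starts processes must not run at import time. The sketch below assumes a simple per-line test in place of the real lat/long filtering; `keep_line()`, `process_file()`, `main()` and the script name in the usage comment are hypothetical.

```python
import multiprocessing
import sys

def keep_line(line):
    """Hypothetical per-line test standing in for the lat/long filtering:
    keep lines of at least 35 characters, the cut-off filteringProcess() uses."""
    return len(line) >= 35

def process_file(pool, path):
    """The work the script used to do at module level, for ONE file.
    Returns how many lines were kept."""
    with open(path) as f:
        # Lines are handed to the workers in chunks; True counts as 1 in sum()
        return sum(pool.imap(keep_line, f, chunksize=1000))

def main(paths):
    # The loop over files lives in a function that only the parent process calls
    with multiprocessing.Pool() as pool:
        return {path: process_file(pool, path) for path in paths}

if __name__ == '__main__':
    # e.g. python filter_all.py file1.txt file2.txt
    print(main(sys.argv[1:]))
```

Each file is finished before the next one starts, while the lines of a single file are spread over the pool's workers.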

So I am looking for a different approach. I tried to create Launcher and LauncherCount files like this:

    LauncherCount.py:

    def setLauncherCount(n):
        global LauncherCount
        LauncherCount = n

and,

    Launcher.py:

    import os
    import LauncherCount

    LauncherCount.setLauncherCount(0)

    os.system("OrientedFilterNoLoop.py")

    LauncherCount.setLauncherCount(1)

    os.system("OrientedFilterNoLoop.py")

    ...

In the filter script (OrientedFilterNoLoop.py), I import LauncherCount and use LauncherCount.LauncherCount as my loop index.

Of course, this doesn't work either: the value is only changed in the launcher's own copy of LauncherCount, so the LauncherCount imported by the filter script never sees it.
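Setting a module-level name through `global`, as `setLauncherCount()` does, is visible to every module that imports `LauncherCount`, but only inside one interpreter. The sketch below builds a stand-in `LauncherCount` module in memory with `types.ModuleType` (an assumption made only so the example is self-contained) and shows the value being picked up by a later `import`. The launcher fails for a different reason: each `os.system()` call starts a new Python process, which imports its own fresh `LauncherCount` in which `setLauncherCount()` was never called. `launcher_count` and `index_seen_by_filter()` are hypothetical names.

```python
import sys
import types

# Hypothetical in-memory stand-in for LauncherCount.py (same code as in the question)
SOURCE = '''
def setLauncherCount(n):
    global LauncherCount
    LauncherCount = n
'''
launcher_count = types.ModuleType("LauncherCount")
exec(SOURCE, launcher_count.__dict__)           # define setLauncherCount() inside the module
sys.modules["LauncherCount"] = launcher_count   # later `import LauncherCount` returns this object

def index_seen_by_filter():
    """What the filter code would read after its own `import LauncherCount`,
    provided it runs in the SAME process as the launcher."""
    import LauncherCount
    return LauncherCount.LauncherCount

launcher_count.setLauncherCount(1)   # the `global` statement rebinds the module-level name
print(index_seen_by_filter())        # prints 1
```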

Is there any way to globally edit a variable in an imported module? Or is there another way to do this altogether? What I need is to run the same code several times, changing one value each time, apparently without using a loop.
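One common way to run the same script several times with a different value is to pass the value on the command line instead of through a shared module. A minimal sketch, assuming the filter script is changed to read its file index from `sys.argv[1]`; `launch_filter()`, `launch_all()` and `fileIndex` are hypothetical names. Using `sys.executable` also avoids relying on the operating system's file association for `.py` files, which `os.system("OrientedFilterNoLoop.py")` does.

```python
import subprocess
import sys

def launch_filter(script, index):
    """Run `script` once in a new interpreter, passing the file index as sys.argv[1].
    Returns whatever the script printed."""
    # sys.executable is the Python running this launcher, so no .py association is needed
    completed = subprocess.run([sys.executable, script, str(index)],
                               check=True, capture_output=True, text=True)
    return completed.stdout

def launch_all(script, count):
    """Run `script` for indices 0 .. count-1, strictly one after another."""
    return [launch_filter(script, index) for index in range(count)]

# Usage: launch_all("OrientedFilterNoLoop.py", len(config.readFileList))
#
# Hypothetical change on the filter side, replacing LauncherCount.LauncherCount:
#     import sys
#     fileIndex = int(sys.argv[1])
#     readFile = open(config.readFileList[fileIndex][1], 'r')
```

Because each run is a separate process, nothing has to be shared between runs; the loop lives in the launcher, not in the multiprocessing script.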

Thanks!

Edit: Here is my main code if necessary. Sorry for the bad style …

    import multiprocessing
    import config
    import time
    import LauncherCount

    class Filter:

        """ Filtering methods """
        def __init__(self):
            print("launching methods")

        #   Return the list: [Latitude,Longitude]  (elements are floating point numbers)
        def LatLong(self, line):
            # Positions of the first three commas; Lat and Long lie between them
            comaCount = []
            comaCount.append(line.find(','))
            comaCount.append(line.find(',', comaCount[0] + 1))
            comaCount.append(line.find(',', comaCount[1] + 1))
            Lat = line[comaCount[0] + 1 : comaCount[1]]
            Long = line[comaCount[1] + 1 : comaCount[2]]

            try:
                return [float(Lat), float(Long)]
            except ValueError:
                return [0, 0]

        #   Return a boolean:
        #   - True if the Lat/Long is within the Lat/Long rectangle defined by:
        #           tupleFilter = (minLat,maxLat,minLong,maxLong)
        #   - False if not
        def LatLongFilter(self, LatLongList, tupleFilter):
            if (tupleFilter[0] <= LatLongList[0] <= tupleFilter[1] and
                    tupleFilter[2] <= LatLongList[1] <= tupleFilter[3]):
                return True
            else:
                return False

        def writeLine(self, key, line):
            # filterDico is the module-level dict built in the Main section below
            filterDico[key][1].write(line)


    def filteringProcess(dico):

        myFilter = Filter()

        while True:
            try:
                currentLine = readFile.readline()
            except ValueError:
                break
            if len(currentLine) == 0:                   # Breaks at the end of the file
                break
            if len(currentLine) < 35:                   # Skips wrong lines (too short)
                continue
            LatLongList = myFilter.LatLong(currentLine)
            for key in dico:
                if myFilter.LatLongFilter(LatLongList, dico[key][0]):
                    myFilter.writeLine(key, currentLine)


    ###########################################################################
                    # Main
    ###########################################################################

    # Open read files:
    readFile = open(config.readFileList[LauncherCount.LauncherCount][1], 'r')

    # Generate writing files:
    pathDico = {}
    filterDico = config.filterDico

    # Create outputs
    for key in filterDico:
        output_Name = (config.readFileList[LauncherCount.LauncherCount][0][:-4]
                       + '_' + key + '.log')
        pathDico[output_Name] = config.writingFolder + output_Name
        filterDico[key] = [filterDico[key], open(pathDico[output_Name], 'w')]


    p = []
    CPUCount = multiprocessing.cpu_count()
    CPURange = range(CPUCount)

    startingTime = time.localtime()

    if __name__ == '__main__':
        ### Create and start processes:
        for i in CPURange:
            p.append(multiprocessing.Process(target=filteringProcess,
                                             args=(filterDico,)))
            p[i].start()

        ### Wait for all processes to finish, then close the files:
        for process in p:
            process.join()
        readFile.close()
        for key in config.filterDico:
            config.filterDico[key][1].close()
            print(key, "is Done!")
        endTime = time.localtime()

        print("Process started at:", startingTime)
        print("And ended at:", endTime)
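A caveat about the script above: every worker calls `readline()` on the same `readFile` object inherited from the parent and writes to its own buffered copy of the output files. With the `fork` start method (long the default on Linux), those copies share one file offset but keep separate read buffers, so a line can be split between two workers, and buffered writes made in a child are not necessarily flushed before it exits. The sketch below is an alternative, assuming the lat/long values are the 2nd and 3rd comma-separated fields (as `LatLong()` assumes) and the rectangles are `(minLat, maxLat, minLong, maxLong)` tuples keyed like `config.filterDico`: only the parent reads and writes files, and a `Pool` does the parsing and rectangle tests. `lat_long()`, `matching_keys()`, `filter_file()` and `main()` are hypothetical names.

```python
import functools
import multiprocessing

def lat_long(line):
    """[lat, long] from the 2nd and 3rd comma-separated fields, or [0, 0] if they don't parse."""
    fields = line.split(',')
    try:
        return [float(fields[1]), float(fields[2])]
    except (IndexError, ValueError):
        return [0, 0]

def matching_keys(line, rectangles):
    """Worker: return the line plus the keys of every (minLat, maxLat, minLong, maxLong)
    rectangle that contains its position."""
    lat, lon = lat_long(line)
    keys = [key for key, (min_lat, max_lat, min_lon, max_lon) in rectangles.items()
            if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon]
    return line, keys

def filter_file(pool, read_path, rectangles, outputs, chunksize=1000):
    """Only the caller's process touches the files; the pool just classifies lines.
    `outputs` maps each rectangle key to an open, writable file-like object."""
    worker = functools.partial(matching_keys, rectangles=rectangles)
    with open(read_path) as src:
        long_lines = (line for line in src if len(line) >= 35)   # drop too-short lines, as before
        # imap sends lines to the workers in chunks and yields results in file order
        for line, keys in pool.imap(worker, long_lines, chunksize):
            for key in keys:
                outputs[key].write(line)

def main(read_path, rectangles, output_paths):
    """Hypothetical entry point; call it from inside the `if __name__ == '__main__':` guard."""
    outputs = {key: open(path, 'w') for key, path in output_paths.items()}
    try:
        with multiprocessing.Pool() as pool:
            filter_file(pool, read_path, rectangles, outputs)
    finally:
        for f in outputs.values():
            f.close()
```

As in the original, a line whose coordinates do not parse is treated as `[0, 0]`, so it still matches any rectangle that contains (0, 0).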
1 Answer

  1. Editorial Team
    Added an answer on May 28, 2026 at 8:25 pm

    To process groups of files in sequence while working on files within a group in parallel:

        #!/usr/bin/env python
        from multiprocessing import Pool

        def work_on(args):
            """Process a single file."""
            i, filename = args
            print("working on %s" % (filename,))
            return i

        def files():
            """Generate input filenames to work on."""
            #NOTE: you could read the file list from a file, get it using glob.glob, etc
            yield "inputfile1"
            yield "inputfile2"

        def process_files(pool, filenames):
            """Process filenames using pool of processes.

            Wait for results.
            """
            for result in pool.imap_unordered(work_on, enumerate(filenames)):
                #NOTE: in general the files won't be processed in the original order
                print(result)

        def main():
            p = Pool()

            # to do "successive" multiprocessing
            for filenames in [files(), ['other', 'bunch', 'of', 'files']]:
                process_files(p, filenames)

            # No more tasks: let the worker processes exit
            p.close()
            p.join()

        if __name__ == "__main__":
            main()

    Each process_files() call starts only after the previous one has completed, i.e., files from different process_files() calls are not processed in parallel.
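If the order of the results matters, `Pool.imap()` can replace `imap_unordered()`: the work is still spread over the workers, but results are yielded in submission order, at the cost of possibly waiting on a slow early file. A minimal sketch; `describe()` and `process_files_in_order()` are hypothetical names, and the per-file work is a placeholder.

```python
def describe(args):
    """Hypothetical per-file work: here it just returns what it was given."""
    i, filename = args
    return i, filename

def process_files_in_order(pool, filenames):
    """Like process_files(), but pool.imap() yields results in submission order."""
    return list(pool.imap(describe, enumerate(filenames)))

# Usage, from inside the `if __name__ == "__main__":` guard:
#     from multiprocessing import Pool
#     with Pool() as pool:
#         print(process_files_in_order(pool, ["inputfile1", "inputfile2"]))
#     # prints [(0, 'inputfile1'), (1, 'inputfile2')]
```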


Related Questions

I have some HUGE log files (50Mb; ~500K lines) I need to start filtering
Logcat allows filtering logs but it works like that: You define filters and logcat
I am using Powershell for some ETL work, reading compressed text files in and
Filtering QuerySets in Django work like the following: Entry.objects.filter(year=2006) How can I use filter
I am filtering the HKEYS by using Hook filtering function, I use the following
I have code similar to this filtering entries in an Array of Objects: var
I'm filtering some content on my website via country specific code, I'm trying to
Wordpress has a spam filtering plugin called Akismet that seems to be able to
I create a filtering select like so: var lensMapServiceFS = new dijit.form.FilteringSelect({ displayedValue: this.layerNames[0],
I'm using the ' Filtering Blocks ' tutorial on the CSS-Tricks website which allows

© 2021 The Archive Base. All Rights Reserved