Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 7573569
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 30, 20262026-05-30T16:12:56+00:00 2026-05-30T16:12:56+00:00

I need to run a function for the each of the elements of my

  • 0

I need to run a function for the each of the elements of my database.

When I try the following:

from multiprocessing import Pool
from pymongo import Connection

def foo():
...


connection1 = Connection('127.0.0.1', 27017)
db1 = connection1.data

my_pool = Pool(6)
my_pool.map(foo, db1.index.find())

I’m getting the following error:

Job 1, ‘python myscript.py ‘ terminated by signal SIGKILL (Forced quit)

Which is, I think, caused by db1.index.find() eating all the available ram while trying to return millions of database elements…

How should I modify my code for it to work?

Some logs are here:

dmesg | tail -500 | grep memory
[177886.768927] Out of memory: Kill process 3063 (python) score 683 or sacrifice child
[177891.001379]  [<ffffffff8110e51a>] out_of_memory+0xfa/0x250
[177891.021362] Out of memory: Kill process 3063 (python) score 684 or sacrifice child
[177891.025399]  [<ffffffff8110e51a>] out_of_memory+0xfa/0x250

The actual function below:

def create_barrel(item):
    connection = Connection('127.0.0.1', 27017)
    db = connection.data
    print db.index.count()
    barrel = []
    fls = []
    if 'name' in item.keys():
        barrel.append(WhitespaceTokenizer().tokenize(item['name']))
        name = item['name']
    elif 'name.utf-8' in item.keys():
        barrel.append(WhitespaceTokenizer().tokenize(item['name.utf-8']))
        name = item['name.utf-8']
    else:
        print item.keys()
    if 'files' in item.keys():
        for file in item['files']:
            if 'path' in file.keys():
                barrel.append(WhitespaceTokenizer().tokenize(" ".join(file['path'])))
                fls.append(("\\".join(file['path']),file['length']))
            elif 'path.utf-8'  in file.keys():
                barrel.append(WhitespaceTokenizer().tokenize(" ".join(file['path.utf-8'])))
                fls.append(("\\".join(file['path.utf-8']),file['length']))
            else:
                print file
                barrel.append(WhitespaceTokenizer().tokenize(file))
    if len(fls) < 1:
        fls.append((name,item['length']))
    barrel = sum(barrel,[])
    for s in barrel:
        vs = re.findall("\d[\d|\.]*\d", s) #versions i.e. numbes such as 4.2.7500 
    b0 = []
    for s in barrel:
        b0.append(re.split("[" + string.punctuation + "]", s))
    b1 = filter(lambda x: x not in string.punctuation, sum(b0,[]))
    flag = True
    while flag:
        bb = []
        flag = False
        for bt in b1:
            if bt[0] in string.punctuation:
                bb.append(bt[1:])
                flag = True
            elif bt[-1] in string.punctuation:
                bb.append(bt[:-1])
                flag = True
            else:
                bb.append(bt)
        b1 = bb
    b2 = b1 + barrel + vs
    b3 = list(set(b2))
    b4 = map(lambda x: x.lower(), b3)
    b_final = {}
    b_final['_id'] = item['_id']
    b_final['tags'] = b4
    b_final['name'] = name
    b_final['files'] = fls
    print db.barrels.insert(b_final)

I’ve noticed interesting thing. Then I press ctrl+c to stop process I’m getting the following:

python index2barrel.py 
Traceback (most recent call last):
  File "index2barrel.py", line 83, in <module>
    my_pool.map(create_barrel, db1.index.find, 6)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 227, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 280, in map_async
    iterable = list(iterable)
TypeError: 'instancemethod' object is not iterable

I mean, why multiprocessing is trying to convert somethin to the list? Isn’t it the source of the problem?

from the stack trace:

brk(0x231ccf000)                        = 0x231ccf000
futex(0x1abb150, FUTEX_WAKE_PRIVATE, 1) = 1
sendto(3, "+\0\0\0\260\263\355\356\0\0\0\0\325\7\0\0\0\0\0\0data.index\0\0"..., 43, 0, NULL, 0) = 43
recvfrom(3, "Some text from my database."..., 491663, 0, NULL, NULL) = 491663
... [manymany times]
brk(0x2320d5000)                        = 0x2320d5000
.... manymany times

The above sample goes and goes in strace output and for some reason strace -o logfile python myscript.py
does not halt. It just eats all the available ram and writes in log file.

UPDATE. Using imap instead of map solved my problem.

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-30T16:12:57+00:00Added an answer on May 30, 2026 at 4:12 pm

    Since the find() operation is returning the cursor the the map function and since you say that this runs without a problem when you do
    for item in db1.index.find(): create_barrel(item)
    it looks like the create_barrel function is OK.

    Can you try to limit the number of results returned in the cursor and see if this helps? I think the syntax would be:

    db1.index.find().limit(100)
    

    If you could try this and see if it helps it might help to get the cause of the problem.

    EDIT1: I think you are going about this the wrong way by using the map function – I think you should be using map_reduce in the mongo python driver – that way the map function will be executed by the mongod process.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

I need to run a javascript function each 10 seconds. I understand the syntax
Well, title self describes it.. I need to run a sql function to clean
I have a Flex client application. I need a clean up function to run
I'd like to run a preg_replace PHP function on a string but I need
i need run code that will create a database and populate tables. i am
I need run ts:reindex when smth add in model or destroy from model. How
I have a JavaScript function that I need to run. It takes some parameters.
I need to run the application that I have made in Xcode on my
I need to run two ffmpeg commands, one after the other i.e., wait until
I need to run application in every X seconds, so, as far as cron

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.