I have a database with 10,000 adam_ids. For each adam_id, I need to pull down information through an API.
My table looks like this:
`title`
- adam_id
- success (boolean)
- number_of_tries (# of times success=0 when trying to do the pull down)
Here is the function I would like to create:

    def pull_down(cursor):
        work_remains = True
        while work_remains:
            cursor.execute("""SELECT adam_id FROM title WHERE success=0
                              AND number_of_tries < 5 ORDER BY adam_id LIMIT 1""")
            row = cursor.fetchone()  # fetch once; None means nothing is left
            if row is not None:
                do_api_call(cursor, row[0])
            else:
                work_remains = False

    def do_api_call(cursor, adam_id):
        # do api call (sets `success`)
        # Placeholder style (%s here) depends on the DB driver.
        if success:
            cursor.execute("UPDATE title SET success=1 WHERE adam_id = %s",
                           (adam_id,))
        else:
            cursor.execute("UPDATE title SET number_of_tries = number_of_tries + 1 "
                           "WHERE adam_id = %s", (adam_id,))

How would I do the above with n workers using Python’s multiprocessing functionality instead of doing it with one synchronous process? I have begun looking over the `multiprocessing` module ( http://docs.python.org/library/multiprocessing.html ), but it seems pretty hard to digest for me thus far.
If the heavy part of the work is the API call, because it goes to an outside resource, then that is the only part you would really want to run in parallel. The database calls are probably really fast. So you might try this:
Get all the `adam_id` values in one query.
This is a rough pseudocode example to show the logic flow:
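A minimal sketch of that flow, under some loudly stated assumptions: it uses `sqlite3` only so the example is self-contained (hence the `?` placeholders), `do_api_call` is a placeholder for the real request, and the `api_call` and `pool_factory` parameters are hypothetical hooks added for illustration. The main process does every database call; only the API calls go to the pool.

```python
import sqlite3
from multiprocessing import Pool

MAX_TRIES = 5


def do_api_call(adam_id):
    # Placeholder: replace with the real API request. It runs in a worker
    # process, returns (adam_id, success), and never touches the database.
    success = True
    return adam_id, success


def pull_down(conn, n_workers=4, api_call=do_api_call, pool_factory=Pool):
    """Run one pass over the pending rows; return how many were attempted."""
    cursor = conn.cursor()

    # 1. Fetch all pending adam_id values in one query (main process only).
    cursor.execute(
        "SELECT adam_id FROM title "
        "WHERE success = 0 AND number_of_tries < ? ORDER BY adam_id",
        (MAX_TRIES,),
    )
    pending = [row[0] for row in cursor.fetchall()]

    # 2. Only the API calls run in parallel.
    with pool_factory(n_workers) as pool:
        # 3. Results come back to the main process, which does the updates.
        for adam_id, success in pool.imap_unordered(api_call, pending):
            if success:
                cursor.execute(
                    "UPDATE title SET success = 1 WHERE adam_id = ?",
                    (adam_id,),
                )
            else:
                cursor.execute(
                    "UPDATE title SET number_of_tries = number_of_tries + 1 "
                    "WHERE adam_id = ?",
                    (adam_id,),
                )
    conn.commit()
    return len(pending)
```

Each pass gives a failed ID one more attempt, so the function could be called in a loop such as `while pull_down(conn): pass`, which stops once every row has either succeeded or reached five tries. On platforms that spawn worker processes (Windows, macOS), that call needs to sit under `if __name__ == "__main__":`.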
It would only complicate the code if you tried to make the database calls parallel, because then you would have to give each process its own connection, when really it wouldn’t be the database slowing you down anyway.
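For comparison, here is roughly what giving each process its own connection could look like. This is only a sketch under assumptions: it uses `sqlite3` with a database file at a hypothetical `db_path`, `call_api` is a placeholder for the real request, and all function names are made up for illustration. The extra module-level state and the per-worker setup are the complication being warned about.

```python
import sqlite3
from multiprocessing import Pool

_conn = None  # one connection per worker process, set by init_worker


def init_worker(db_path):
    """Runs once in each worker process to open that process's connection."""
    global _conn
    _conn = sqlite3.connect(db_path, timeout=30)


def call_api(adam_id):
    # Placeholder for the real API request; returns True on success.
    return True


def process_one(adam_id, api_call=call_api):
    """Call the API and record the outcome using this process's connection."""
    success = api_call(adam_id)
    if success:
        _conn.execute(
            "UPDATE title SET success = 1 WHERE adam_id = ?", (adam_id,)
        )
    else:
        _conn.execute(
            "UPDATE title SET number_of_tries = number_of_tries + 1 "
            "WHERE adam_id = ?",
            (adam_id,),
        )
    _conn.commit()
    return success


def pull_down_parallel_db(db_path, adam_ids, n_workers=4):
    # Every worker opens its own connection through the initializer.
    with Pool(n_workers, initializer=init_worker, initargs=(db_path,)) as pool:
        return pool.map(process_one, adam_ids)
```

Compared with doing all the updates in the main process, every worker now holds a connection and commits on its own, and the workers can contend for database locks, with nothing gained if the API call is the slow part.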