The Archive Base


Editorial Team
Asked: June 14, 2026


I need some help with a parallel processing task that I am trying to complete as soon as possible.

It simply involves splitting a largish dataframe into smaller chunks and running the same script on each chunk.

I think this is called embarrassingly parallel.

I would be very grateful if someone could suggest a template for this task using either Amazon cloud services or PiCloud.

I have made initial forays into Amazon EC2 and PiCloud (the script I will run on each data chunk is in Python), but I realise that I may not figure out how to do it in either without some help.

So, any pointers would be greatly appreciated. I’m just looking for basic help from those in the know: the main steps involved in setting up parallel cores or CPUs using EC2, PiCloud, or anything else; running the script in parallel; and saving the script output (the script writes the result of its calculation to a CSV file).

I’m running Ubuntu 12.04. My Python 2.7 script doesn’t use any non-standard libraries, just os and csv. The script isn’t complex; the data is just too big for my machine and timeframe.


1 Answer

  1. Editorial Team
    Added an answer on June 14, 2026 at 9:58 am

    This script uses the cloud library for Python from PiCloud, and should be run locally.

    import cloud

    # chunks is a list of filenames (you'll need to define generate_chunk_files)
    chunks = generate_chunk_files('large_dataframe')
    for chunk in chunks:
        # stores each chunk in your PiCloud bucket
        cloud.bucket.put(chunk)

    def process_chunk(chunk):
        """Runs on PiCloud"""

        # copies the chunk from your bucket to the worker's local disk
        cloud.bucket.get(chunk)
        with open(chunk, 'r') as f:
            # process the data however you want, e.g. write the results to a
            # CSV file and store it with cloud.bucket.put so you can fetch it later
            pass

    # asynchronously runs process_chunk on the cloud for all chunks
    job_ids = cloud.map(process_chunk, chunks)
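
The generate_chunk_files placeholder above is left for you to define. One possible way to write it, as a rough sketch only: the version below assumes the large dataframe is stored as a CSV file with a header row, and it repeats that header in every chunk. The parameters rows_per_chunk and out_dir and the chunk file naming are assumptions made for illustration, not part of the original answer. It is written for Python 3; on Python 2.7 the csv files would be opened in 'rb'/'wb' mode instead of using newline=''.

```python
import csv
import os


def generate_chunk_files(path, rows_per_chunk=100000, out_dir=None):
    """Split a CSV file into smaller CSV files and return their paths.

    Hypothetical helper: the name matches the placeholder in the answer,
    but the parameters and file naming are assumptions.
    """
    # By default, write the chunks next to the source file.
    out_dir = out_dir or os.path.dirname(os.path.abspath(path))
    base = os.path.splitext(os.path.basename(path))[0]
    chunk_paths = []
    out = None
    writer = None
    with open(path, newline='') as src:
        reader = csv.reader(src)
        header = next(reader, None)
        if header is None:
            return chunk_paths  # empty input, nothing to split
        for i, row in enumerate(reader):
            if i % rows_per_chunk == 0:
                # Start a new chunk file and repeat the header in it.
                if out is not None:
                    out.close()
                name = '%s_chunk%04d.csv' % (base, len(chunk_paths))
                chunk_path = os.path.join(out_dir, name)
                out = open(chunk_path, 'w', newline='')
                writer = csv.writer(out)
                writer.writerow(header)
                chunk_paths.append(chunk_path)
            writer.writerow(row)
    if out is not None:
        out.close()
    return chunk_paths
```

Because the file is read one row at a time, the whole dataset never has to fit in memory, which matters when the data is too big for the local machine.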
    

    Use the Realtime Cores feature to allocate a specific number of cores.
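
If the PiCloud cloud library is not available to you, the same split-and-map pattern can be tried on a single multi-core machine (for example, one EC2 instance) with Python's standard multiprocessing module. This is a swapped-in technique, not what the answer above uses. In this minimal sketch the row count is only a stand-in for the real calculation, and run_parallel, the workers parameter, and the '.result.csv' suffix are assumptions made for illustration. It is written for Python 3; on Python 2.7, Pool cannot be used in a with statement.

```python
import csv
from multiprocessing import Pool


def process_chunk(chunk_path):
    """Read one chunk CSV, compute a result, and write it to its own CSV.

    Counting rows is a placeholder for the real per-chunk calculation.
    """
    out_path = chunk_path + '.result.csv'
    with open(chunk_path, newline='') as src:
        reader = csv.reader(src)
        next(reader, None)  # skip the header row, if there is one
        n_rows = sum(1 for _ in reader)
    with open(out_path, 'w', newline='') as dst:
        csv.writer(dst).writerow([chunk_path, n_rows])
    return out_path


def run_parallel(chunk_paths, workers=4):
    """Run process_chunk on every chunk and return the result file paths."""
    if workers <= 1:
        # Serial fallback, handy for debugging the per-chunk function.
        return [process_chunk(p) for p in chunk_paths]
    with Pool(processes=workers) as pool:
        # pool.map blocks until every chunk is done and preserves input order.
        return pool.map(process_chunk, chunk_paths)
```

Each worker writes its own output file, so no two processes ever write to the same CSV. On platforms that start worker processes by spawning a fresh interpreter, call run_parallel from inside an `if __name__ == '__main__':` block.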


© 2021 The Archive Base. All Rights Reserved