The Archive Base


Editorial Team
Asked: May 19, 2026 (2026-05-19T04:32:35+00:00)

I am working with Amazon's MapReduce web service (Elastic MapReduce) for a university project. To use my data with MapReduce, I need to dump it from a relational database (AWS RDS) into S3. After MapReduce finishes, I need to split the output file and load chunks of it into their own S3 buckets.

What is a good way to do this within the Amazon Web Services environment?

Best case: could this be accomplished without using extra EC2 instances besides the ones used for RDS and MapReduce?

I use Python for the mapper and reducer functions and JSON specifiers for the MapReduce job flow. Otherwise I am not bound to any language or technology.


1 Answer

  1. Editorial Team
    Added an answer on May 19, 2026 at 4:32 am

    According to the Amazon Elastic MapReduce Developer Guide, creating a MapReduce job flow requires you to specify the S3 locations of the input data, the output data, the mapper script and the reducer script.
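To make those four S3 locations concrete, here is a minimal sketch of one streaming step for a job flow, built as a plain Python dict that can be dumped to JSON. The bucket and file names are made up for illustration, and the `command-runner.jar` / `hadoop-streaming` invocation is an assumption about how current EMR releases run streaming jobs, so check it against the guide for your release.

```python
import json


def streaming_step(name, mapper_uri, reducer_uri, input_uri, output_uri):
    """Build one Hadoop streaming step whose four locations all live in S3."""
    # Hadoop copies the scripts to each node, so the commands use bare file names.
    mapper_file = mapper_uri.rsplit("/", 1)[-1]
    reducer_file = reducer_uri.rsplit("/", 1)[-1]
    return {
        "Name": name,
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {
            # Assumption: EMR 4.x or later, where steps go through command-runner.jar.
            "Jar": "command-runner.jar",
            "Args": [
                "hadoop-streaming",
                "-files", f"{mapper_uri},{reducer_uri}",
                "-mapper", mapper_file,
                "-reducer", reducer_file,
                "-input", input_uri,
                "-output", output_uri,
            ],
        },
    }


# Hypothetical bucket and key names, for illustration only.
step = streaming_step(
    "university-project",
    mapper_uri="s3://my-project-code/mapper.py",
    reducer_uri="s3://my-project-code/reducer.py",
    input_uri="s3://my-project-data/input/",
    output_uri="s3://my-project-data/output/",
)
print(json.dumps(step, indent=2))
```

The output location must not exist yet when the job starts, otherwise Hadoop refuses to run the step.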

    If you need to do some pre-processing (such as dumping the MapReduce input file from a database) or post-processing (such as splitting the MapReduce output file to other locations in S3), you will have to automate those tasks separately from the MapReduce job flow.

    You can use the boto library to write those pre-processing and post-processing scripts. They can run on an EC2 instance or on any other computer with access to the S3 bucket. Transfers between EC2 and S3 are typically cheaper and faster, but if you don't have an EC2 instance available for this, you can run the scripts on your own computer, unless there is too much data to transfer.
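As a rough sketch of what such scripts could look like: the two functions below take an S3 client as a parameter, which I am assuming would be `boto3.client("s3")` (boto3 is the current successor to boto). The function names, the tab-separated format and the `bucket_for_line` callback are my own choices for illustration, not something prescribed by AWS. Both functions hold the whole file in memory, so they only suit modest data sizes.

```python
import csv
import io


def dump_rows_to_s3(cursor, query, s3, bucket, key):
    """Pre-processing: run `query` and store the rows in S3 as tab-separated lines.

    `cursor` is any DB-API cursor, e.g. one from a MySQL driver connected to RDS.
    """
    cursor.execute(query)
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    row_count = 0
    while True:
        rows = cursor.fetchmany(1000)  # fetch in batches instead of all at once
        if not rows:
            break
        writer.writerows(rows)
        row_count += len(rows)
    s3.put_object(Bucket=bucket, Key=key, Body=buffer.getvalue().encode("utf-8"))
    return row_count


def split_output_to_buckets(s3, source_bucket, source_key, bucket_for_line):
    """Post-processing: split one output file and upload each chunk to its own bucket.

    `bucket_for_line` maps an output line to a target bucket name.
    The target buckets are assumed to exist already.
    """
    body = s3.get_object(Bucket=source_bucket, Key=source_key)["Body"].read()
    chunks = {}
    for line in body.decode("utf-8").splitlines():
        chunks.setdefault(bucket_for_line(line), []).append(line)
    for bucket, lines in chunks.items():
        data = ("\n".join(lines) + "\n").encode("utf-8")
        s3.put_object(Bucket=bucket, Key=source_key, Body=data)
    # Report how many lines went to each bucket.
    return {bucket: len(lines) for bucket, lines in chunks.items()}
```

With real credentials you would call these as, for example, `dump_rows_to_s3(conn.cursor(), "SELECT ...", boto3.client("s3"), "my-project-data", "input/data.tsv")`, where the bucket and key are placeholders. Note that a MapReduce job usually writes several `part-*` files, so the split would be run once per file.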

    You can take the automation as far as you want: a single script can generate the input, launch a new MapReduce job flow, wait for the job to finish and then process the output, so that with the proper configuration the whole thing is reduced to pushing a button 🙂
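A minimal sketch of that kind of orchestration, assuming the `emr` argument is a `boto3.client("emr")` and that the job flow is configured to shut itself down after its last step (otherwise it would sit in the `WAITING` state and this loop would never end). `run_pipeline`, `prepare_input` and `process_output` are hypothetical names for illustration.

```python
import time

# Cluster states after which nothing more will happen.
FINISHED_STATES = {"TERMINATED", "TERMINATED_WITH_ERRORS"}


def run_pipeline(emr, job_flow_config, prepare_input, process_output,
                 poll_seconds=30, sleep=time.sleep):
    """Generate input, run one job flow to completion, then process its output."""
    prepare_input()  # e.g. dump the database table to S3
    cluster_id = emr.run_job_flow(**job_flow_config)["JobFlowId"]

    # Poll until the job flow has shut down.
    while True:
        status = emr.describe_cluster(ClusterId=cluster_id)["Cluster"]["Status"]
        if status["State"] in FINISHED_STATES:
            break
        sleep(poll_seconds)

    if status["State"] != "TERMINATED":
        raise RuntimeError(
            f"job flow {cluster_id} ended in state {status['State']}"
        )
    process_output()  # e.g. split the output file across buckets
    return cluster_id
```

A more careful script would also check the status of each step before trusting the output, since the cluster state alone does not say which step failed.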


© 2021 The Archive Base. All Rights Reserved