Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 7584953
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 30, 20262026-05-30T19:03:08+00:00 2026-05-30T19:03:08+00:00

I have a shell function that is called from inside my map function. The

  • 0

I have a shell function that is called from inside my map function. The shell function takes 2 parameters -> an input file and an output file. Something like this

$> unix-binary /pathin/input.txt /pathout/output.txt

The problem is, that these input.txt files reside in HDFS and the output.txt files need to be written back to HDFS. Currently, I first copy the needed file with fs.copyToLocalFile into the local hard drive, call the unix binary and then write the output.txt back to HDFS with fs.copyFromLocalFile.
The problem with this approach is that, it is not optimal because it involves substantial amount of redundant reading and writing to HDD which slows down the performance. So, my question is, how I can read the HDFS file directly as an input and output the results directly to HDFS?
obviously,
$>unix-binary hdfs://master:53410/pathin/input.txt' hdfs://master:54310/pathout/output.txt
will not work. Is there any other way around? Can I treat an HDFS file as a loacl file somehow?
I have access to the unix-binary source code written in C. Maybe changing the source code would help?
thanks

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-30T19:03:09+00:00Added an answer on May 30, 2026 at 7:03 pm

    You can add the file to the DistributedCache and access it from the mapper from the cache. Call your shell function on the local file and write the output file to local disk and then copy the local file to HDFS.

    However, operations such as calling shell functions, or reading/writing from within a mapper/reducer break the MapReduce paradigm. If you find yourself needing to perform such operations, MapReduce may not be the solution you’re looking for. HDFS and MapReduce were designed to perform massive scale batch processing on small numbers of extremely large files.

    Since you have access to unix-binary source code, your best option might be to implement the particular function(s) you want in java. Feed the input files to your mapper and call the function from the mapper on the data rather than working with files on HDFS/LocalFS.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

I have the following python function that allows me to run shell commands from
I have an auto-generated file each day that gets called by a shell script.
I have a python function that makes a subprocess call to a shell script
I have a shell script file (run.sh) that contains the following: #!/bin/bash %JAVA_HOME%/bin/java -jar
Good day, I have a local csv file with values that change daily called
I have an unmanaged DLL that gets called from .NET with pre-allocated buffers to
I have a function that looks like this: def getCurrentShow(hour=(localtime().tm_hour),day=datetime.datetime.now().strftime('%A')): return Schedule.objects.get(hour=hour,day=day).show so I
Let say I have this code in a JS file called plus2.js: function plus2(n){
I have a clojure program that at some point executes a function called db-rebuild-files-table
I have a Bourne Shell script that has several functions in it, and allows

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.