I am trying to run a MapReduce job on my cluster that only runs

Question


Asked: June 3, 2026


I am trying to run a MapReduce job on my cluster that only runs on files with a specific extension. We have a lot of heterogeneous data on the cluster, and for this particular job I only want to process .jpg files. Is there a way to do this without filtering in the mapper? It seems like something that should be easy to do when you launch the job. I'm thinking of something like hadoop fs JobName /users/myuser/data/*.jpg /users/myuser/output.


1 Answer

Editorial Team · Answer 1 · 2026-06-03T02:37:47+00:00


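The glob in your example should work as written, although a job is launched with hadoop jar rather than hadoop fs (which runs filesystem commands). Check that your driver passes the path string to the input format with FileInputFormat.setInputPaths(Job, String). When the job lists its input, FileInputFormat expands the glob "/users/myuser/data/*.jpg" into the individual .jpg files in /users/myuser/data, so the mapper never sees the other files.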
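A minimal driver sketch, assuming the org.apache.hadoop.mapreduce API (Hadoop 2 or later). The class name JpgOnlyJob and the identity mapper are hypothetical placeholders, not from the question. TextInputFormat reads files as lines of text, so for real image processing you would swap in your own mapper and an input format that hands each image to the mapper whole.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Hypothetical driver: runs a map-only job over whatever files the input glob matches.
public class JpgOnlyJob extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: JpgOnlyJob <input glob> <output dir>");
            return 2;
        }

        Job job = Job.getInstance(getConf(), "jpg-only");
        job.setJarByClass(JpgOnlyJob.class);

        // Placeholder: the base Mapper class is an identity mapper. Replace with your own.
        job.setMapperClass(Mapper.class);
        job.setNumReduceTasks(0); // map-only

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        // The glob string (e.g. /users/myuser/data/*.jpg) is stored as-is here;
        // FileInputFormat expands it against HDFS when it lists the input files.
        FileInputFormat.setInputPaths(job, args[0]);
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new JpgOnlyJob(), args));
    }
}
```

Usage, with a hypothetical jar name: hadoop jar jpg-only-job.jar JpgOnlyJob '/users/myuser/data/*.jpg' /users/myuser/output. The single quotes keep your local shell from trying to expand *.jpg against the local filesystem, so the pattern reaches Hadoop intact. If the pattern matches no files, job submission fails with an input error instead of running on empty input.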
