I am trying to run the Map reduce WordCount job on a text file

Question

0

Asked: June 17, 20262026-06-17T16:14:21+00:00 2026-06-17T16:14:21+00:00

I am trying to run the Map reduce WordCount job on a text file

0

I am trying to run the Map reduce WordCount job on a text file that I have stored in my bucket on Amazon s3. I have set up all the required the required authentication for the map reduce framework to communicate with Amazon, but I keep on running with this error. Any idea why this is happening?

13/01/20 13:22:15 ERROR security.UserGroupInformation:
PriviledgedActionException as:root
cause:org.apache.hadoop.mapred.InvalidInputException: Input path does
not exist: s3://name-bucket/test.txt
Exception in thread "main"
org.apache.hadoop.mapred.InvalidInputException: Input path does not
exist: s3://name-bucket/test.txt
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:197)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:208)
    at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:989)
    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:981)
    at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:897)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:416)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:824)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1261)
    at org.myorg.WordCount.main(WordCount.java:55)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-06-17T16:14:22+00:00

You actually have to replace the protocol s3 by s3n. These are 2 different filesystems with different properties:

s3n is the s3 Native Filesystem: A native filesystem for reading and writing regular files on S3. The advantage of this filesystem is that you can access files on S3 that were written with other tools. Conversely, other tools can access files written using Hadoop. The disadvantage is the 5GB limit on file size imposed by S3. For this reason it is not suitable as a replacement for HDFS (which has support for very large files).
s3 is the Block filesystem: A block-based filesystem backed by S3. Files are stored as blocks, just like they are in HDFS. This permits efficient implementation of renames. This filesystem requires you to dedicate a bucket for the filesystem – you should not use an existing bucket containing files, or write other files to the same bucket. The files stored by this filesystem can be larger than 5GB, but they are not interoperable with other S3 tools.

(source)

In your case your bucket is probably using s3n filesystem, I believe it’s the default, most of the buckets I use are also s3n. So you should use s3n://name-bucket/test.txt

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I am trying to run the Map reduce WordCount job on a text file

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply