For my MapReduce job, I read each line of my input file to get an external file path. The input file looks like this:
/user/local/myfiles/temp1.png
/user/local/myfiles/temp2.jpg
/user/local/myfiles/temp3.txt
/user/local/myfiles/temp4.txt
....
I want to perform some operation on those files, so I need to get the file object from the path string I read in my map function. My question is: where do I put the actual copies of those files so that I can open them? Do I put them on HDFS? When I put them on the local file system, I get a file-not-found error, but I get the same error when I put them on HDFS (in which case every line in the input file is something like “/user/hadoop/input/temp1.txt”). I can get the file name, but I need to be able to get the image or text file object from the path listed in the input file. Is there some way to access a file on HDFS (or the local file system) from my map function given just a string path?
You need to add them to HDFS so that they are accessible from all mappers. The following works for me (on Hadoop 0.20):
I add Constants.INFILE in the driver so that the filenames are not hardcoded in the code.
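For illustration, here is a minimal sketch of opening a file from inside a mapper given only its path string. It is not the code referred to above. It assumes the newer `org.apache.hadoop.mapreduce` API and that each input line is a path readable through the job's configured file system. The class name `PathMapper` and the helper `countBytes` are hypothetical, and counting bytes is only a stand-in for whatever operation you actually need.

```java
import java.io.IOException;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper: each input line is a file path; emits (path, size in bytes).
public class PathMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        String pathString = line.toString().trim();
        if (pathString.isEmpty()) {
            return; // skip blank lines in the input file
        }

        Path path = new Path(pathString);
        // Resolve the file system from the job configuration. A path without a
        // scheme resolves against the default file system (HDFS on a cluster).
        FileSystem fs = path.getFileSystem(context.getConfiguration());

        context.write(new Text(pathString), new LongWritable(countBytes(fs, path)));
    }

    // Placeholder operation (an assumption for this sketch): read the whole
    // file through the Hadoop FileSystem API and count its bytes.
    static long countBytes(FileSystem fs, Path path) throws IOException {
        long total = 0;
        byte[] buffer = new byte[8192];
        try (FSDataInputStream in = fs.open(path)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n;
            }
        }
        return total;
    }
}
```

The `FileSystem` instance is deliberately not closed here: Hadoop normally caches and shares it across the task, so closing it can break other readers. A file-not-found error with this approach usually means the path in the input file does not exist on the file system the job is configured to use, for example a local path that is present on your machine but not on the node where the mapper runs.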