I’m trying to place a file in the distributed cache. In order to do this I invoke my driver class using the -files option, something like:
hadoop jar job.jar my.driver.class -files MYFILE input output
Both getCacheFiles() and getLocalCacheFiles() return arrays of URIs/Paths that contain MYFILE.
(E.g.: hdfs://localhost/tmp/hadoopuser/mapred/staging/knappy/.staging/job_201208262359_0005/files/histfile#histfile)
Unfortunately, when trying to retrieve MYFILE in the map task, it throws a FileNotFoundException.
I tried this in standalone (local) mode as well as in pseudo-distributed mode.
Do you know what might be the cause?
UPDATE:
The following lines:
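// Show what the job configuration lists as cache files.
System.out.println("cache files:" + ctx.getConfiguration().get("mapred.cache.files"));
// Paths of the cache files as localized on the task's node.
uris = DistributedCache.getLocalCacheFiles(ctx.getConfiguration());
for (Path uri : uris) {
    System.out.println(uri.toString());  // full local path
    System.out.println(uri.getName());   // file name only
    if (uri.getName().contains(Constants.PATH_TO_HISTFILE)) {
        // Note: only the bare file name is kept here, not the full local path.
        histfileName = uri.getName();
    }
}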
print out this:
cache files:file:/home/knappy/histfile#histfile
/tmp/hadoop-knappy/mapred/local/archive/-7231_-1351_105/file/home/knappy/histfile
histfile
So, the file seems to be listed in the job.xml mapred.cache.files property and the local file seems to be present. Still, the FileNotFoundException is thrown.
First check mapred.cache.files in your job’s XML to see whether the file is in the cache. Then you can retrieve it in your mapper:
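As a rough illustration of the retrieval step, here is a minimal sketch that runs without a cluster. It uses java.nio.file.Path as a stand-in for Hadoop's Path, and the class and method names (LocalCacheLookup, findCached, readCached) are hypothetical. In a real mapper the array would come from DistributedCache.getLocalCacheFiles(ctx.getConfiguration()), as in the question. The point it tries to show is that the file is opened through its full localized path rather than through its bare name.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Hadoop: picks a localized cache file
// by name and reads it through its full local path.
public class LocalCacheLookup {

    // Returns the first path whose file name contains wantedName, or null.
    // The full path is returned, not just the file name.
    static Path findCached(Path[] localPaths, String wantedName) {
        if (localPaths == null) {
            return null; // nothing was localized
        }
        for (Path p : localPaths) {
            Path fileName = p.getFileName();
            if (fileName != null && fileName.toString().contains(wantedName)) {
                return p;
            }
        }
        return null;
    }

    // Reads all lines of the matching file, failing clearly if it is absent.
    static List<String> readCached(Path[] localPaths, String wantedName) throws IOException {
        Path cached = findCached(localPaths, wantedName);
        if (cached == null) {
            throw new FileNotFoundException(wantedName + " is not among the localized cache files");
        }
        return Files.readAllLines(cached, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the local cache directory a task would see (assumption).
        Path dir = Files.createTempDirectory("localcache");
        Path histfile = Files.write(dir.resolve("histfile"), Arrays.asList("a 1", "b 2"));
        Path[] localPaths = { histfile };
        System.out.println(readCached(localPaths, "histfile"));
    }
}
```

In terms of the loop in the question, this would correspond to keeping uri.toString() instead of uri.getName(). A bare name such as histfile is resolved against the task's working directory, which may not be where the file was localized, so that may be worth checking when a FileNotFoundException appears.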