I know that we can launch a MapReduce job from a normal Java application. In my case, the MapReduce job has to deal with files on HDFS and also with files on another filesystem. Is it possible in Hadoop to access files from another filesystem while simultaneously using files on HDFS?
Basically, I have one large file that I want to put in HDFS for parallel computing, and I then want to compare the blocks of this file with some other files (which I do not want to put in HDFS because they need to be accessed as full-length files at once).
You can use the distributed cache to distribute the files to your mappers. They can open and read the files in their
configure() method (don't read them in map(), as it is called once for every input record).
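As a rough sketch of that pattern, assuming the old org.apache.hadoop.mapred API (the one that has configure()) and that the side files are plain text, a mapper could load the cached files once and reuse them for every record. The class name CompareMapper, the helper loadLines, and the line-by-line matching are hypothetical stand-ins for whatever comparison you actually need:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Hypothetical mapper: emits every input line that also occurs in a cached side file.
public class CompareMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

    // Filled once per task in configure(), then only read in map().
    private final Set<String> referenceLines = new HashSet<String>();

    @Override
    public void configure(JobConf job) {
        try {
            // Local copies of the cached files on this task's node.
            Path[] cached = DistributedCache.getLocalCacheFiles(job);
            if (cached == null) {
                return; // nothing was added to the cache
            }
            for (Path p : cached) {
                loadLines(p.toString(), referenceLines);
            }
        } catch (IOException e) {
            throw new RuntimeException("Could not read cached files", e);
        }
    }

    // Hypothetical helper: reads a local text file fully into the given set.
    public static void loadLines(String localPath, Set<String> into) throws IOException {
        BufferedReader reader = new BufferedReader(new FileReader(localPath));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                into.add(line);
            }
        } finally {
            reader.close();
        }
    }

    @Override
    public void map(LongWritable offset, Text line,
                    OutputCollector<Text, Text> out, Reporter reporter) throws IOException {
        // Placeholder comparison: an in-memory lookup, no file I/O per record.
        if (referenceLines.contains(line.toString())) {
            out.collect(line, new Text("match"));
        }
    }
}
```

Holding the whole side file in memory only works if it fits in the task's heap; for larger files you would keep a stream or an index instead.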
To access files from the local filesystem in your MapReduce job, you can add those files to the distributed cache when you set up your job configuration.
The MapReduce framework will make sure those files are accessible to your mappers and will remove them when your job is done.
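A minimal sketch of the job setup, again assuming the old org.apache.hadoop.mapred API. The class name CompareDriver, the helper addSideFile, and the argument layout are made up for illustration. Note that the URI handed to DistributedCache.addCacheFile must be readable from the cluster nodes (commonly an hdfs:// URI); if the file only exists on the submitting machine, the -files generic option (available when the job is launched through ToolRunner) copies it to the cluster for you.

```java
import java.net.URI;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

// Hypothetical driver: args = <input path> <output path> <side file URI>
public class CompareDriver {

    // Registers one side file with the distributed cache for this job.
    public static void addSideFile(JobConf conf, String sideFileUri) {
        DistributedCache.addCacheFile(URI.create(sideFileUri), conf);
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(CompareDriver.class);
        conf.setJobName("compare-with-side-file");

        // The large file that lives in HDFS and is split across mappers.
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        // The file every mapper needs in full.
        addSideFile(conf, args[2]);

        // Set your own mapper here, e.g. conf.setMapperClass(YourMapper.class);
        // without it the default identity mapper runs.
        JobClient.runJob(conf);
    }
}
```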