- I want to write two different types of output from the same reducer, into two different directories.
I am able to use the MultipleOutputs feature in Hadoop to write to different files, but they both go to the same output folder.
I want to write each file from the same reducer to a different folder.
Is there a way to do this?
If I try passing, for example, "hello/testfile" as the second argument, it reports an invalid argument, so I am not able to write to different folders.
- If the above is not possible, then is it possible for the mapper to read only specific files from an input folder?
Please help me.
Thanks in advance!
Thanks for the reply. I am able to read a file successfully using the above method, but in distributed mode I am not able to do so. In the reducer (using MultipleOutputs), I have
set:
mos.getCollector("data", reporter).collect(new Text(str_key), new Text(str_val));
In the JobConf, I tried using
FileInputFormat.setInputPaths(conf2, "/home/users/mlakshm/opchk285/data-r-00000*");
as well as
FileInputFormat.setInputPaths(conf2, "/home/users/mlakshm/opchk285/data*");
But it gives the following error:
cause:org.apache.hadoop.mapred.InvalidInputException: Input Pattern hdfs://mentat.cluster:54310/home/users/mlakshm/opchk295/data-r-00000* matches 0 files
Copy the MultipleOutputs code into your code base and loosen the restriction on allowable characters. I can’t see any valid reason for the restrictions anyway.
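As a rough illustration of what loosening the restriction could look like: as far as I can tell, the old-API MultipleOutputs rejects any named output containing characters other than letters and digits, which would explain why "hello/testfile" is refused. The sketch below is a standalone, hypothetical validator (the class and method names are made up for this example and are not part of Hadoop) that additionally accepts '/', '-', '_' and '.'. A copied MultipleOutputs would need an equivalent check in place of its own. Whether the rest of the class then writes into subdirectories correctly is an assumption that is not verified here.

```java
public class LooseNameCheck {

    // Hypothetical rule: letters and digits as before,
    // plus a few extra characters so that names like "hello/testfile" pass.
    public static boolean isAllowedChar(char ch) {
        return (ch >= 'A' && ch <= 'Z')
            || (ch >= 'a' && ch <= 'z')
            || (ch >= '0' && ch <= '9')
            || ch == '/' || ch == '-' || ch == '_' || ch == '.';
    }

    // Throws IllegalArgumentException if the name is null, empty,
    // or contains a character outside the loosened set.
    public static void checkName(String name) {
        if (name == null || name.isEmpty()) {
            throw new IllegalArgumentException("Name cannot be null or empty");
        }
        for (char ch : name.toCharArray()) {
            if (!isAllowedChar(ch)) {
                throw new IllegalArgumentException(
                    "Name cannot contain the character '" + ch + "'");
            }
        }
    }

    public static void main(String[] args) {
        // A name with a path separator is accepted by the loosened check.
        checkName("hello/testfile");
        System.out.println("accepted: hello/testfile");

        // Characters outside the set are still rejected.
        try {
            checkName("bad name*");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

This sketch does not guard against names such as "../x" or a leading '/'. If you allow path separators, you may want to add checks for those as well.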