I am working on a use case where I generate random data with a MapReduce program, so I do not need any input file in HDFS. However, if I don't specify an input path, the MR program doesn't run, so I currently point it at a dummy input file. Is there any way to avoid this?
Usually MR programs have some sort of data to process, but there are scenarios, like random data generation, where there is no input data at all. Check out the TeraGen program for random data generation: it takes the number of rows and the output directory as its only arguments, so it needs no input path. I haven't tried the DataGenerator myself, but it also looks interesting.
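One common way to drop the dummy file is to write a custom `InputFormat` whose `getSplits(JobContext)` returns synthetic splits (row ranges) instead of file blocks, paired with a `RecordReader` that simply counts through its range. The core of that is just dividing the requested rows among the map tasks. Here is a minimal, dependency-free sketch of that split planning; `SyntheticSplits`, `RowRange` and `plan` are hypothetical names of my own, not Hadoop classes, and in a real job this logic would sit inside your `getSplits` implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Hadoop): plans how many rows each map task
// should generate when there is no input file to split.
final class SyntheticSplits {

    /** A block of row numbers: firstRow (inclusive) up to firstRow + rowCount (exclusive). */
    static final class RowRange {
        final long firstRow;
        final long rowCount;

        RowRange(long firstRow, long rowCount) {
            this.firstRow = firstRow;
            this.rowCount = rowCount;
        }

        @Override
        public String toString() {
            return "rows " + firstRow + ".." + (firstRow + rowCount - 1)
                    + " (" + rowCount + " rows)";
        }
    }

    /** Divides totalRows into numSplits contiguous ranges that cover every row exactly once. */
    static List<RowRange> plan(long totalRows, int numSplits) {
        if (totalRows < 0 || numSplits < 1) {
            throw new IllegalArgumentException("need totalRows >= 0 and numSplits >= 1");
        }
        List<RowRange> ranges = new ArrayList<>();
        long currentRow = 0;
        for (int split = 0; split < numSplits; split++) {
            // Row index at which this split should end, rounded up.
            long goal = (long) Math.ceil(totalRows * (double) (split + 1) / numSplits);
            ranges.add(new RowRange(currentRow, goal - currentRow));
            currentRow = goal;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Example: 10 rows spread over 3 map tasks.
        for (RowRange range : plan(10, 3)) {
            System.out.println(range);
        }
    }
}
```

In the Hadoop version, each `RowRange` would become a custom `InputSplit` (which also has to be `Writable`), and the job would register the format with `job.setInputFormatClass(...)`, so no `FileInputFormat` input path is needed.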