I have a fairly large text file that I would like to convert into a SequenceFile. Unfortunately, the file consists of Python code with logical lines running over several physical lines. For example,
print “Blah Blah\
… blah blah”
Each logical line is terminated by a NEWLINE. Could someone clarify how I could possibly generate Key, Value pairs in Map-Reduce where each Value is the entire logical line?
I have a fairly large text file that I would like to convert into
Share
You should create your own variation on TextInputFormat. In there you make a new RecordReader that skips lines until it sees the start of a logical line.