A sqoop query generates a java file that contains a class that contains the code to get access in mapreduce to the columns data for each row. (the Sqoop import was done in text without the –as-sequencefile option, and with 1 line per record and commas between the columns)
But how do we actually use it?
I found a public method parse() in this class that takes Text as an input and populates all the members of the class , so to practice I modified the wordcount application to convert a line of text from the TextInputFormat in the mapper into an instnace of the class generated by sqoop. But that causes an “unreported exception.com.cloudera.sqoop.lib.RecordParser.ParseError; must be caught or declared to be thrown” when I call the parse() method.
Can it be done this way or is a custom InputFormat necessary to populate the class with the data from each record ?
Ok this seems obvious once you find out but as a java beginner this can take time.
First configure your project:
just add the sqoop generated .java file in your source folder.
I use eclipse to import it in my class source folder.
Then just make sure you configured your project’s java build path correctly:
Add the following jar files in the project’s properties/java build path/libraries/add external jar:
(for hadoop cdh4+) :
Then adapt your mapreduce source code:
First configure it:
Now configure your mapper class.
Let’s assume your sqoop imported java file is called Sqimp.java:
and the table you imported had the following columns: id, name, age
your mapper class should look like this: