I am running a MapReduce job. When I run it on my machine, which is a single-node cluster, the output is as shown:
hduser@nikhil-VirtualBox:/usr/local/hadoop/hadoop-1.0.4$ bin/hadoop dfs -text /user/hduser/output16/part-r-00000
0 Required Genotype column (s), Must not contain NULLS for required fields, failed, 5, 1: GENE_NAME; 2: GENE_NAME; 4: GENE_NAME; 5: GENE_NAME; 9: GENE_NAME
However, when I run the same job on Amazon EMR on a larger dataset, I get the following output, full of weird characters. What might be the reason?
SEQorg.apache.hadoop.io.Textorg.apache.hadoop.io.Text\00\00\00\00\00\00\968\D6\FA\E1>X(.q\8B!\ABQ\00\00-\00\00\00
1537044153\8ERequired Genotype column (s), Must not contain NULLS for required fields, failed, 1, 1: VARIANT_START_POSITION; 2: VARIANT_START_POSITION;
The header (SEQ, followed by the Text class name twice) tells you that this is a SequenceFile with an org.apache.hadoop.io.Text as key and value.
So this is binary and not plain text, and you can read it with a SequenceFile.Reader.
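
To illustrate the header check described above, here is a minimal sketch in plain Java that needs no Hadoop jars. It only inspects the first bytes of a file that has been copied to the local disk, and it is not a replacement for SequenceFile.Reader, which is still what you would use to read the actual records. The class name SeqHeaderSniffer and its methods are hypothetical names chosen for this example. The sketch assumes a SequenceFile version that stores the class names as length-prefixed UTF-8 strings, with class names shorter than 128 bytes so that the length fits in a single byte.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper: checks whether a file starts with a SequenceFile header.
public class SeqHeaderSniffer {

    /** The pieces of the header this sketch understands. */
    public static final class Header {
        public final int version;
        public final String keyClass;
        public final String valueClass;

        Header(int version, String keyClass, String valueClass) {
            this.version = version;
            this.keyClass = keyClass;
            this.valueClass = valueClass;
        }
    }

    /** Returns the parsed header, or null if the bytes do not look like a SequenceFile. */
    public static Header parseHeader(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        // A SequenceFile starts with the magic bytes 'S', 'E', 'Q' and one version byte.
        if (buf.remaining() < 4 || buf.get() != 'S' || buf.get() != 'E' || buf.get() != 'Q') {
            return null;
        }
        int version = buf.get() & 0xFF;
        String keyClass = readShortString(buf);
        String valueClass = readShortString(buf);
        if (keyClass == null || valueClass == null) {
            return null;
        }
        return new Header(version, keyClass, valueClass);
    }

    // Assumption: the length prefix fits in one non-negative byte (names under 128 bytes).
    private static String readShortString(ByteBuffer buf) {
        if (!buf.hasRemaining()) {
            return null;
        }
        int len = buf.get();
        if (len < 0 || len > buf.remaining()) {
            return null;
        }
        byte[] bytes = new byte[len];
        buf.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // args[0] is a local copy of the output file, e.g. part-r-00000.
        byte[] head = new byte[512];
        int n;
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            n = in.read(head);
        }
        Header h = parseHeader(Arrays.copyOf(head, Math.max(n, 0)));
        if (h == null) {
            System.out.println("No SequenceFile header found; probably plain text.");
        } else {
            System.out.println("SequenceFile version " + h.version
                    + ", key=" + h.keyClass + ", value=" + h.valueClass);
        }
    }
}
```

If the sketch reports org.apache.hadoop.io.Text for both key and value, as in the EMR output above, the file is binary and should be read with SequenceFile.Reader rather than printed as text.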