I would like to implement a TreeWritable class to represent a Tree structure.
I have tried the following implementation but I’m getting a mapred.MapTask: Record too large for in-memory buffer error.
How should I implement a Writable for a multi-level data structure?
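import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.LinkedHashSet;
import java.util.Set;
import org.apache.hadoop.io.Writable;
public class TreeWritable implements Writable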
{
private final Set<TreeWritable> children = new LinkedHashSet<TreeWritable>();
private String data;
private int level;
public TreeWritable( String data, int level )
{
this.data = data;
this.level = level;
}
public int getLevel()
{
return level;
}
public TreeWritable()
{
}
public TreeWritable child( String data )
{
for ( TreeWritable child : children )
{
if ( child.data.equals( data ) )
{
return child;
}
}
return child( new TreeWritable( data, this.level + 1 ) );
}
TreeWritable child( TreeWritable child )
{
children.add( child );
return child;
}
public Set<TreeWritable> getChildren()
{
return children;
}
public String getId()
{
return data;
}
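public void write( DataOutput out ) throws IOException
{
out.writeUTF( data );
// writeInt, not write: write( int ) emits a single byte, but readFields() reads a 4-byte int
out.writeInt( level );
int size = children.size();
out.writeInt( size );
// Iterate with one iterator; calling children.iterator() in the loop condition creates a
// fresh iterator on every pass, so the loop never ends for a non-empty set
for ( TreeWritable child : children )
{
child.write( out );
}
}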
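public void readFields( DataInput in ) throws IOException
{
data = in.readUTF();
level = in.readInt();
int size = in.readInt();
// Hadoop may reuse a Writable instance, so drop children left over from a previous record
children.clear();
for ( int i = 0; i < size; i++ )
children.add( TreeWritable.read( in ) );
}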
public static TreeWritable read( DataInput in ) throws IOException
{
TreeWritable w = new TreeWritable();
w.readFields( in );
return w;
}
}
I think this is an optimal implementation for small trees if they are meant to be processed on a single machine. If you are working with large trees, you should split the tree into parts and store them as tuples, for example (id, data, root_id).
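Here is a minimal sketch of that flattening, in case it helps. Everything in it is hypothetical: the class names `FlatTreeSketch`, `Node` and `NodeRecord` are made up for illustration, ids are assigned in breadth-first order, and I am assuming that root_id means the id of the node's parent (with -1 for the top of the tree). If you meant the root of a subtree part instead, the same loop applies with a different value passed down.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: turns an in-memory tree into flat (id, data, rootId) records.
final class FlatTreeSketch {

    // Minimal in-memory node standing in for the tree from the question.
    static final class Node {
        final String data;
        final List<Node> children = new ArrayList<>();

        Node(String data) { this.data = data; }

        Node child(String childData) {
            Node c = new Node(childData);
            children.add(c);
            return c;
        }
    }

    // One flat record. Assumption: rootId is the id of the parent node, -1 for the top node.
    static final class NodeRecord {
        final int id;
        final String data;
        final int rootId;

        NodeRecord(int id, String data, int rootId) {
            this.id = id;
            this.data = data;
            this.rootId = rootId;
        }

        @Override
        public String toString() {
            return "(" + id + ", " + data + ", " + rootId + ")";
        }
    }

    // Breadth-first walk; the two queues are always added to and polled together.
    static List<NodeRecord> flatten(Node root) {
        List<NodeRecord> records = new ArrayList<>();
        Deque<Node> nodes = new ArrayDeque<>();
        Deque<Integer> parentIds = new ArrayDeque<>();
        nodes.add(root);
        parentIds.add(-1);
        int nextId = 0;
        while (!nodes.isEmpty()) {
            Node node = nodes.poll();
            int parentId = parentIds.poll();
            int id = nextId++;
            records.add(new NodeRecord(id, node.data, parentId));
            for (Node child : node.children) {
                nodes.add(child);
                parentIds.add(id);
            }
        }
        return records;
    }

    public static void main(String[] args) {
        Node root = new Node("root");
        root.child("a").child("c");
        root.child("b");
        for (NodeRecord r : flatten(root)) {
            System.out.println(r);
        }
    }
}
```

Each record has a small, bounded size, so no single value grows with the tree, and the tree can be rebuilt later by grouping records on rootId.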
As another example, the data structure used for PageRank evaluation in MapReduce is (url, currentPageRank, [link_url1, link_url2, …]).
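A record of that shape could be serialized roughly as below. This is only a sketch under my own assumptions: the class name `PageRankRecord`, its field names and the `roundTrip` helper are invented for illustration, and the class uses plain `java.io` so it runs without Hadoop on the classpath. In a real job it would also declare `implements Writable`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical flat record: (url, currentPageRank, [link_url1, link_url2, ...]).
final class PageRankRecord {
    String url = "";
    double currentPageRank;
    final List<String> links = new ArrayList<>();

    // Same method shape as Writable.write: fixed fields first, then a length-prefixed list.
    public void write(DataOutput out) throws IOException {
        out.writeUTF(url);
        out.writeDouble(currentPageRank);
        out.writeInt(links.size());
        for (String link : links) {
            out.writeUTF(link);
        }
    }

    // Mirrors write() field by field, using the matching read call for each write call.
    public void readFields(DataInput in) throws IOException {
        url = in.readUTF();
        currentPageRank = in.readDouble();
        int size = in.readInt();
        links.clear(); // the instance may be reused for the next record
        for (int i = 0; i < size; i++) {
            links.add(in.readUTF());
        }
    }

    // Demonstration helper: serializes to a byte array and reads it back into a new object.
    static PageRankRecord roundTrip(PageRankRecord original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            original.write(new DataOutputStream(bytes));
            PageRankRecord copy = new PageRankRecord();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The point of the flat layout is that the list holds only link URLs, not nested records, so the size of one value depends on a page's out-degree rather than on the size of the whole graph.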