I’m using HBase through the Java API to manage a list of URLs and parameters waiting to be analyzed by a multi-threaded scraper. The program also continuously adds new rows to the table.
I need to repeatedly read exactly one row from the table and delete it atomically (a row must never be read by two threads), with no criteria for preferring one row over another.
Currently, I create a Scanner and retrieve the first result:
Scan s = new Scan();
ResultScanner ss = t.getScanner(s);
for (Result r : ss) {
    // row key followed by the value of the single column TTL:value
    String ris = Bytes.toString(r.getRow()) + Bytes.toString(r.getValue(Bytes.toBytes("TTL"), Bytes.toBytes("value")));
    // delete the retrieved row
    t.delete(new Delete(r.getRow()));
    ss.close();
    // return at the first iteration, after closing the Scanner
    return ris;
}
But the program is extremely slow (10 seconds or more to get a single value), and I don’t know how to do this properly. How can I “consume” a single row in HBase?
EDIT: I forgot to mention that the table has a single column family with a single column.
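You can try a scan with a filter. I think PageFilter would suit your purpose: with a page size of 1, each region server stops after returning one row instead of preparing a large batch. Note that a filter only makes the read cheaper; the read followed by the delete is still not atomic across threads.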
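To illustrate the atomicity part, here is a minimal sketch of a "read one candidate, then claim it with a conditional delete" loop. It does not use the HBase client: `RowClaimSketch` and its methods are hypothetical names, and a `ConcurrentSkipListMap` stands in for the table so the example runs on its own. In HBase, step 1 would roughly correspond to a `Scan` limited to one row (for instance with `PageFilter`), and step 2 to a conditional delete such as `checkAndDelete`, assuming your client version offers it.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentSkipListMap;

// In-memory stand-in for the HBase table (an assumption, for illustration only).
// Keys play the role of row keys, values the role of the single TTL:value cell.
class RowClaimSketch {
    private final ConcurrentSkipListMap<String, String> rows = new ConcurrentSkipListMap<>();

    void put(String rowKey, String value) {
        rows.put(rowKey, value);
    }

    int size() {
        return rows.size();
    }

    // Returns rowKey + value for a row that only this caller has claimed,
    // or an empty Optional when no rows are left.
    Optional<String> consumeOne() {
        while (true) {
            // Step 1: cheap read of a single candidate row.
            Map.Entry<String, String> candidate = rows.firstEntry();
            if (candidate == null) {
                return Optional.empty();
            }
            // Step 2: atomic conditional delete. It succeeds only if the row
            // still holds the value we read, so exactly one thread can win.
            if (rows.remove(candidate.getKey(), candidate.getValue())) {
                return Optional.of(candidate.getKey() + candidate.getValue());
            }
            // Another thread claimed this row first; retry with the next one.
        }
    }
}
```

A thread that loses the race simply retries, so no row is handed out twice. Since every thread competes for the same first row, starting each scan from a different key may reduce contention, though that is a guess about your workload.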