I’ve used scans over data stored in Accumulo before, and have gotten the whole result set back (for whatever Range I specified). The problem is, I would like to filter those results on the server side, before the client receives them. I’m hoping someone has a simple code example of how this is done.
From my understanding, Filter provides some (all?) of this functionality, but how is this used in practice using the API? I see an example using Filter on the shell client, from the Accumulo documentation here: http://accumulo.apache.org/user_manual_1.3-incubating/examples/filter.html
I couldn’t find any code examples online of a simple way to filter a scan based on regular expressions over any of the data, although I’m thinking this should be something relatively easy to do.
The `Filter` class lays the framework for the functionality you want. To create a custom filter, you need to extend `Filter` and implement the `accept(Key k, Value v)` method. If you are only looking to filter based on regular expressions, you can avoid writing your own filter by using `RegExFilter`.
Using a `RegExFilter` is straightforward. Here is an example:
The first two parameters of the `IteratorSetting` constructor (priority and name) are not relevant in this case. Once you’ve added the above code, iterating through the scanner will only return key/value pairs that match the regex parameters.
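The `RegExFilter` setup described above can be sketched roughly as follows. This is a minimal sketch, not the original answer's code: it assumes the Accumulo 1.4+ client API (where `RegExFilter` lives in `org.apache.accumulo.core.iterators.user`) and a `Scanner` that has already been created for your table. The class name `RegexScanExample`, the helper method names, the priority `15`, the iterator name `"regexFilter"`, and the choice to match on the column family are all illustrative assumptions.

```java
import java.util.Map;

import org.apache.accumulo.core.client.IteratorSetting;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.iterators.user.RegExFilter;

// Hypothetical helper class; the names here are not part of Accumulo.
public class RegexScanExample {

    // Builds an IteratorSetting that configures RegExFilter to keep only
    // entries whose column family matches colfRegex.
    static IteratorSetting columnFamilyRegex(String colfRegex) {
        // Priority (15) and name ("regexFilter") are arbitrary assumed values.
        IteratorSetting setting =
                new IteratorSetting(15, "regexFilter", RegExFilter.class);

        // Argument order: row, column family, column qualifier, value, orFields.
        // A null regex means "do not filter on this field". With orFields set
        // to false, every non-null regex must match for an entry to be kept.
        RegExFilter.setRegexs(setting, null, colfRegex, null, null, false);
        return setting;
    }

    // Attaches the filter to an existing scanner and prints what comes back.
    // The filtering runs on the tablet servers, so only matching entries
    // are sent to the client.
    static void printMatches(Scanner scanner, String colfRegex) {
        scanner.addScanIterator(columnFamilyRegex(colfRegex));
        for (Map.Entry<Key, Value> entry : scanner) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```

Calling `RegexScanExample.printMatches(scanner, "J.*")` would, under these assumptions, return only entries whose column family starts with `J`. To match on the row, column qualifier, or value instead, pass the regex in the corresponding position of `setRegexs` and leave the others `null`.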