I’m building a Node.js application and storing a six-digit base36 representation of a Unix timestamp (in seconds) as the first part of an _id in MongoDB. A typical _id looks like this:
"_id" : "lwhlzy/czwszasfgr/a4d18976c1/f835caa1c3/184d06b47f"
Several pieces of data are concatenated: the timestamp, followed by a series of hashed values, which together form both a GUID and a “materialized path”.
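For reference, the timestamp prefix can be produced and decoded roughly as follows. This is only a sketch: the helper names toStamp and fromStamp are made up for illustration and are not part of the application code.

```javascript
// Hypothetical helpers, named here only for illustration.

// Whole seconds since the Unix epoch, rendered in base36.
function toStamp(date) {
  return Math.floor(date.getTime() / 1000).toString(36);
}

// Inverse of toStamp: base36 string back to a Date.
function fromStamp(stamp) {
  return new Date(parseInt(stamp, 36) * 1000);
}

console.log(toStamp(new Date()));               // current timestamp prefix
console.log(fromStamp("lec0s0").toISOString()); // decode one prefix
console.log(parseInt("zzzzzz", 36));            // 2176782335, the largest six-digit value
```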
Later queries will select records by time range and then by path, to get the events that happened during that period for that particular path. These queries will rely on rooted regular expressions, so I need a regex that can match a range of base36 numbers.
This is the code I have so far (a test to run via node, and yes, it is hard-coded to six digits; the seventh digit won't be needed until Dec 23rd 2038):
var base36 = "0123456789abcdefghijklmnopqrstuvwxyz";
// determine how many left-most characters from & to have in common
// (this part works; the trouble is in the loop further down)
var getOverlap = function (from, to) {
    var n = 0;
    // advance while both strings share the same character at position n
    while (n < from.length && n < to.length && from[n] === to[n]) {
        n++;
    }
    // the shared prefix; an empty string if the first characters already differ
    return from.slice(0, n);
};
var from = "lec0s0";
var to = "lwhvqg"; // generated from: parseInt(Date.now()/1000,10).toString(36)
var overlap = getOverlap(from,to);
console.log(from);
console.log(to);
var regex = overlap;
var i = overlap.length;
// start immediately after the left-most common characters and append the rest of the regex
while (i<6) {
    regex += "[";
    if (from[i] < to[i]) {
        regex += base36.slice(base36.indexOf(from[i]), base36.indexOf(to[i])+1);
    } else {
        regex += base36.slice(base36.indexOf(from[i])) + base36.slice(0, base36.indexOf(to[i])+1);
    }
    regex += "]";
    i++;
}
console.log(regex);
process.exit();
Which will output something like this:
l[efghijklmnopqrstuvw][cdefgh][0123456789abcdefghijklmnopqrstuv][stuvwxyz0123456789abcdefghijklmnopq][0123456789abcdefg]
After studying this I realized there are two main issues: 1) it's not quite right for a true range (it would skip huge chunks of records), and 2) I'd rather have character ranges like [e-w] instead of every character stated explicitly, though the long form still works.
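To illustrate the first issue, here is a quick check of the generated pattern against one value that lies inside the range. It is only a sketch: inRange is a helper introduced here, and it relies on equal-length lowercase base36 strings sorting the same way as the numbers they encode.

```javascript
// The pattern printed by the test script, rooted at the start of the string.
var naive = new RegExp("^l[efghijklmnopqrstuvw][cdefgh]" +
  "[0123456789abcdefghijklmnopqrstuv]" +
  "[stuvwxyz0123456789abcdefghijklmnopq]" +
  "[0123456789abcdefg]");

// Plain string comparison as the reference for "is s between from and to".
var inRange = function (s) {
  return "lec0s0" <= s && s <= "lwhvqg";
};

// "lf0000" lies between the two bounds, but its 3rd character is "0",
// which the class [cdefgh] does not allow.
console.log(inRange("lf0000"), naive.test("lf0000")); // true false
```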
For input from="lec0s0" and to="lwhvqg" I realize I'm missing a large part of this regex. For example, the code above only allows the 3rd character to range from c to h, but that position needs to reach “z” before the 2nd character can increment. I’ve determined that I actually need a regex that looks more like this:
l[e-v][0-9a-z][0-9a-z][0-9a-z][0-9a-z]|l[e-w][c-g][0-9a-z][0-9a-z][0-9a-z]|l[e-w][c-h][0-9a-u][0-9a-z][0-9a-z]|l[e-w][c-h][0-9a-v][0-9a-o][0-9a-z]|l[e-w][c-h][0-9a-v][0-9a-q][0-9a-g]
So my question is: am I right to conclude the regex needs to look like the latter above? And if so, how might I modify the code to generate it?
Thanks in advance!
Your current pattern will match from le0000 and up, whereas you actually wish to match only from lec0s0 through lwhvqg.
The following function should give you the regex you need:
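One way such a function could look is sketched below. It is not necessarily the code behind the fiddle link; the names rangeRegex, build, charClass, span and repeat are made up for this illustration, and it assumes both inputs are lowercase base36 strings of equal length with from <= to. The idea is to split on the first differing digit: keep it at its minimum and constrain the rest from below, keep it at its maximum and constrain the rest from above, and let anything strictly in between run free.

```javascript
var DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz";
var ANY = "[0-9a-z]";

// Repeat string s n times (written by hand so it also runs on old Node versions).
function repeat(s, n) {
  var out = "";
  while (n-- > 0) out += s;
  return out;
}

// "a".."c" -> "a-c", "a".."b" -> "ab", "a".."a" -> "a"
function span(a, b) {
  if (a === b) return a;
  return b.charCodeAt(0) - a.charCodeAt(0) === 1 ? a + b : a + "-" + b;
}

// Character class covering DIGITS[lo]..DIGITS[hi] inclusive (requires lo <= hi).
function charClass(lo, hi) {
  var out = "";
  if (lo <= 9) out += span(DIGITS[lo], DIGITS[Math.min(hi, 9)]);
  if (hi >= 10) out += span(DIGITS[Math.max(lo, 10)], DIGITS[hi]);
  return out.length === 1 ? out : "[" + out + "]";
}

// Pattern source matching every same-length string s with from <= s <= to.
function build(from, to) {
  if (from === "") return "";
  var f = DIGITS.indexOf(from[0]), t = DIGITS.indexOf(to[0]);
  var fromRest = from.slice(1), toRest = to.slice(1), n = fromRest.length;
  if (f === t) return from[0] + build(fromRest, toRest);

  var alts = [], lo = f, hi = t, top = null;
  if (!/^0*$/.test(fromRest)) {
    // first digit stays at its minimum: the rest must be >= fromRest
    alts.push(from[0] + build(fromRest, repeat("z", n)));
    lo = f + 1;
  }
  if (!/^z*$/.test(toRest)) {
    // first digit stays at its maximum: the rest must be <= toRest
    top = to[0] + build(repeat("0", n), toRest);
    hi = t - 1;
  }
  // first digit anywhere in lo..hi: the rest is unconstrained
  if (lo <= hi) alts.push(charClass(lo, hi) + repeat(ANY, n));
  if (top !== null) alts.push(top);
  return alts.length === 1 ? alts[0] : "(?:" + alts.join("|") + ")";
}

// Rooted regex for the inclusive range from..to.
function rangeRegex(from, to) {
  var valid = /^[0-9a-z]+$/;
  if (!valid.test(from) || !valid.test(to) ||
      from.length !== to.length || from > to) {
    throw new Error("expected equal-length lowercase base36 strings with from <= to");
  }
  return new RegExp("^" + build(from, to));
}

var re = rangeRegex("lec0s0", "lwhvqg");
console.log(re.source);
console.log(re.test("lec0s0/czwszasfgr"), re.test("le0000/czwszasfgr")); // true false
```

The pattern is anchored only at the start, so it can be used as a rooted prefix match on the full _id. The alternatives nest instead of being listed flat, which keeps each shared prefix written once.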
You can see it live here: http://jsfiddle.net/3cu52/3/
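To see the over-matching for yourself, a candidate pattern can be compared against plain string comparison on a few boundary values. The snippet below is a sketch that checks the pattern proposed in the question; the sample values are picked by hand, and inRange is a helper introduced here.

```javascript
// Pattern proposed in the question, rooted at the start of the string.
var proposed = new RegExp("^(?:" + [
  "l[e-v][0-9a-z][0-9a-z][0-9a-z][0-9a-z]",
  "l[e-w][c-g][0-9a-z][0-9a-z][0-9a-z]",
  "l[e-w][c-h][0-9a-u][0-9a-z][0-9a-z]",
  "l[e-w][c-h][0-9a-v][0-9a-o][0-9a-z]",
  "l[e-w][c-h][0-9a-v][0-9a-q][0-9a-g]"
].join("|") + ")");

var from = "lec0s0", to = "lwhvqg";

// Equal-length lowercase base36 strings sort like the numbers they encode.
var inRange = function (s) {
  return from <= s && s <= to;
};

// Values just outside and just inside each bound.
["le0000", "lec0rz", "lec0s0", "lf0000", "lwhvqg", "lwhvqh"].forEach(function (s) {
  console.log(s, "in range:", inRange(s), "matched:", proposed.test(s));
});
```

Here le0000 and lec0rz are reported as matched even though both sort before lec0s0.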