I’m looking for a function that finds a substring from an array of strings (‘needles’) in a longer string (‘haystack’). Basically I want it to work like this example:
var haystack = "abcdefghijklmnopqrstuvwxyz";
var needles = [
    'bcd',
    'pqr',
    'hi',
    'ghi',
    'g',
    'stuv'
];
var output = findSubstring (haystack, needles, 2, 20);
output should now be:
{index: 6, which: 3}
which means it found “ghi” (needle 3) at position 6. It gets ‘ghi’ rather than ‘hi’, because ‘ghi’ starts earlier in the haystack, but it doesn’t get ‘g’ because ‘ghi’ is earlier in the needles array.
The function below is the best I have come up with, but it is slow on very large chunks of text with very large needle arrays (which is what I am using it on), because it scans the haystack once per needle. This is performance-critical, so I’d really like something faster.
I could imagine better ways to do it (probably not using indexOf), and since this is (presumably) a pretty common sort of thing to do, someone with more experience with this sort of thing might have a better way to go about it. (i.e. I’d rather not reinvent the wheel)
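function findSubstring (haystack, needles, startIndex, endIndex) {
    var min = Infinity, best = -1;
    var numNeedles = needles.length;
    if (!startIndex)
        startIndex = 0;
    // Scan the haystack once per needle, keeping the earliest match.
    // The strict < means ties go to the needle listed first.
    for (var i = 0; i < numNeedles; i++) {
        var index = haystack.indexOf(needles[i], startIndex);
        if (index != -1 && index < min) {
            min = index;
            best = i;
        }
    }
    // Compare the match position (min), not the needle number (best), with endIndex.
    return (best == -1 || (endIndex && min >= endIndex)) ?
        null :
        {index: min, which: best};
}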
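I suggest combining your needles into a single regex: “bcd|pqr|hi|ghi|g|stuv”. Escape any regex metacharacters in the needles first.
The regular expression engine can then look for all of the needles in one pass over the haystack, rather than one indexOf scan per needle. Alternation also preserves your tie-breaking rule: the leftmost match wins, and at a given position the alternatives are tried in the order listed. Whether the engine turns the alternation into a single finite state machine depends on the implementation; JavaScript engines typically use backtracking matchers, so measure on your own data.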
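Here is a minimal sketch of that idea, not a tuned implementation. The names escapeRegExp, compileNeedles and findSubstringRegex are made up for illustration. It assumes the needle list is reused across many searches, so the pattern is built once, and it treats endIndex the way the question's function appears to intend: a match that starts at or after endIndex counts as no match.

```javascript
// Escape regex metacharacters so every needle is matched literally.
function escapeRegExp(s) {
    return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build the combined pattern once; also remember the first position of each
// needle text so a match can be mapped back to its needle number.
function compileNeedles(needles) {
    var whichByText = new Map();
    needles.forEach(function (needle, i) {
        if (!whichByText.has(needle)) whichByText.set(needle, i);
    });
    var source = needles.map(escapeRegExp).join('|');
    return { pattern: new RegExp(source, 'g'), whichByText: whichByText };
}

function findSubstringRegex(haystack, compiled, startIndex, endIndex) {
    // With the 'g' flag, exec() starts searching at lastIndex.
    compiled.pattern.lastIndex = startIndex || 0;
    var m = compiled.pattern.exec(haystack);
    if (m === null || (endIndex && m.index >= endIndex)) return null;
    var which = compiled.whichByText.get(m[0]);
    // An empty needle list yields an empty pattern whose match is not a needle.
    if (which === undefined) return null;
    return { index: m.index, which: which };
}

var compiled = compileNeedles(['bcd', 'pqr', 'hi', 'ghi', 'g', 'stuv']);
var output = findSubstringRegex("abcdefghijklmnopqrstuvwxyz", compiled, 2, 20);
// output is {index: 6, which: 3}
```

Looking the matched text up in a Map avoids wrapping every needle in its own capture group, which could get unwieldy with a very large needle array. Whether this beats the indexOf loop will depend on the engine and the data, so benchmark both.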