I am writing a Firefox extension. I would like to search the current webpage


Asked: May 10, 2026 (2026-05-10T21:40:55+00:00)

I am writing a Firefox extension. I would like to search the current webpage for a set of words, and count how many times each occurs. This activity is only performed when the user asks, but it must still happen reasonably quickly.

I am currently using indexOf on the BODY tag’s innerHTML property, but I am finding it too slow to run repeatedly in the following manner:

function wordcount(doc, match) {
  var count = 0;
  var pos = 0;
  var len;  // index of the next occurrence (declared here to avoid an implicit global)
  for (;;) {
    len = doc.indexOf(match, pos);
    if (len == -1) {
      break;
    }
    pos = len + match.length;  // continue searching after this occurrence
    count++;
  }
  return count;
}

var html = content.document.body.innerHTML.toLowerCase();

for (var i = 0; i < keywords.length; i++) {
  var kw = keywords[i];
  myDump(kw + ': ' + wordcount(html, kw));
}

With 100 keywords, this takes approximately 10 to 20 seconds to run. There is some scope to reduce the number of keywords, but it will still need to run much quicker.

Is there a more obvious way to do this? What is the most efficient method? I have some ideas, but am reluctant to code each up without some idea of the performance I can expect:

Navigate the DOM rather than using innerHTML. Is this likely to be quicker or slower? It would have the benefit of only searching textual content.
Loop through the document word by word, accumulating a count of each word’s occurrences simultaneously. With this method I would have to do a bit more work parsing the HTML.
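
The two ideas above can be combined. What follows is only a minimal sketch, not a measured benchmark: collectText and countKeywords are hypothetical helper names, and it assumes the keywords are single ASCII words and that splitting on non-alphanumeric characters is an acceptable definition of a "word". Unlike the indexOf version, it counts whole words rather than substrings.

```javascript
// Hypothetical helper: gather the text of all text nodes under `node`,
// skipping <script> and <style> so only textual content is searched.
function collectText(node, parts) {
  parts = parts || [];
  if (node.nodeType === 3) {            // 3 === Node.TEXT_NODE
    parts.push(node.nodeValue);
  } else if (node.nodeName !== 'SCRIPT' && node.nodeName !== 'STYLE') {
    const children = node.childNodes || [];
    for (let i = 0; i < children.length; i++) {
      collectText(children[i], parts);
    }
  }
  return parts;
}

// Hypothetical helper: walk the text once, word by word, and bump a counter
// whenever the word is one of the keywords (a table lookup per word).
function countKeywords(text, keywords) {
  const counts = Object.create(null);
  keywords.forEach(function (kw) { counts[kw.toLowerCase()] = 0; });
  const words = text.toLowerCase().split(/[^a-z0-9]+/);
  for (let i = 0; i < words.length; i++) {
    if (words[i] in counts) {
      counts[words[i]]++;
    }
  }
  return counts;
}
```

In the extension this might be called as countKeywords(collectText(content.document.body).join(' '), keywords), so the page is traversed once no matter how many keywords there are.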

Edit: It turns out that the slowest part was the myDump function writing to the error console. Duh! Nevertheless, some interesting, more efficient alternatives have been presented, which I intend to use.
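
Since the console output was the bottleneck, one option is to build the whole report in memory and write it out once. A minimal sketch, assuming myDump accepts a single string (its real signature is not shown in the question); formatReport is a hypothetical helper name.

```javascript
// Hypothetical helper: turn a { keyword: count } table into one string so the
// slow logging function is called once instead of once per keyword.
function formatReport(counts) {
  const lines = [];
  for (const kw in counts) {
    lines.push(kw + ': ' + counts[kw]);
  }
  return lines.join('\n');
}
```

The per-keyword myDump call in the loop would then become a single myDump(formatReport(counts)) after all counting is done.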

1 Answer

score 0 · Answer 1 · 2026-05-10T21:40:56+00:00

I’m not sure if it is the fastest but the following worked pretty quickly for me.

var words = document.body.innerHTML.replace(/<.*?>/g, '').split(/\s+/);
var i = words.length;
var keywordCounts = {'keyword': 0, 'javascript': 0, 'today': 0};
var keywords = [];
var keywordMatcher = '';
var word;
for (word in keywordCounts) {
    keywords[keywords.length] = word;
    // one optional capture group per keyword: (keyword)?(javascript)?(today)?
    keywordMatcher = keywordMatcher + '(' + word + ')?';
}
var regex = new RegExp(keywordMatcher);
var j = keywords.length;
var matched, keyword;
if (i && j) {
    do {
        i = i - 1;
        matched = words[i].match(regex);
        if (!matched) continue;
        j = keywords.length;
        do {
            j = j - 1;
            // group j + 1 is set only when keywords[j] matched this word
            if (matched[j + 1]) {
                keyword = keywords[j];
                keywordCounts[keyword] = keywordCounts[keyword] + 1;
            }
        } while (j);
    } while (i);
}

I’ll readily grant that from a Big-O perspective it isn’t the best: the nested loops do work proportional to the number of words times the number of keywords, so it approaches n-squared time as i and j get big. Still, I’ve found regular expression processing to be pretty fast in general.
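
One way to drop the explicit inner loop over keywords is to scan the whole text with a single alternation pattern. This is a sketch of a different technique from the code above, not a drop-in replacement, and whether it is faster in practice would need measuring. escapeRegExp and countWithAlternation are hypothetical names, and the sketch assumes whole-word, case-insensitive matching and keywords made of ordinary word characters.

```javascript
// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Count whole-word, case-insensitive occurrences of every keyword in one scan.
function countWithAlternation(text, keywords) {
  const counts = Object.create(null);
  keywords.forEach(function (kw) { counts[kw.toLowerCase()] = 0; });
  if (keywords.length === 0) {
    return counts;
  }
  // Builds e.g. /\b(?:keyword|javascript|today)\b/gi
  const pattern = new RegExp(
    '\\b(?:' + keywords.map(escapeRegExp).join('|') + ')\\b', 'gi');
  // With the g flag, match() returns every matched substring (or null).
  const matches = text.match(pattern) || [];
  for (let i = 0; i < matches.length; i++) {
    const key = matches[i].toLowerCase();
    if (key in counts) {
      counts[key]++;
    }
  }
  return counts;
}
```

The comparison against each keyword still happens, but inside the regular expression engine rather than in a JavaScript loop.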

Basically I’m taking tvanfosson’s idea and expanding on it, but rather than traversing the DOM I’m removing the tags with a regex (the first statement) and then splitting the page into individual words. The keyword ‘hash’ is defined in the third statement with initial counts (they should all start at zero, obviously). From there a new regular expression is constructed using each keyword as a group, so when matched it returns an array of results that has (in my example) [fullMatch, keywordMatch, javascriptMatch, todayMatch]. I’m using decrementing do-while loops because they’ve often been reported to be the fastest looping structure in JavaScript, and since it doesn’t matter in what order the words get processed, loop speed is really the only consideration.
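
To make the shape of that result array concrete, here is a small illustration using the same three keywords. Two things are worth noticing: groups that did not take part in the match come back as undefined, and because every group is optional the pattern also matches (with an empty string) a word that does not begin with any keyword, so match does not return null for this pattern.

```javascript
const regex = /(keyword)?(javascript)?(today)?/;

// A word that is one of the keywords: only its own group is filled in.
const hit = 'javascript'.match(regex);
console.log(hit[0], hit[1], hit[2], hit[3]);
// prints: javascript undefined javascript undefined

// A word that is not a keyword: the match succeeds but is empty.
const miss = 'python'.match(regex);
console.log(miss[0] === '', miss[2]);
// prints: true undefined
```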

I hope this is helpful, if not it was at least a fun exercise. 🙂
