Why would the below eliminate the whitespace around matched keyword text when replacing it

Asked: May 15, 2026 (2026-05-15T03:18:45+00:00)

Why would the code below eliminate the whitespace around matched keyword text when replacing it with an anchor link? Note that this error occurs only in Chrome, not in Firefox.

For complete context, the file is located at: http://seox.org/lbp/lb-core.js

To view the code in action (no errors found yet), the demo page is at http://seox.org/test.html. Copying the first paragraph and pasting it into a rich text editor (e.g. Dreamweaver, or Gmail with the rich text editor turned on) reveals the problem: the words are bunched together. Pasting it into a plain text editor does not.

// Find page text (not in links) -> doxdesk.com
function findPlainTextExceptInLinks(element, substring, callback) {
    for (var childi= element.childNodes.length; childi-->0;) {
        var child= element.childNodes[childi];
        if (child.nodeType===1) {
            if (child.tagName.toLowerCase()!=='a')
                findPlainTextExceptInLinks(child, substring, callback);
        } else if (child.nodeType===3) {
            var index= child.data.length;
            while (true) {
                index= child.data.lastIndexOf(substring, index);
                if (index===-1 || limit.indexOf(substring.toLowerCase()) !== -1)
                    break;
                // don't match an alphanumeric char
                var dontMatch =/\w/;
                if(child.nodeValue.charAt(index - 1).match(dontMatch) || child.nodeValue.charAt(index+keyword.length).match(dontMatch))
                    break;
                // alert(child.nodeValue.charAt(index+keyword.length + 1));
                callback.call(window, child, index)
            }
        }
    }
}

// Linkup function, call with various type cases (below)
function linkup(node, index) {

    node.splitText(index+keyword.length);
    var a= document.createElement('a');
    a.href= linkUrl;
    a.appendChild(node.splitText(index));
    node.parentNode.insertBefore(a, node.nextSibling);
    limit.push(keyword.toLowerCase()); // Add the keyword to memory
    urlMemory.push(linkUrl); // Add the url to memory
}

// lower case (already applied)
findPlainTextExceptInLinks(lbp.vrs.holder, keyword, linkup);

Thanks in advance for your help. I’m nearly ready to launch the script, and will gladly comment in kudos to you for your assistance.

Editorial Team · Answer 1 · 2026-05-15T03:18:46+00:00

It’s not anything to do with the linking functionality; it happens to copied links that are already on the page too, and the credit content, even if the processSel() call is commented out.

It seems to be a weird bug in Chrome’s rich text copy function. The content in the holder is fine; if you cloneContents the selected range and alert its innerHTML at the end, the whitespaces are clearly there. But whitespaces just before, just after, and at the inner edges of any inline element (not just links!) don’t show up in rich text.

Even if you add new text nodes to the DOM containing spaces next to a link, Chrome swallows them. I was able to make it look right by inserting non-breaking spaces:

var links= lbp.vrs.holder.getElementsByTagName('a');
for (var i= links.length; i-->0;) {
    links[i].parentNode.insertBefore(document.createTextNode('\xA0 '), links[i]);
    links[i].parentNode.insertBefore(document.createTextNode(' \xA0'), links[i].nextSibling);
}

but that’s pretty ugly, should be unnecessary, and doesn’t fix up other inline elements. Bad Chrome!

var keyword = links[i].innerHTML.toLowerCase();

It’s unwise to rely on innerHTML to get text from an element, as the browser may escape or not-escape characters in it. Most notably &, but there’s no guarantee over what characters the browser’s innerHTML property will output.

As you seem to be using jQuery already, grab the content with text() instead.

var isDomain = new RegExp(document.domain, 'g');
if (isDomain.test(linkUrl)) { ...

That’ll fail every second time, because global regexps remember their previous state (lastIndex): when used with methods like test, you’re supposed to keep calling repeatedly until they return no match.
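The lastIndex behaviour can be reproduced outside the page. This is a minimal sketch; the domain and URL are placeholders chosen for illustration, not values from the original script:

```javascript
// Hypothetical domain and URL, for illustration only.
var isDomain = new RegExp('example.org', 'g');
var linkUrl = 'http://example.org/page';

// First call matches and leaves lastIndex just past the match.
var first = isDomain.test(linkUrl);
// Second call resumes searching from lastIndex, finds nothing, and resets lastIndex to 0.
var second = isDomain.test(linkUrl);
// Third call starts from 0 again and matches.
var third = isDomain.test(linkUrl);

console.log(first, second, third); // true false true
```

Dropping the g flag makes every call start from the beginning of the string, so the result no longer alternates.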

You don’t seem to need g (multiple matches) here… but then you don’t seem to need regexp here either as a simple String indexOf would be more reliable. (In a regexp, each . in the domain would match any character in the link.)

Better still, use the URL decomposition properties on Location to do a direct comparison of hostnames, rather than crude string-matching over the whole URL:

if (location.hostname===links[i].hostname) { ...
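To see why the hostname comparison is safer than pattern matching, here is a rough sketch that runs outside a browser. It assumes the standard URL constructor as a stand-in for Location and anchor elements (all three expose a hostname property); the sameHost helper and the lookalike URL are made up for illustration:

```javascript
// Hypothetical URLs for illustration.
var pageUrl = 'http://seox.org/test.html';
var lookalike = 'http://seoxXorg.example.com/';

// The unescaped '.' in the pattern matches any character,
// so the lookalike host is wrongly reported as a match.
var naiveMatch = new RegExp('seox.org').test(lookalike);

// Hypothetical helper: compare parsed hostnames directly.
function sameHost(a, b) {
    return new URL(a).hostname === new URL(b).hostname;
}

console.log(naiveMatch);                                             // true (false positive)
console.log(sameHost(pageUrl, lookalike));                           // false
console.log(sameHost(pageUrl, 'http://seox.org/lbp/lb-core.js'));    // true
```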

// don't match an alphanumeric char
var dontMatch =/\w/;
if(child.nodeValue.charAt(index - 1).match(dontMatch) || child.nodeValue.charAt(index+keyword.length).match(dontMatch))
    break;

If you want to match words on word boundaries, and case insensitively, I think you’d be better off using a regex rather than plain substring matching. That’d also save doing four calls to findText for each keyword as it is at the moment. You can grab the inner bit (in if (child.nodeType==3) { ...) of the function in this answer and use that instead of the current string matching.

The annoying thing about making regexps from strings is having to add a load of backslashes to the punctuation, so you’ll want a function for that:

// Backslash-escape string for literal use in a RegExp
//
function RegExp_escape(s) {
    return s.replace(/([/\\^$*+?.()|[\]{}])/g, '\\$1');
}

var keywordre= new RegExp('\\b'+RegExp_escape(keyword)+'\\b', 'gi');
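As a quick check of how the escaped, word-bounded pattern behaves on plain strings, here is a small sketch. The findKeyword helper and the sample text are assumptions for illustration; in the real script the exec loop would run over each text node's data instead:

```javascript
// Backslash-escape a string for literal use in a RegExp
function RegExp_escape(s) {
    return s.replace(/([\/\\^$*+?.()|[\]{}])/g, '\\$1');
}

// Hypothetical helper: return the start index of every whole-word,
// case-insensitive occurrence of keyword in text.
function findKeyword(text, keyword) {
    var re = new RegExp('\\b' + RegExp_escape(keyword) + '\\b', 'gi');
    var indexes = [];
    var match;
    // With the g flag, exec() advances lastIndex after each match.
    while ((match = re.exec(text)) !== null) {
        indexes.push(match.index);
    }
    return indexes;
}

// 'smart' is skipped because 'art' is not on a word boundary there.
console.log(findKeyword('Art is smart. ART!', 'art')); // [ 0, 14 ]
```

One caveat: \b only matches between a word character and a non-word character, so a keyword that starts or ends with punctuation may need different boundary handling.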

You could even do all the keyword replacements in one go for efficiency:

var keywords= [];
var hrefs= [];
for (var i=0; i<links.length; i++) {
    ...
    var text= $(links[i]).text();
    keywords.push('(\\b'+RegExp_escape(text)+'\\b)');
    hrefs.push(links[i].href);
}
var keywordre= new RegExp(keywords.join('|'), 'gi');

and then, for each match in linkup, check which match group has non-zero length and link with the corresponding entry in hrefs (match group n pairs with the nth href pushed).
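The group lookup might look roughly like this on plain strings. It is only a sketch: the keyword and URL pairs are invented, and hrefForMatch and findLinks are hypothetical helpers, not part of the original script:

```javascript
// Backslash-escape a string for literal use in a RegExp
function RegExp_escape(s) {
    return s.replace(/([\/\\^$*+?.()|[\]{}])/g, '\\$1');
}

// Hypothetical keyword/URL pairs; in the page these would come from the links.
var keywordTexts = ['seo tools', 'link building'];
var hrefs = ['http://example.com/tools', 'http://example.com/links'];

// One capture group per keyword, joined into a single alternation.
var groups = [];
for (var i = 0; i < keywordTexts.length; i++) {
    groups.push('(\\b' + RegExp_escape(keywordTexts[i]) + '\\b)');
}
var keywordre = new RegExp(groups.join('|'), 'gi');

// Group n (1-based) pairs with hrefs[n - 1]. Groups that did not take part
// in the match are undefined (or empty in some old browsers), hence the truthy check.
function hrefForMatch(match) {
    for (var g = 1; g < match.length; g++) {
        if (match[g]) return hrefs[g - 1];
    }
    return null;
}

// Collect every keyword occurrence in text together with its target URL.
function findLinks(text) {
    var found = [];
    var match;
    keywordre.lastIndex = 0; // reset state left by any earlier use
    while ((match = keywordre.exec(text)) !== null) {
        found.push({ index: match.index, text: match[0], href: hrefForMatch(match) });
    }
    return found;
}

console.log(findLinks('Try Link Building and SEO tools.'));
```

Resetting lastIndex before each scan avoids the global-regexp state problem described earlier when the same pattern is reused across text nodes.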
