Why would the code below eliminate the whitespace around matched keyword text when replacing it with an anchor link? Note that this error only occurs in Chrome, not in Firefox.
For complete context, the file is located at: http://seox.org/lbp/lb-core.js
To view the code in action (no errors found yet), the demo page is at http://seox.org/test.html. Copying and pasting the first paragraph into a rich text editor (e.g. Dreamweaver, or Gmail with the rich text editor turned on) will reveal the problem, with words bunched together. Pasting it into a plain text editor will not.
    // Find page text (not in links) -> doxdesk.com
    function findPlainTextExceptInLinks(element, substring, callback) {
        for (var childi= element.childNodes.length; childi-->0;) {
            var child= element.childNodes[childi];
            if (child.nodeType===1) {
                if (child.tagName.toLowerCase()!=='a')
                    findPlainTextExceptInLinks(child, substring, callback);
            } else if (child.nodeType===3) {
                var index= child.data.length;
                while (true) {
                    index= child.data.lastIndexOf(substring, index);
                    if (index===-1 || limit.indexOf(substring.toLowerCase()) !== -1)
                        break;
                    // don't match an alphanumeric char
                    var dontMatch =/\w/;
                    if(child.nodeValue.charAt(index - 1).match(dontMatch) || child.nodeValue.charAt(index+keyword.length).match(dontMatch))
                        break;
                    // alert(child.nodeValue.charAt(index+keyword.length + 1));
                    callback.call(window, child, index)
                }
            }
        }
    }

    // Linkup function, call with various type cases (below)
    function linkup(node, index) {
        node.splitText(index+keyword.length);
        var a= document.createElement('a');
        a.href= linkUrl;
        a.appendChild(node.splitText(index));
        node.parentNode.insertBefore(a, node.nextSibling);
        limit.push(keyword.toLowerCase()); // Add the keyword to memory
        urlMemory.push(linkUrl); // Add the url to memory
    }

    // lower case (already applied)
    findPlainTextExceptInLinks(lbp.vrs.holder, keyword, linkup);
Thanks in advance for your help. I’m nearly ready to launch the script, and will gladly comment in kudos to you for your assistance.
It’s not anything to do with the linking functionality; it happens to copied links that are already on the page too, and the credit content, even if the processSel() call is commented out.

It seems to be a weird bug in Chrome’s rich text copy function. The content in the holder is fine; if you cloneContents the selected range and alert its innerHTML at the end, the whitespaces are clearly there. But whitespaces just before, just after, and at the inner edges of any inline element (not just links!) don’t show up in rich text.

Even if you add new text nodes to the DOM containing spaces next to a link, Chrome swallows them. I was able to make it look right by inserting non-breaking spaces, but that’s pretty ugly, should be unnecessary, and doesn’t fix up other inline elements. Bad Chrome!
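For reference, the non-breaking-space workaround could look something like the sketch below. This is only a guess at one way to do it, not necessarily the exact code that was tried: padLinksWithNbsp is a hypothetical helper name, and it assumes the container is a normal DOM element such as lbp.vrs.holder.

```javascript
// Hypothetical helper: surround every link inside `container` with
// non-breaking spaces so a rich text copy keeps a visible gap.
function padLinksWithNbsp(container) {
    var doc = container.ownerDocument;
    var links = container.getElementsByTagName('a');
    for (var i = links.length; i-- > 0;) {
        var a = links[i];
        // U+00A0 before the link...
        a.parentNode.insertBefore(doc.createTextNode('\u00A0'), a);
        // ...and after it (a null nextSibling appends at the end of the parent).
        a.parentNode.insertBefore(doc.createTextNode('\u00A0'), a.nextSibling);
    }
    return links.length; // number of links that were padded
}
```

In the page this would be called as padLinksWithNbsp(lbp.vrs.holder) once linking has finished. It only covers links, so other inline elements would still lose their surrounding spaces.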
It’s unwise to rely on innerHTML to get text from an element, as the browser may escape or not-escape characters in it. Most notably &, but there’s no guarantee over what characters the browser’s innerHTML property will output.

As you seem to be using jQuery already, grab the content with text() instead.

The global regexp check will fail every second time, because global regexps remember their previous state (lastIndex): when used with methods like test, you’re supposed to keep calling repeatedly until they return no match.

You don’t seem to need g (multiple matches) here… but then you don’t seem to need regexp here either as a simple String indexOf would be more reliable. (In a regexp, each . in the domain would match any character in the link.)

Better still, use the URL decomposition properties on Location to do a direct comparison of hostnames, rather than crude string-matching over the whole URL.
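To illustrate the last two points, here is a minimal sketch. The first part shows a global regexp alternating between match and no match on the same string; the second compares hostnames directly. It uses the standard URL class as a stand-in for the Location and link properties so that it also runs outside a browser, and isSameHost is a hypothetical name, not something from lb-core.js.

```javascript
// Part 1: a regexp with the g flag remembers lastIndex between test() calls.
var globalRe = /seox\.org/g;
var url = 'http://seox.org/test.html';
// Same string, same regexp, three calls in a row.
var results = [globalRe.test(url), globalRe.test(url), globalRe.test(url)];
// The second call resumes searching after the first hit, finds nothing,
// and resets lastIndex, so the third call matches again.

// Part 2: compare hostnames instead of string-matching the whole URL.
// isSameHost is a hypothetical helper name used only for this sketch.
function isSameHost(href, pageUrl) {
    // Relative hrefs are resolved against the page URL, as a browser would.
    return new URL(href, pageUrl).hostname === new URL(pageUrl).hostname;
}
```

In a browser the same comparison can be written against the DOM directly, as link.hostname === location.hostname, since anchor elements expose the same decomposition properties as Location.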
If you want to match words on word boundaries, and case insensitively, I think you’d be better off using a regex rather than plain substring matching. That’d also save doing four calls to findText for each keyword as it is at the moment. You can grab the inner bit (in if (child.nodeType==3) { ...) of the function in this answer and use that instead of the current string matching.

The annoying thing about making regexps from string is adding a load of backslashes to the punctuation, so you’ll want a function for that.
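An escaping function along those lines might look like this. It is a sketch rather than a canonical implementation: escapeRegExp and keywordRegExp are hypothetical names, and the character class simply lists the usual regexp metacharacters.

```javascript
// Hypothetical helper: backslash-escape regexp metacharacters so the
// string is matched literally when passed to new RegExp().
function escapeRegExp(s) {
    return s.replace(/[.*+?^${}()|[\]\\\/]/g, '\\$&');
}

// Build a case-insensitive, whole-word matcher for a single keyword.
function keywordRegExp(keyword) {
    return new RegExp('\\b' + escapeRegExp(keyword) + '\\b', 'i');
}
```

Note that \b only behaves as a word boundary here when the keyword itself starts and ends with a word character; a keyword ending in punctuation would need a different boundary check.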
You could even do all the keyword replacements in one go for efficiency, and then for each match in linkup, check which match group has non-zero length and link with the hrefs[] entry of the same number.
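A rough sketch of the one-pass idea, working on a plain string rather than on DOM text nodes. All function names here are hypothetical, and it assumes two parallel arrays where keywords[i] should link to hrefs[i], so capture group n corresponds to hrefs[n - 1].

```javascript
function escapeRegExp(s) {
    return s.replace(/[.*+?^${}()|[\]\\\/]/g, '\\$&');
}

// One capture group per keyword, joined into a single alternation.
function buildCombinedRegExp(keywords) {
    var groups = keywords.map(function (k) {
        return '(' + escapeRegExp(k) + ')';
    });
    return new RegExp('\\b(?:' + groups.join('|') + ')\\b', 'gi');
}

// Find which group matched and return the href at the same position.
function hrefForMatch(match, hrefs) {
    for (var i = 1; i < match.length; i++) {
        if (match[i]) return hrefs[i - 1]; // group i belongs to keywords[i - 1]
    }
    return null;
}

// Collect every keyword hit in `text` along with the URL it should link to.
function findKeywordLinks(text, keywords, hrefs) {
    var re = buildCombinedRegExp(keywords);
    var found = [], match;
    while ((match = re.exec(text)) !== null) {
        if (match[0] === '') { re.lastIndex++; continue; } // guard against empty matches
        found.push({ index: match.index, text: match[0], href: hrefForMatch(match, hrefs) });
    }
    return found;
}
```

In the real script, the loop body would call linkup with the text node and match.index instead of pushing to an array. If one keyword is a prefix of another, listing the longer one first keeps the alternation from stopping at the shorter match.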