This should be easy for a JavaScript expert.
For those who don’t know what Schema.org is (http://schema.org), it’s a vocabulary that lets search engines read structured content on a web page. It works by marking up the relevant data with specific attributes (itemscope, itemtype and itemprop).
For those who do know what it is, here is a Chrome extension (Schema Explorer) that makes it easy to inspect what your data looks like on your page. See the example.
NOW: There is a tiny issue with the extension whereby it does not skip/ignore empty nested elements. Here are two examples. The first works perfectly, but the second bombs because of the empty <div> tag (a wrapper that carries no microdata attributes):
First example works:
<div itemscope="" itemtype="http://schema.org/Movie">
  <h1 itemprop="name">Avatar</h1>
  <div itemprop="director" itemscope="" itemtype="http://schema.org/Person">
    Director: <span itemprop="name">James Cameron</span> (born <span itemprop="birthDate">August 16, 1954</span>)
  </div>
  <span itemprop="genre">Science fiction</span>
  <a href="http://pierreloicdoulcet.fr/movies/avatar-theatrical-trailer.html" itemprop="trailer">Trailer</a>
</div>
Second example gives issues:
<div itemscope="" itemtype="http://schema.org/Movie">
  <div>
    <h1 itemprop="name">Avatar</h1>
    <div itemprop="director" itemscope="" itemtype="http://schema.org/Person">
      Director: <span itemprop="name">James Cameron</span> (born <span itemprop="birthDate">August 16, 1954</span>)
    </div>
    <span itemprop="genre">Science fiction</span>
    <a href="http://pierreloicdoulcet.fr/movies/avatar-theatrical-trailer.html" itemprop="trailer">Trailer</a>
  </div>
</div>
I had a look at the extension and it’s actually very well put together, with one JavaScript file doing most of the work. Here is the code that does the looping; however, it needs to be able to skip empty nested elements and perhaps be a little more robust in general:
var __explore = function(node, parentData)
{
    if (parentData === null || parentData === undefined)
    {
        parentData = __dataTree;
    }
    if (node.getAttribute)
    {
        var isItemScope = node.getAttribute('itemscope');
        var hasItemProp = node.getAttribute('itemprop');
        var itemtype = node.getAttribute('itemtype');
        var childs = node.childNodes;
        var i = 0;
        var tmp = new Array();
        while (i < childs.length)
        {
            if (isItemScope !== null)
                __explore(childs[i], tmp);
            else
                __explore(childs[i], null);
            ++i;
        }
        if (isItemScope !== null)
        {
            parentData.push({name : 'scope', value : hasItemProp, type : itemtype, childs : [tmp], node : node});
        }
        else if (hasItemProp && parentData)
        {
            parentData.push({name : hasItemProp, value : node.innerText});
        }
    }
}
Here is the complete version of contentscript.js: https://gist.github.com/3413475
Hopefully someone can help me with this. For the record, I’ve contacted the author, but he’s been preoccupied with more urgent matters.
I made it work as expected (http://jsfiddle.net/vyrvp/1/), but I must confess that the fix is a bit hackish. The code may need some more refactoring to work in all cases and to be readable.
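For readers who want the idea without opening the fiddle, here is a minimal sketch of one possible fix. It is not the fiddle's code, and it may not cover every case. In the original loop, any element without itemscope passes null to its children, which resets parentData to the root tree, so properties nested inside a plain wrapper <div> lose their enclosing scope. The sketch passes the current array through instead. The names explore and target are illustrative, and unlike the original the function expects the caller to supply the array explicitly rather than falling back to __dataTree.

```javascript
// Sketch only: `explore` and `target` are illustrative names, not the
// extension's own. The caller must pass the array that collects results.
function explore(node, parentData) {
  // Text and comment nodes have no getAttribute; nothing to collect.
  if (typeof node.getAttribute !== 'function') {
    return;
  }

  var itemProp = node.getAttribute('itemprop');
  var itemType = node.getAttribute('itemtype');
  var isItemScope = node.getAttribute('itemscope') !== null;

  // A scope collects its descendants in a fresh array. Any other element,
  // including a wrapper <div> with no microdata attributes, hands its
  // children the array it was given, so they stay in the enclosing scope.
  var target = isItemScope ? [] : parentData;

  var children = node.childNodes || [];
  for (var i = 0; i < children.length; i++) {
    explore(children[i], target);
  }

  if (isItemScope) {
    parentData.push({
      name: 'scope',
      value: itemProp,
      type: itemType,
      childs: [target], // same shape as the original code
      node: node
    });
  } else if (itemProp) {
    parentData.push({ name: itemProp, value: node.innerText });
  }
}
```

In a page this would be called along the lines of `var tree = []; explore(document.body, tree);`. The only behavioural change from the original loop is that non-scope elements forward parentData rather than null, which should let wrappers nest to any depth.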