Is it possible to produce XML output from a web page using Web::Scraper in Perl? As an example, my HTML looks like the following (I took part of the HTML from the URL):
> <table class="reference">
> <tr>
> <th width="23%" align="left">Property</th>
> <th width="71%" align="left">Description</th>
> <th style="text-align:center;">DOM</th>
> </tr>
> <tr>
> <td><a href="prop_node_attributes.asp">attributes</a></td>
> <td>Returns a collection of a node's attributes</td>
> <td style="text-align:center;">1</td>
> </tr>
>
> <tr>
> <td><a href="prop_node_baseuri.asp">baseURI</a></td>
> <td>Returns the absolute base URI of a node</td>
> <td style="text-align:center;">3</td>
> </tr>
> <tr>
> <td><a href="prop_node_childnodes.asp">childNodes</a></td>
> <td>Returns a NodeList of child nodes for a node</td>
> <td style="text-align:center;">1</td>
> </tr>
> <tr>
> <td><a href="prop_node_firstchild.asp">firstChild</a></td>
> <td>Returns the first child of a node</td>
> <td style="text-align:center;">1</td>
> </tr>
> <tr>
> <td><a href="prop_node_lastchild.asp">lastChild</a></td>
> <td>Returns the last child of a node</td>
> <td style="text-align:center;">1</td>
> </tr>
> <tr>
> <td><a href="prop_node_localname.asp">localName</a></td>
> <td>Returns the local part of the name of a node</td>
> <td style="text-align:center;">2</td>
> </tr>
> <tr>
> <td><a href="prop_node_namespaceuri.asp">namespaceURI</a></td>
> <td>Returns the namespace URI of a node</td>
> <td style="text-align:center;">2</td>
> </tr>
> <tr>
> <td><a href="prop_node_nextsibling.asp">nextSibling</a></td>
> <td>Returns the next node at the same node tree level</td>
> <td style="text-align:center;">1</td>
> </tr>
> <tr>
> <td><a href="prop_node_nodename.asp">nodeName</a></td>
> <td>Returns the name of a node, depending on its type</td>
> <td style="text-align:center;">1</td>
> </tr>
> <tr>
> <td><a href="prop_node_nodetype.asp">nodeType</a></td>
> <td>Returns the type of a node</td>
> <td style="text-align:center;">1</td>
> </tr>
> <tr>
> <td><a href="prop_node_nodevalue.asp">nodeValue</a></td>
> <td>Sets or returns the value of a node, depending on its
> type</td>
> <td style="text-align:center;">1</td>
> </tr>
> <tr>
> <td><a href="prop_node_ownerdocument.asp">ownerDocument</a></td>
> <td>Returns the root element (document object) for a node</td>
> <td style="text-align:center;">2</td>
> </tr>
> <tr>
> <td><a href="prop_node_parentnode.asp">parentNode</a></td>
> <td>Returns the parent node of a node</td>
> <td style="text-align:center;">1</td>
> </tr>
> <tr>
> <td><a href="prop_node_prefix.asp">prefix</a></td>
> <td>Sets or returns the namespace prefix of a node</td>
> <td style="text-align:center;">2</td>
> </tr>
> <tr>
> <td><a href="prop_node_previoussibling.asp">previousSibling</a></td>
> <td>Returns the previous node at the same node tree level</td>
> <td style="text-align:center;">1</td>
> </tr>
> <tr>
> <td><a href="prop_node_textcontent.asp">textContent</a></td>
> <td>Sets or returns the textual content of a node and its
> descendants</td>
> <td style="text-align:center;">3</td>
> </tr>
> </table>
So my Perl code looks like this:
#!/usr/bin/perl
use warnings;
use strict;
use URI;
use Web::Scraper;
# website to scrape
my $urlToScrape = "http://www.w3schools.com/jsref/dom_obj_node.asp";
my $rennersdata = scraper {
process "table.reference > tr > td > a", 'renners[]' => 'TEXT';
process "table.reference > tr > td:nth-of-type(2)", 'landrenner[]' => 'TEXT';
process "table.reference > tr > td:nth-of-type(3)", 'dom[]' => 'TEXT';
};
my $res = $teamsdata->scrape(URI->new($urlToScrape));
for my $i (0 .. $#{$res->{renners}}) {
print "<PropertyList>\n";
print "<Property>\n";
print "<Name> ";
print $res->{renners}[$i];
print "\n";
print "</Name>";
print "\n";
print "</Property>\n";
print "</PropertyList>\n";
}
for my $j (0 .. $#{$res->{landrenner}}) {
print "<ReturnValue>\n";
print $res->{landrenner}[$j];
print "\n";
print "</ReturnValue>\n";
}
for my $k (0 .. $#{$res->{dom}}) {
print "<domversion>\n";
print $res->{dom}[$k];
print "\n";
print "</domversion>\n";
}
When I run the above code, I get output like the following:
<PropertyList>
<Property>
<Name>attributes</Name>
</Property>
</PropertyList>
<PropertyList>
<Property>
<Name>baseURI</Name>
</Property>
</PropertyList>
...
<ReturnValue>
Returns a collection of a node's attributes
</ReturnValue>
....
<domversion>
1
</domversion>
....
Is it possible to get output like the following instead:
<PropertyList>
<Property>
<Name>attributes</Name>
<ReturnValue>Returns a collection of a node's attributes</ReturnValue>
<DOMVersion>1</DOMVersion>
</Property>
</PropertyList>
How can I combine the three for loops above to get that output?
Many thanks.
You just need to move your output into the first for loop. Since there is an equal number of items in each of the three keys in $res, you can use $i to access all of the individual items. Each iteration over $i then gives you the three values that belong together.

I changed the print statements to use a HERE doc because it is more readable. I also changed the line my $res = $teamsdata->scrape(URI->new($urlToScrape)); to my $res = $rennersdata->scrape(URI->new($urlToScrape)); because $teamsdata was not declared.
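
A minimal sketch of that combined loop could look like the following. It assumes $res has the shape the scraper in the question returns (three parallel array refs under renners, landrenner and dom). The helper name render_properties and the hard-coded sample data are illustrative only and stand in for the real scrape call. As a further assumption, the PropertyList wrapper is emitted once around all Property elements so the result has a single root element, and the values are not XML-escaped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the XML by walking the three parallel arrays with one shared index.
# render_properties is a hypothetical helper name, not part of the original script.
sub render_properties {
    my ($res) = @_;
    my $xml = "<PropertyList>\n";
    for my $i (0 .. $#{ $res->{renners} }) {
        # The HERE doc interpolates the i-th entry of each array.
        $xml .= <<"XML";
<Property>
<Name>$res->{renners}[$i]</Name>
<ReturnValue>$res->{landrenner}[$i]</ReturnValue>
<DOMVersion>$res->{dom}[$i]</DOMVersion>
</Property>
XML
    }
    return $xml . "</PropertyList>\n";
}

# Sample data standing in for $rennersdata->scrape(URI->new($urlToScrape)).
my $res = {
    renners    => [ 'attributes', 'baseURI' ],
    landrenner => [
        "Returns a collection of a node's attributes",
        'Returns the absolute base URI of a node',
    ],
    dom        => [ 1, 3 ],
};

print render_properties($res);
```

In the real script, the sample hash would be replaced by the result of the scrape call. If a PropertyList per row is preferred, as in the question's original loop, the wrapper lines can be moved inside the HERE doc.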