I am using the following HTML as input to DOMDocument:
<li id="SalesRank">
<b>Amazon Best Sellers Rank:</b>
#20,267 Paid in Kindle Store (
<a href="http://www.amazon.com/gp/bestsellers/digital-text/ref=pd_dp_ts_kstore_1/190-9295683-0277616">See Top 100 Paid in Kindle Store</a>
)
<ul class="zg_hrsr">
<li class="zg_hrsr_item">
<span class="zg_hrsr_rank">#15</span>
<span class="zg_hrsr_ladder">
in
<a href="http://www.amazon.com/gp/bestsellers/digital-text/ref=pd_zg_hrsr_kstore_1_1">Kindle Store</a>
>
<a href="http://rads.stackoverflow.com/amzn/click/154606011">Kindle eBooks</a>
>
<a href="http://rads.stackoverflow.com/amzn/click/157325011">Nonfiction</a>
>
<a href="http://rads.stackoverflow.com/amzn/click/292975011">Lifestyle & Home</a>
>
<a href="http://rads.stackoverflow.com/amzn/click/156699011">Home & Garden</a>
>
<a href="http://rads.stackoverflow.com/amzn/click/156828011">Gardening & Horticulture</a>
>
<b>
<a href="http://rads.stackoverflow.com/amzn/click/156847011">Greenhouses</a>
</b>
</span>
</li>
<li class="zg_hrsr_item">
<span class="zg_hrsr_rank">#26</span>
<span class="zg_hrsr_ladder">
in
<a href="http://www.amazon.com/gp/bestsellers/digital-text/ref=pd_zg_hrsr_kstore_2_1">Kindle Store</a>
>
<a href="http://rads.stackoverflow.com/amzn/click/154606011">Kindle eBooks</a>
>
<a href="http://rads.stackoverflow.com/amzn/click/157325011">Nonfiction</a>
>
<a href="http://rads.stackoverflow.com/amzn/click/292975011">Lifestyle & Home</a>
>
<a href="http://rads.stackoverflow.com/amzn/click/156699011">Home & Garden</a>
>
<a href="http://rads.stackoverflow.com/amzn/click/156828011">Gardening & Horticulture</a>
>
<b>
<a href="http://rads.stackoverflow.com/amzn/click/156849011">House Plants</a>
</b>
</span>
</li>
</ul></li>
I am using the following XPath query and reading textContent from the result:
$xpath_cat->query('//li[@id="SalesRank"]');
The output includes the text of everything inside the li with id="SalesRank", including the nested li elements, while I only want "#20,267 Paid in Kindle Store".
So the required output is:
#20,267 Paid in Kindle Store
How can I modify my XPath to get the required output?
Update
I tried the solution provided below and used the XPath
$xpath_cat->query('//li[@id="SalesRank"]/text()');
but now the output is
( [0] => [1] => #20,267 Paid in Kindle Store ( [2] => )
How can I fix this?
Does
//li[@id='SalesRank']/text()
work for you?
Update 1
If the text you want will always be in that location, then an expression that wraps that text node in normalize-space and substring-before will return
#20,267 Paid in Kindle Store
normalize-space strips out the extraneous whitespace, and substring-before selects all text before the first occurrence of " (".
This problem will be much easier if you can get the target text in its own node, for example by wrapping it in a <span>. A <span/> has no effect on rendering and allows you to specifically select the text you want.
If the second solution doesn't work in all cases, and you cannot get the target text in its own node, you will have to rely on some post-processing in the host language (PHP, I presume).
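As a rough sketch of both routes in PHP (not necessarily the exact expression originally posted here): the markup below is a trimmed stand-in for the question's HTML with a placeholder href, and the predicate text()[normalize-space()][1] is my own assumption for picking the first non-blank text node, so it does not matter whether the parser keeps the leading whitespace node. $rank and $rankFromPhp are illustrative names.

```php
<?php
// Trimmed stand-in for the markup in the question; the href is a placeholder.
$html = <<<'HTML'
<li id="SalesRank">
<b>Amazon Best Sellers Rank:</b>
#20,267 Paid in Kindle Store (
<a href="#">See Top 100 Paid in Kindle Store</a>
)
<ul class="zg_hrsr">
<li class="zg_hrsr_item"><span class="zg_hrsr_rank">#15</span></li>
</ul></li>
HTML;

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath_cat = new DOMXPath($doc);

// Route 1: let XPath do the trimming. evaluate() (unlike query()) can return
// a string. The inner path selects the first non-blank text node directly
// under the li; normalize-space collapses the surrounding whitespace, and
// substring-before cuts everything from " (" onwards.
$rank = $xpath_cat->evaluate(
    "substring-before(normalize-space(" .
    "//li[@id='SalesRank']/text()[normalize-space()][1]), ' (')"
);
echo $rank, "\n";

// Route 2: post-process in PHP. Walk the direct text nodes, skip the blank
// ones, and strip the trailing " (" from the first one that has content.
$rankFromPhp = '';
foreach ($xpath_cat->query("//li[@id='SalesRank']/text()") as $node) {
    $text = trim($node->nodeValue);
    if ($text !== '') {
        $rankFromPhp = rtrim($text, ' (');
        break;
    }
}
echo $rankFromPhp, "\n";
```

Both routes should print "#20,267 Paid in Kindle Store" for this input. Route 1 returns an empty string if the text ever stops containing " (", so route 2 is the safer choice when the page layout may vary.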
Hope this helps,