To select the text here:
Alpha Bravo Charlie Delta Echo Foxtrot
from this HTML structure:
<div id="entry-2" class="item-asset asset hentry">
  <div class="asset-header">
    <h2 class="asset-name entry-title">
      <a rel="bookmark" href="http://blahblah.com/politics-democrat">Pelosi Q&A</a>
    </h2>
  </div>
  <div class="asset-content entry-content">
    <div class="asset-body">
      <p>Alpha Bravo Charlie Delta Echo Foxtrot</p>
    </div>
  </div>
</div>
I apply the following XPath expression to select the text inside asset-body:
//div[contains(
div/h2[
contains(concat(' ',@class,' '),' asset-name ')
and
contains(concat(' ',@class,' '),' entry-title ')
]/a[@rel='bookmark']/@href
,'democrat')
]/div/div[
contains(concat(' ',@class,' '),' asset-body ')
]//text()
How would I sanitize the following words from the text:
Alpha
Charlie
Echo
So that I end up with only the following text in this example:
Bravo Delta Foxtrot
XPath 1.0 has no string-replacement function (translate() only maps individual characters), so this can’t be done in XPath 1.0 alone; you’ll need to get the text in the host language and do the replacement there.
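As a minimal sketch of that host-language step, assuming Python is the host language and that the text nodes selected by the XPath expression have already been joined into a single string (the helper name remove_words is hypothetical, not part of any library):

```python
import re

def remove_words(text, words):
    """Remove whole-word occurrences of `words` from `text`."""
    # One alternation pattern; \b stops "Echo" from matching inside "Echoes".
    pattern = r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b"
    stripped = re.sub(pattern, "", text)
    # Collapse the gaps left where the removed words used to be.
    return " ".join(stripped.split())

# Assumed input: the string value of the selected asset-body text.
text = "Alpha Bravo Charlie Delta Echo Foxtrot"
# Words not in the removal list (Bravo, Delta, Foxtrot) are left in place.
print(remove_words(text, ["Alpha", "Charlie", "Echo"]))
```

The matching here is case-sensitive; passing flags=re.IGNORECASE to re.sub would relax that if needed.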
In XPath 2.0 one can use the replace() function: