I’m a Scrapy and XPath beginner, and I’m trying to parse a website with the following structure:
<dl class="ismSummary ismHomeSummary">
<dt>cat1</dt>
<dd>value1</dd>
<dd>value2</dd>
<dt>cat2</dt>
<dd>value1</dd>
<dd>value2</dd>
</dl>
With XPath, I only want to get value1 and value2 (the dd elements) of cat1.
This is what I have right now:
//dt[text()="cat1"]/following-sibling::dd
The problem is that it doesn’t stop at cat2 and continues selecting value1 and value2 from cat2. 🙁
All nodes here are children of dl, so naturally they are all siblings of the first dt. When you use following-sibling, you get them all.

XPath was made with XML in mind, and in XML you would probably have the dd elements as children of dt, but unfortunately that’s not the case here.

The easiest way would be to include all siblings of dt (not just the dd elements) and iterate through the result set until a dt comes up. Using XPath functions to do the same could be possible, but is certainly more complicated.
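
The iterate-until-the-next-dt idea could look roughly like the sketch below. It uses Python's standard xml.etree.ElementTree rather than Scrapy so it runs on its own, and it assumes the snippet is well-formed enough to parse as XML. The helper name dd_values_for is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Sample markup copied from the question.
HTML = """<dl class="ismSummary ismHomeSummary">
<dt>cat1</dt>
<dd>value1</dd>
<dd>value2</dd>
<dt>cat2</dt>
<dd>value1</dd>
<dd>value2</dd>
</dl>"""


def dd_values_for(dl, category):
    """Return the text of the dd elements between the dt named
    `category` and the next dt (hypothetical helper)."""
    values = []
    collecting = False
    for child in dl:  # dt and dd are all direct children of dl
        if child.tag == "dt":
            if collecting:
                break  # reached the next category, stop here
            collecting = (child.text or "").strip() == category
        elif child.tag == "dd" and collecting:
            values.append((child.text or "").strip())
    return values


dl = ET.fromstring(HTML)
print(dd_values_for(dl, "cat1"))  # ['value1', 'value2']
```

In Scrapy, the same loop should work over the selectors returned by following-sibling::* on the matching dt, stopping at the first dt. If a single expression is preferred, one XPath 1.0 option is //dt[text()="cat1"]/following-sibling::dd[preceding-sibling::dt[1][text()="cat1"]], which keeps only the dd elements whose nearest preceding dt is cat1.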