How do I use the union operator ‘|’ to combine two node sets? In other words, how do I get the data of two sibling elements with different tag names? In this case, I want the text of both the edition and edition2 elements. I tried ‘|’ and it didn’t work.
XML:
<?xml version="1.0" encoding="utf-8"?>
<wikimedia>
<projects>
<project name="Wikipedia" launch="2001-01-05">
<editions>
<edition language="English">en.wikipedia.org</edition>
<edition language="German">de.wikipedia.org</edition>
<edition language="French">fr.wikipedia.org</edition>
<edition language="Polish">pl.wikipedia.org</edition>
<edition language="Spanish">es.wikipedia.org</edition>
<edition2 language="Spanglish">egs.wikipedia.org</edition2>
<img src="hello.gif">hello</img>
</editions>
</project>
<project name="Wiktionary" launch="2002-12-12">
<editions>
<edition language="English">en.wiktionary.org</edition>
<edition language="French">fr.wiktionary.org</edition>
<edition language="Vietnamese">vi.wiktionary.org</edition>
<edition language="Turkish">tr.wiktionary.org</edition>
<edition language="Spanish">es.wiktionary.org</edition>
<edition2 language="Spanglish">egs.wiktionary.org</edition2>
<img src="hello.gif">hello</img>
</editions>
</project>
</projects>
</wikimedia>
Python:
>>> wikixml.xpath('//edition/text() | edition2/text()')
['en.wikipedia.org', 'de.wikipedia.org', 'fr.wikipedia.org', 'pl.wikipedia.org', 'es.wikipedia.org', 'en.wiktionary.org', 'fr.wiktionary.org', 'vi.wiktionary.org', 'tr.wiktionary.org', 'es.wiktionary.org']
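To see why this drops the edition2 values, here is a minimal sketch using lxml (assumed, since the question's wikixml.xpath call looks like lxml) on a trimmed copy of the XML. The right-hand operand edition2/text() has no leading //, so it is a relative path evaluated from the context node.

```python
from lxml import etree

# Trimmed copy of the question's XML: fewer <edition> elements, same structure.
XML = b"""<wikimedia>
  <projects>
    <project name="Wikipedia" launch="2001-01-05">
      <editions>
        <edition language="English">en.wikipedia.org</edition>
        <edition language="German">de.wikipedia.org</edition>
        <edition2 language="Spanglish">egs.wikipedia.org</edition2>
        <img src="hello.gif">hello</img>
      </editions>
    </project>
    <project name="Wiktionary" launch="2002-12-12">
      <editions>
        <edition language="English">en.wiktionary.org</edition>
        <edition2 language="Spanglish">egs.wiktionary.org</edition2>
        <img src="hello.gif">hello</img>
      </editions>
    </project>
  </projects>
</wikimedia>"""

root = etree.fromstring(XML)  # the context node is the <wikimedia> element

# Relative right-hand operand: looks for <edition2> children of <wikimedia>
# itself, finds none, so only the <edition> text nodes survive the union.
broken = root.xpath('//edition/text() | edition2/text()')

# Rooting both operands with // searches the whole document on both sides.
fixed = root.xpath('//edition/text() | //edition2/text()')

print(list(broken))
print(list(fixed))
```

Here broken holds only the three edition values, while fixed also contains egs.wikipedia.org and egs.wiktionary.org, in document order.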
EDIT
I got it working thanks to the answer, but I also want to select the value of img/@src.
I managed to do this with the union operator |:
>>> wikixml.xpath('//edition/text() | //edition2/text() | //img/@src')
['en.wikipedia.org', 'de.wikipedia.org', 'fr.wikipedia.org', 'pl.wikipedia.org', 'es.wikipedia.org', 'egs.wikipedia.org', 'hello.gif', 'en.wiktionary.org', 'fr.wiktionary.org', 'vi.wiktionary.org', 'tr.wiktionary.org', 'es.wiktionary.org', 'egs.wiktionary.org', 'hello.gif']
How can I do this with a single predicate and the self:: notation, as is done here for the two elements,
/wikimedia/projects/project/editions/*[self::edition or self::edition2]/text()
now that we want @src as well as text()?
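Use
//edition/text() | //edition2/text()
or, more efficiently, spell out the full path so that each operand does not have to search the whole document
/wikimedia/projects/project/editions/edition/text() | /wikimedia/projects/project/editions/edition2/text()
or, even better, use a single step filtered with self::
/wikimedia/projects/project/editions/*[self::edition or self::edition2]/text()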
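The three approaches (a union of // searches, a union of explicit paths, and one step with a self:: predicate) can be compared side by side. This is a sketch assuming lxml and a trimmed copy of the question's XML; EDITIONS is just a local helper string, not part of any API. All three should return the same strings in document order; they differ in how much of the tree the engine has to visit, and whether that matters in practice depends on the document size.

```python
from lxml import etree

# Trimmed copy of the question's XML: fewer <edition> elements, same structure.
XML = b"""<wikimedia>
  <projects>
    <project name="Wikipedia" launch="2001-01-05">
      <editions>
        <edition language="English">en.wikipedia.org</edition>
        <edition language="German">de.wikipedia.org</edition>
        <edition2 language="Spanglish">egs.wikipedia.org</edition2>
        <img src="hello.gif">hello</img>
      </editions>
    </project>
    <project name="Wiktionary" launch="2002-12-12">
      <editions>
        <edition language="English">en.wiktionary.org</edition>
        <edition2 language="Spanglish">egs.wiktionary.org</edition2>
        <img src="hello.gif">hello</img>
      </editions>
    </project>
  </projects>
</wikimedia>"""

root = etree.fromstring(XML)
EDITIONS = '/wikimedia/projects/project/editions/'  # local helper prefix

expressions = {
    # two descendant searches combined with the union operator
    'descendant union': '//edition/text() | //edition2/text()',
    # the same union with fully spelled-out paths (no whole-document // search)
    'explicit union': EDITIONS + 'edition/text() | ' + EDITIONS + 'edition2/text()',
    # one location step whose predicate accepts either element name
    'self:: predicate': EDITIONS + '*[self::edition or self::edition2]/text()',
}

results = {name: root.xpath(expr) for name, expr in expressions.items()}
for name, values in results.items():
    print(f'{name}: {list(values)}')
```

The self:: version walks the editions children once and filters them, which is why it extends naturally to more element names (for example `or self::edition3`).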
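As for the question update: honestly, I don’t think there’s a way to select from both the child and attribute axes at the same time using the above notation. A location step walks exactly one axis, and attribute nodes are reachable only through the attribute axis, so a self:: test applied to child nodes can never match @src.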
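A small sketch of the axis point, assuming lxml and a one-element XML fragment made up for illustration: the child axis of img yields its text node but never its attribute, while the attribute axis (@ is shorthand for attribute::) yields the attribute value.

```python
from lxml import etree

# Minimal fragment, invented for illustration (mirrors the question's <img>).
root = etree.fromstring(b'<editions><img src="hello.gif">hello</img></editions>')

# Child axis (the default): every child node of img, which is only the text "hello".
children = root.xpath('img/node()')

# Attribute axis: the only place where src can be found.
attributes = root.xpath('img/@*')

print(list(children), list(attributes))
```

Since no single step can produce nodes from both axes, combining text() and @src in XPath 1.0 needs two operands joined by |.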
You can either do it with a single XPath expression, giving up the self:: notation at least for the attribute (a union of the self:: path above and a separate path ending in img/@src), or select edition, edition2 and all elements that have a src attribute, and then process the result to fetch the value of src. That’s about as much as you can do in XPath 1.0.
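Both XPath 1.0 options can be sketched with lxml, which (like the libxml2 library underneath it) implements XPath 1.0 only. This assumes a trimmed copy of the question's XML; value_of is a hypothetical helper written for this example, not part of lxml.

```python
from lxml import etree

# Trimmed copy of the question's XML: fewer <edition> elements, same structure.
XML = b"""<wikimedia>
  <projects>
    <project name="Wikipedia" launch="2001-01-05">
      <editions>
        <edition language="English">en.wikipedia.org</edition>
        <edition language="German">de.wikipedia.org</edition>
        <edition2 language="Spanglish">egs.wikipedia.org</edition2>
        <img src="hello.gif">hello</img>
      </editions>
    </project>
    <project name="Wiktionary" launch="2002-12-12">
      <editions>
        <edition language="English">en.wiktionary.org</edition>
        <edition2 language="Spanglish">egs.wiktionary.org</edition2>
        <img src="hello.gif">hello</img>
      </editions>
    </project>
  </projects>
</wikimedia>"""

root = etree.fromstring(XML)
EDITIONS = '/wikimedia/projects/project/editions/'  # local helper prefix

# Option 1: one expression. The elements keep the self:: predicate, but @src
# needs its own union operand because it lives on the attribute axis.
single = root.xpath(
    EDITIONS + '*[self::edition or self::edition2]/text() | ' + EDITIONS + 'img/@src'
)

# Option 2: select the elements themselves (edition, edition2, or anything
# carrying a src attribute) and extract the wanted value in Python.
elements = root.xpath(EDITIONS + '*[self::edition or self::edition2 or @src]')


def value_of(el):
    """Hypothetical helper: the src attribute if present, else the element text."""
    src = el.get('src')
    return src if src is not None else el.text


processed = [value_of(el) for el in elements]

print(list(single))
print(processed)
```

Both lists come out in document order, with each hello.gif right after the egs entry of its project, matching the ordering of the three-way union shown in the question.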
As per Dimitre Novatchev’s suggestion, XPath 2.0 allows you to write it this way: