I need to use .NET regular expressions to scrape some values between <value> tags of a markup file such as this (copy/pasted excerpt):
<Title>Section1</Title>
<attributeArray><name>Name1</name><value>Value1</value></attributeArray>
<attributeArray><name>Name2</name><value>Value2</value></attributeArray>
<attributeArray><name>Name3</name><value>Value3</value></attributeArray>
<attributeArray><name>Name4</name><value>Value4</value></attributeArray>
<Title>Section2</Title>
<attributeArray><name>Name1</name><value>Value1</value></attributeArray>
<attributeArray><name>Name2</name><value>Value2</value></attributeArray>
<attributeArray><name>Name3</name><value>Value3</value></attributeArray>
<attributeArray><name>Name4</name><value>Value4</value></attributeArray>
</node>
The actual text goes on to include 6 sections. The problem I have is that the tag names in each section are identical, and I only need to extract the values from, say, Section2 (so not including 1, 3, 4, 5, 6).
I have struggled with this for a couple of days and tried various conditional expressions, which were new to me, like this:
(?(<node>Section2)(.*?<value>(?<Value>.*?)<\/value>.*?))
The idea is: if it is Section 2, then parse the value keys. But it only extracts the first value; it does not iterate through each <value> of the markup. The markup usually has around 10 values that I need to extract (abbreviated in the example above).
This is not being done in code, so I don’t have the liberty of using an XML parser.
Any suggestions would be greatly appreciated – or if I can clarify further let me know.
An afterthought: if there is a way to include the text of the title with each value match, then I could parse all 6 sections and later filter the results by the section I am after. That would also work.
example:
match1
group1 = Section2
group2 = Value1
match2
group1 = Section2
group2 = Value2
match3
group1 = Section2
group2 = Value3
match4
group1 = Section2
group2 = Value4
Thanks!
Here’s one option:
The first match includes the header, and the following matches must start where the previous one ended.
Working example: http://regexhero.net/tester/?id=321ce843-923d-4556-9b99-dbb72175929a
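For readers who can run code, the "each match must start where the previous one ended" idea can be sketched outside .NET as well. This is only an illustration under assumptions, not the pattern from the link above: it uses Python, whose standard `re` module has no `\G` anchor, so the anchoring is emulated with `Pattern.match(text, pos)`. The sample data and the names `SAMPLE`, `ITEM`, and `values_in_section` are made up for the example.

```python
import re

# Hypothetical sample, shaped like the excerpt in the question.
SAMPLE = (
    "<Title>Section1</Title>\n"
    "<attributeArray><name>Name1</name><value>A1</value></attributeArray>\n"
    "<attributeArray><name>Name2</name><value>A2</value></attributeArray>\n"
    "<Title>Section2</Title>\n"
    "<attributeArray><name>Name1</name><value>B1</value></attributeArray>\n"
    "<attributeArray><name>Name2</name><value>B2</value></attributeArray>\n"
    "<Title>Section3</Title>\n"
    "<attributeArray><name>Name1</name><value>C1</value></attributeArray>\n"
)

# One name/value entry, allowing leading whitespace such as a newline.
ITEM = re.compile(
    r"\s*<attributeArray><name>[^<]*</name>"
    r"<value>(?P<value>[^<]*)</value></attributeArray>"
)


def values_in_section(text, title):
    """Return (title, value) pairs for the entries directly after the title."""
    header = re.search(r"<Title>" + re.escape(title) + r"</Title>", text)
    if header is None:
        return []
    pairs = []
    pos = header.end()
    while True:
        # match() with a start position only succeeds exactly at pos,
        # which plays the role of \G: no gaps between matches are allowed.
        m = ITEM.match(text, pos)
        if m is None:
            break  # next title, other markup, or end of text
        pairs.append((title, m.group("value")))
        pos = m.end()
    return pairs


print(values_in_section(SAMPLE, "Section2"))
```

Like the pattern it imitates, this stops at the first thing that is not a name/value entry, so it shares the limitation described next.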
Note that the above will fail if you have other elements you didn’t mention between the values or the title. You can get around that with a probably less efficient pattern, using the fact that .NET regexes can have variable-length lookbehinds:
Example: http://regexhero.net/tester/?id=743c4de6-1b8a-48a4-a69b-63f3624de594
If you want to, you can change the title to <Title>(?<title>[^<]*)</Title>, capture all values in the file, and filter by the wanted title – it will be added to each match.
Lastly, here’s a similar approach which will work in other flavors: it captures key/value pairs before the title Section3, assuming it is well ordered:
Example: http://regexhero.net/tester/?id=8d8ae0e8-5f10-439f-a5a5-50d0b4e73bd2
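As a rough illustration of that last idea (not necessarily the exact pattern behind the link), here is a minimal Python sketch. It assumes the sections appear in order and that Section3 directly follows the wanted section; the sample data and the name `BEFORE_SECTION3` are invented for the example. A lookahead requires that the next <Title> after each pair is Section3, which needs neither `\G` nor lookbehind.

```python
import re

# Hypothetical sample, shaped like the excerpt in the question.
SAMPLE = (
    "<Title>Section1</Title>\n"
    "<attributeArray><name>Name1</name><value>A1</value></attributeArray>\n"
    "<attributeArray><name>Name2</name><value>A2</value></attributeArray>\n"
    "<Title>Section2</Title>\n"
    "<attributeArray><name>Name1</name><value>B1</value></attributeArray>\n"
    "<attributeArray><name>Name2</name><value>B2</value></attributeArray>\n"
    "<Title>Section3</Title>\n"
    "<attributeArray><name>Name1</name><value>C1</value></attributeArray>\n"
)

# A name/value pair, followed (without crossing any other <Title>)
# by the Section3 title. DOTALL lets "." step over newlines.
BEFORE_SECTION3 = re.compile(
    r"<name>(?P<name>[^<]*)</name><value>(?P<value>[^<]*)</value>"
    r"(?=(?:(?!<Title>).)*<Title>Section3</Title>)",
    re.DOTALL,
)

pairs = [(m.group("name"), m.group("value"))
         for m in BEFORE_SECTION3.finditer(SAMPLE)]
print(pairs)
```

Pairs in Section1 are rejected because the first title after them is Section2, and pairs in Section3 are rejected because no title follows them. The last section of a file has no following title, so it would need a different end condition.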