Given an XML fragment that I want to parse with XPath, I first need to extract the namespaces to add to the namespace manager. I’ve been trying to figure out the regex pattern needed to extract the XML attributes that define a namespace. For example, I want to get every namespace declaration like the one below, which I can then split into the namespace name and the URL with some basic string manipulation.
xmlns:my="http://schemas.microsoft.com/office/infopath/2003/myXSD/2010-02-12T12:41:45"
The attribute name will always begin with xmlns: and I need the regex to read to the end of the value, including the closing ".
Alternatively, a more generic pattern that extracts ALL attributes of the form name="value" would do the job, and I can then use string comparisons to check whether each attribute is a namespace declaration.
<my:StationLookupValues xmlns:my="http://schemas.microsoft.com/office/infopath/2003/myXSD/2010-02-12T12:41:45"><my:StationLookupValue>Hull Inspectors</my:StationLookupValue></my:StationLookupValues><my:StationLookupValues xmlns:my="http://schemas.microsoft.com/office/infopath/2003/myXSD/2010-02-12T12:41:45"><my:StationLookupValue>Barnsley Inspectors</my:StationLookupValue></my:StationLookupValues><my:StationValue xmlns:my="http://schemas.microsoft.com/office/infopath/2003/myXSD/2010-02-12T12:41:45">Hull Inspectors</my:StationValue>
I’ve not been able to find an example of something like this, nor work it out for myself. Any assistance with this would be very much appreciated.
[EDIT]
I understand that XML parsers should be used and this is what I am going to do. But all I have is an XML fragment to pass so I must first build a namespace manager and in order to do that I need to extract the namespaces used.
Try this pattern: 'xmlns:(.*?)=(".*?")'
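It means: match the literal text xmlns:, then lazily capture everything up to the = sign (the namespace name), then capture the quoted value up to the next closing quote.
The parentheses mean the first group contains the namespace name and the second group contains the value. Adjust according to whether you want it all in one group, and whether or not you want the quotes in the group.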
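As a rough illustration of how the two groups could be used, here is a minimal Python sketch run against a shortened copy of the fragment from the question. The language choice and the names NAMESPACE_PATTERN and extract_namespaces are assumptions made for this example, and the quotes have been moved outside the second group so the captured value comes back bare.

```python
import re

# Shortened copy of the fragment from the question.
fragment = (
    '<my:StationLookupValues xmlns:my="http://schemas.microsoft.com/office/infopath/2003/myXSD/2010-02-12T12:41:45">'
    '<my:StationLookupValue>Hull Inspectors</my:StationLookupValue>'
    '</my:StationLookupValues>'
    '<my:StationValue xmlns:my="http://schemas.microsoft.com/office/infopath/2003/myXSD/2010-02-12T12:41:45">'
    'Hull Inspectors</my:StationValue>'
)

# Group 1 is the namespace name (prefix); group 2 is the value without quotes.
# The name is hypothetical, chosen for this sketch.
NAMESPACE_PATTERN = re.compile(r'xmlns:(.*?)="(.*?)"')


def extract_namespaces(xml_text):
    """Return a dict mapping each declared prefix to its URI.

    Repeated declarations of the same prefix collapse into one entry.
    """
    return dict(NAMESPACE_PATTERN.findall(xml_text))


namespaces = extract_namespaces(fragment)
for prefix, uri in namespaces.items():
    print(prefix, uri)
```

Each prefix and URI pair could then be handed to whatever namespace manager the XPath library provides.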
As Tomalak pointed out in his answer, this is fraught with peril. It could match patterns that are part of comments or embedded in strings as data, etc. This is why regular expressions aren’t good for parsing XML data: you aren’t actually parsing, you’re just looking for patterns without regard to context.
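To make the risk concrete, the sketch below (Python, with made-up example.com URLs and a hypothetical wrapper element named root) runs the same pattern over text that contains a commented-out declaration, and then collects the declarations with the standard library parser for comparison. It assumes the fragment becomes well-formed once wrapped in a single root element.

```python
import io
import re
import xml.etree.ElementTree as ET

# Hypothetical input: one declaration sits inside a comment, one is real.
text = (
    '<!-- old declaration: xmlns:old="http://example.com/old" -->'
    '<a:item xmlns:a="http://example.com/a">text</a:item>'
)

# The regex has no notion of context, so it reports both declarations.
regex_matches = re.findall(r'xmlns:(.*?)="(.*?)"', text)

# A fragment can have several top-level nodes, so wrap it in a dummy root
# (the element name "root" is an assumption for this sketch) before parsing.
wrapped = "<root>" + text + "</root>"

# "start-ns" events yield (prefix, uri) pairs for real declarations only.
parsed = [
    ns for _event, ns in ET.iterparse(io.StringIO(wrapped), events=("start-ns",))
]

print(regex_matches)
print(parsed)
```

The regex picks up the commented-out old prefix as well as the real one, while the parser reports only the declaration that is actually in effect.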