I have an ASP.NET VB project that needs to parse some raw XML coming out of a database. The XML is laid out like this:
<HTML><HEAD><TITLE></TITLE></HEAD><BODY><STRONG><A name=SN>AARTS</A>, <A name=GN>Michelle Marie</A>, </STRONG><A name=HO>B.Sc.</A>, <A name=HO>M.Sc.</A>, <A name=HO>Ph.D.</A>; <A name=OC>scientist, professor</A>; b. <A name=BC>St. Marys</A>, Ont. <A name=BY>1970</A>; <A name=PA>d. Wm. and H. Aarts</A>; <A name=ED>e. Univ. of Western Ont. B.Sc.(Hons.) 1994, M.Sc. 1997</A>; <A name=ED>McGill Univ. Ph.D. 2002</A>; <A name=MA>m. L. MacManus</A>; two children; <A name=PO>CANADA RESEARCH CHAIR IN SIGNAL TRANSDUCTION IN ISCHEMIA</A> and <A name=PO>ASST. PROF., DEPT. OF BIOL. SCI., UNIV. OF TORONTO SCARBOROUGH 2006– </A>; Postdoctoral Fellow, Toronto Western Hosp. 2000–06; Expert Cons., Auris Med. SAS, Montpellier, France; mem., Centre for the Neurobiol. of Stress; named INMHA Brainstar of the Year 2003; Bd. of Dirs. & Fundraising Chair, N'Sheemaehn Childcare; mem., Soc. for Neurosci.; Cdn. Physiol. Soc.; Cdn. Assn. for Neurosci.; <A name=WK>co-author: 'Therapeutic Tools in Brain Damage' in <EM>Proteomics and Protein Interactions: Biology, Chemistry, Bioinformatics and Drug Design </EM>2005; 18 pub. journal articles</A>; Office: <A name=OF1_L1>1265 Military Trail</A>, <A name=OF1_CT>Scarborough</A>, <A name=OF1_PR>Ont.</A> <A name=OF1_PC>M1C 1A4</A>. </BODY></HTML>
And this is the code-behind I’m using:
Dim FullBio As New System.Xml.XmlDocument
Dim NodeList As System.Xml.XmlNodeList
Dim Node As System.Xml.XmlNode
' bio.Item(11) holds the markup shown above; this call throws the error below
FullBio.LoadXml(bio.Item(11))
' XPath is case-sensitive and "a" only matches children of the document node, so select every <A> instead
NodeList = FullBio.SelectNodes("//A")
For Each Node In NodeList
    Dim name As String = Node.Attributes.GetNamedItem("name").Value
    lblEducation.Text &= name & Node.InnerText & "<br />"
Next
The XML loaded into the XmlDocument by
FullBio.LoadXml(bio.Item(11))
is the XML shown at the top. I am getting this error message:
'SN' is an unexpected token. The expected token is '"' or '''. Line 1, position 49.
I know the error occurs because the attribute values are not quoted. Is there any way to make XmlDocument accept the unquoted attributes anyway, or an easy way to use a regular expression to add quotes to the attributes before loading the string into the XmlDocument?
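For reference, the regular-expression pre-processing asked about here could look roughly like the following VB.NET sketch. It rests on assumptions that hold for the sample but not for HTML in general: unquoted attribute values contain no whitespace, quotes or ">", and no text between tags contains " word=value". Quoting alone is also not enough for this sample, because "Bd. of Dirs. & Fundraising Chair" contains a bare "&", which XmlDocument rejects as well, so the sketch escapes that too. The module and function names (BioXmlFixer, MakeWellFormed, ReadAnchors) are hypothetical.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions
Imports System.Xml

' Hypothetical helper module: the module and function names are illustrative only.
Module BioXmlFixer

    ' Matches an attribute written as name=VALUE with VALUE unquoted.
    ' Assumption: unquoted values contain no whitespace, quotes or ">" (true for name=SN, name=OF1_L1, ...)
    ' and no text content between tags contains " word=value".
    Private ReadOnly UnquotedAttribute As New Regex("(\s[\w:-]+)=([^\s""'>]+)")

    ' Matches an "&" that does not already start an entity such as &amp; or &#8211;
    Private ReadOnly BareAmpersand As New Regex("&(?!(?:[A-Za-z]+|#[0-9]+|#x[0-9A-Fa-f]+);)")

    ' Wraps unquoted attribute values in double quotes and escapes bare ampersands.
    Public Function MakeWellFormed(html As String) As String
        Dim quoted As String = UnquotedAttribute.Replace(html, "$1=""$2""")
        Return BareAmpersand.Replace(quoted, "&amp;")
    End Function

    ' Loads the repaired markup into an XmlDocument and returns "name: text" per named anchor.
    Public Function ReadAnchors(html As String) As List(Of String)
        Dim doc As New XmlDocument()
        doc.LoadXml(MakeWellFormed(html))
        Dim result As New List(Of String)()
        ' XPath is case-sensitive: the markup uses <A>, so select "//A", not "a".
        For Each node As XmlNode In doc.SelectNodes("//A[@name]")
            result.Add(node.Attributes.GetNamedItem("name").Value & ": " & node.InnerText)
        Next
        Return result
    End Function

End Module
```

HTML named entities such as &nbsp; would still make LoadXml fail after this pre-processing, which is one reason a real HTML parser is the more robust option.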
What you have is not well-formed XML, and XmlDocument.LoadXml only accepts well-formed XML. I would recommend using an HTML parser such as Html Agility Pack to parse HTML (which is what your input actually is). For example, if you wanted to list the name attribute values of all anchors, it’s as simple as this:
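A minimal VB.NET sketch of that idea, assuming the HtmlAgilityPack NuGet package is referenced in the project. HtmlDocument.LoadHtml, DocumentNode.SelectNodes and HtmlNode.GetAttributeValue are Html Agility Pack members; the module and function names (AnchorLister, ListAnchorNames) are hypothetical.

```vb
Imports System
Imports System.Collections.Generic
Imports HtmlAgilityPack

' Assumes the HtmlAgilityPack NuGet package is referenced; module and function names are hypothetical.
Module AnchorLister

    ' Returns "name: text" for every anchor element that carries a name attribute.
    Public Function ListAnchorNames(html As String) As List(Of String)
        Dim doc As New HtmlAgilityPack.HtmlDocument()
        ' LoadHtml tolerates HTML quirks such as unquoted attribute values and a bare "&".
        doc.LoadHtml(html)

        Dim result As New List(Of String)()
        ' Html Agility Pack lowercases element names, so "//a" also matches the <A> tags.
        Dim anchors As HtmlNodeCollection = doc.DocumentNode.SelectNodes("//a[@name]")
        ' By default SelectNodes returns Nothing (not an empty collection) when nothing matches.
        If anchors Is Nothing Then Return result

        For Each anchor As HtmlNode In anchors
            result.Add(anchor.GetAttributeValue("name", String.Empty) & ": " & anchor.InnerText)
        Next
        Return result
    End Function

End Module
```

In the code-behind, each returned entry could then be appended to lblEducation.Text followed by "<br />", replacing the XmlDocument loop.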