How I love regex!
I have a string which will be a mangled form of XML, like:
<Category>DIR</Category><Location>DL123A</Location><Reason>Because</Reason><Qty>42</Qty><Description>Some Desc</Description><IPAddress>127.0.0.1</IPAddress>
Everything will be on one line; however, the ‘headers’ will often be different.
So what I need to do is extract all the information from the string above and put it into a Dictionary/Hashtable.
—
string myString = @"<Category>DIR</Category><Location>DL123A</Location><Reason>Because</Reason><Qty>42</Qty><Description>Some Desc</Description><IPAddress>127.0.0.1</IPAddress>";
//this will extract the name of the label in the header
Regex r = new Regex(@"(?<header><[A-Za-z]+>?)");
//Create a collection of matches
MatchCollection mc = r.Matches(myString);
foreach (Match m in mc)
{
    headers.Add(m.Groups["header"].Value);
}
//this will try and get the values.
r = new Regex(@"(?'val'>[A-Za-z0-9\s]*</?)");
mc = r.Matches(myString);
foreach (Match m in mc)
{
    string match = m.Groups["val"].Value;
    if (string.IsNullOrEmpty(match) || match == "><" || match == "> <")
        continue;
    else
        values.Add(match);
}
—
I hacked that together from previous work with regexes, getting as close as I could.
But it doesn't really work the way I want it to.
The ‘header’ regex also pulls in the angle brackets.
The ‘value’ regex pulls in a lot of empties (hence the dodgy if statement in the loop). It also doesn't work on values containing periods, commas, and other punctuation.
It would also be much better if I could combine the two statements so I don't have to run a regex over the string twice.
Can anyone point out where I can improve it?
If it looks like XML, why not use the XML parsing functionality built into .NET? All you need to do is add a root element around it:
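As a minimal sketch of that idea, assuming LINQ to XML (System.Xml.Linq) is available and that every tag in the string is properly closed: the class name FragmentParser, the method name Parse, and the wrapper element name "root" are made up for illustration and are not part of the original question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class FragmentParser
{
    // Wraps the fragment in a throwaway root element so that it becomes a
    // well-formed XML document, then maps each child element's name to its
    // text content. "root" is an arbitrary name chosen for this sketch.
    public static Dictionary<string, string> Parse(string fragment)
    {
        // Throws XmlException if the fragment is not well-formed XML.
        XElement root = XElement.Parse("<root>" + fragment + "</root>");

        // Throws ArgumentException if the same element name appears twice.
        return root.Elements()
                   .ToDictionary(e => e.Name.LocalName, e => e.Value);
    }

    public static void Main()
    {
        string myString = @"<Category>DIR</Category><Location>DL123A</Location><Reason>Because</Reason><Qty>42</Qty><Description>Some Desc</Description><IPAddress>127.0.0.1</IPAddress>";

        foreach (KeyValuePair<string, string> pair in Parse(myString))
        {
            Console.WriteLine(pair.Key + " = " + pair.Value);
        }
    }
}
```

This avoids both regex passes, and values containing periods, commas, or spaces need no special handling. It only works if the string really is well-formed once wrapped; an unescaped & or an unclosed tag makes XElement.Parse throw, so genuinely mangled input would still need a fallback.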