Possible Duplicate:
RegEx match open tags except XHTML self-contained tags
I am trying to search the following HTML string to get the cost of these products:
<div id=menu>
<p>A hamburger without cheese costs $5.</p>
<p>A cheeseburger with one patty costs $6.</p>
</div>
I was able to successfully get the price of each item using the following expressions:
string hamburger = "<p>A hamburger[^\\$]+\\$(?<price>.*?).</p>";
string cheeseburger = "<p>A cheeseburger[^\\$]+\\$(?<price>.*?).</p>";
public string GetProductPrice(string expression)
{
    // Use the pattern as-is: Regex.Unescape would turn "\$" into "$",
    // which the regex engine reads as an end-of-string anchor.
    Regex regex = new Regex(expression);
    Match match = regex.Match(MENU_DIV_STRING);
    if (match.Success)
        return match.Groups["price"].Value;
    else
        return "--";
}
However, I was thrown for a loop when given this:
<div id=menu>
<p>A hamburger without cheese costs $5.</p>
<p>A cheeseburger with one patty costs $6.</p>
<p>A cheeseburger (SPECIAL: add an additional patty for $1 each) costs $6.</p>
</div>
The appearance of a second dollar sign in “add an additional patty for $1 each” threw me for a total loop. I’ve researched and tried a number of things, such as different patterns, and at this point I’ve totally confused myself.
Is there a regular expression that will find out how much a cheeseburger costs whether there is a special or not?
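A likely reason for the breakage: `[^\$]+\$` stops at the *first* dollar sign in the paragraph, which on the special line is the one in "$1", and the lazy `(?<price>.*?)` then stretches until it reaches `.</p>`, so the captured "price" becomes `1 each) costs $6`. The C# sketch below reproduces that and contrasts it with a pattern anchored on the closing "costs $N." phrase. It assumes, based only on the sample HTML, that every price sentence ends with "costs $N." and that each `<p>` sits on its own line; the names `MenuPriceSketch`, `Prices`, `OriginalPattern` and `AnchoredPattern` are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MenuPriceSketch
{
    // Sample input copied from the question (the menu with the special).
    public const string Menu =
        "<div id=menu>\n" +
        "<p>A hamburger without cheese costs $5.</p>\n" +
        "<p>A cheeseburger with one patty costs $6.</p>\n" +
        "<p>A cheeseburger (SPECIAL: add an additional patty for $1 each) costs $6.</p>\n" +
        "</div>";

    // The question's cheeseburger pattern as the regex engine receives it from the
    // C# literal: [^\$]+\$ stops at the FIRST dollar sign in the paragraph.
    public const string OriginalPattern = @"<p>A cheeseburger[^\$]+\$(?<price>.*?).</p>";

    // Hypothetical alternative (assumes the "costs $N." wording): '.' does not match
    // '\n', so .* stays on one line and the price must come from the closing phrase.
    // The optional cents part is an extra assumption, not something the menu shows.
    public const string AnchoredPattern = @"<p>A cheeseburger.*costs \$(?<price>\d+(?:\.\d{2})?)\.</p>";

    // Returns the "price" group of every match, one entry per matching paragraph.
    public static string[] Prices(string pattern, string html)
    {
        MatchCollection matches = Regex.Matches(html, pattern);
        var prices = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
            prices[i] = matches[i].Groups["price"].Value;
        return prices;
    }
}
```

With this input, `Prices(OriginalPattern, Menu)` yields `6` and `1 each) costs $6`, while `Prices(AnchoredPattern, Menu)` yields `6` for both cheeseburger lines. If several `<p>` elements shared one line, the greedy `.*` could run into the next paragraph, which is one reason an HTML parser is the sturdier choice.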
NO. NO. NO.
Regex is not a good choice for parsing HTML.
HTML is neither strict nor regular in its format.
Use HtmlAgilityPack.
Regex is for regular languages, NOT irregular ones.
You can retrieve it with this code:
The regex would be:
(?<name>[Aa]?\s*.*?)\s.*(?<price>\$\d+).*
Group 1 (name) captures the name.
Group 2 (price) captures the price.
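A minimal C# sketch of the regex approach above, assuming each paragraph's text (without the `<p>` tags) has already been extracted, for example by an HTML parser; the class `MenuLineSketch` and method `Parse` are hypothetical. It uses a greedy `.*` in front of the price group so that backtracking settles on the last dollar amount in the sentence; with a lazy `.*?` there, the special line would yield `$1` instead of `$6`.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MenuLineSketch
{
    // Group 1 ("name") is the leading words up to the first whitespace after the
    // product name; group 2 ("price") is the LAST "$<digits>" in the sentence,
    // because the greedy .* before it backtracks from the end of the string.
    public static readonly Regex Pattern =
        new Regex(@"(?<name>[Aa]?\s*.*?)\s.*(?<price>\$\d+).*");

    // Assumed input: the plain text of one paragraph, e.g.
    // "A cheeseburger with one patty costs $6."
    // Returns null when the sentence contains no dollar amount.
    public static (string Name, string Price)? Parse(string sentence)
    {
        Match m = Pattern.Match(sentence);
        if (!m.Success)
            return null;
        return (m.Groups["name"].Value, m.Groups["price"].Value);
    }
}
```

For example, `Parse("A cheeseburger (SPECIAL: add an additional patty for $1 each) costs $6.")` gives the name `A cheeseburger` and the price `$6`.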