I am fairly new to regular expressions and am practicing a little with Notepad++. I am trying to extract some stock-related data from Yahoo but lack the experience. Maybe somebody could give me a hand; it would be highly appreciated.
An example of what I am trying to parse:
<strong>230.00</strong></a></td><td class="yfnc_tabledata1"><a href="http://ca.finance.yahoo.com/q?s=AMZN121026C00230000">AMZN121026C00230000</a></td><td class="yfnc_tabledata1" align="right"><b>9.35</b></td><td class="yfnc_tabledata1" align="right"><span id="yfs_c10_amzn121026c00230000"><img style="margin-right:-2px;" src="op_files/up_g.gif" alt="Up" border="0" height="14" width="10"> <span class="yfi-price-change-green">0.35</span></span></td><td class="yfnc_tabledata1" align="right">9.25</td><td class="yfnc_tabledata1" align="right">9.40</td><td class="yfnc_tabledata1" align="right">3,857</td><td class="yfnc_tabledata1" align="right">1,041</td></tr><tr><td class="yfnc_tabledata1" nowrap="nowrap">
I basically want to extract the numbers 230.00, 9.35, 0.35, 9.25, 9.40, 3,857 and 1,041.
What I have managed so far is:
<strong>(\d.*?)</strong>.*?<b>(.*?)<
But it is really slow. Is that correct so far?
A possibly faster variant could be:
(?<=>)(\d{1,3}(?:,\d{3})*+(?:\.\d+)?)(?=<)
It matches only the numbers that sit directly between > and < and ignores the rest.
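To check the pattern outside Notepad++, here is a minimal Python sketch. It makes two assumptions: the sample is a shortened copy of the row from the question, and the possessive `*+` is swapped for a plain `*`, because Python's `re` module only accepts possessive quantifiers from version 3.11 on. For this input the matches should be the same either way.

```python
import re

# Shortened copy of the table row from the question (assumption: trimmed for readability).
sample = (
    '<strong>230.00</strong></a></td>'
    '<td class="yfnc_tabledata1"><a href="http://ca.finance.yahoo.com/q?s=AMZN121026C00230000">'
    'AMZN121026C00230000</a></td>'
    '<td class="yfnc_tabledata1" align="right"><b>9.35</b></td>'
    '<td class="yfnc_tabledata1" align="right">'
    '<span class="yfi-price-change-green">0.35</span></td>'
    '<td class="yfnc_tabledata1" align="right">9.25</td>'
    '<td class="yfnc_tabledata1" align="right">9.40</td>'
    '<td class="yfnc_tabledata1" align="right">3,857</td>'
    '<td class="yfnc_tabledata1" align="right">1,041</td></tr>'
)

# Same idea as the pattern above: a number that is preceded by '>' and followed by '<'.
# Plain '*' is used instead of the possessive '*+' so the sketch runs on older Pythons.
pattern = re.compile(r'(?<=>)(\d{1,3}(?:,\d{3})*(?:\.\d+)?)(?=<)')

# With a single capture group, findall returns the captured strings.
numbers = pattern.findall(sample)
print(numbers)
# ['230.00', '9.35', '0.35', '9.25', '9.40', '3,857', '1,041']
```

The option symbol AMZN121026C00230000 is skipped because the character right after the > is a letter, not a digit.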
But keep in mind, as SomeKittens said: “Generally, parsing HTML with regex is a bad idea….”
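If a script is an option, one way to follow that advice is to let an HTML parser find the text nodes and use a regex only to decide whether a node is a number. The following is a rough sketch with Python's standard `html.parser` module; the class name `NumberCollector` and the shortened sample are made up for illustration and are not part of the original page.

```python
import re
from html.parser import HTMLParser

# Matches a text node that consists of nothing but a number such as 9.35 or 3,857.
NUMBER = re.compile(r'\d{1,3}(?:,\d{3})*(?:\.\d+)?')


class NumberCollector(HTMLParser):
    """Hypothetical helper: collects text nodes that are plain numbers."""

    def __init__(self):
        super().__init__()
        self.numbers = []

    def handle_data(self, data):
        # Called by the parser for the text between tags.
        text = data.strip()
        if NUMBER.fullmatch(text):
            self.numbers.append(text)


# Shortened copy of the table row from the question (assumption: trimmed for readability).
sample = (
    '<strong>230.00</strong></a></td>'
    '<td class="yfnc_tabledata1"><a href="http://ca.finance.yahoo.com/q?s=AMZN121026C00230000">'
    'AMZN121026C00230000</a></td>'
    '<td class="yfnc_tabledata1" align="right"><b>9.35</b></td>'
    '<td class="yfnc_tabledata1" align="right">'
    '<span class="yfi-price-change-green">0.35</span></td>'
    '<td class="yfnc_tabledata1" align="right">9.25</td>'
    '<td class="yfnc_tabledata1" align="right">9.40</td>'
    '<td class="yfnc_tabledata1" align="right">3,857</td>'
    '<td class="yfnc_tabledata1" align="right">1,041</td></tr>'
)

collector = NumberCollector()
collector.feed(sample)
collector.close()
print(collector.numbers)
# ['230.00', '9.35', '0.35', '9.25', '9.40', '3,857', '1,041']
```

The parser takes care of tags and attributes, so the regex no longer has to guess where markup ends and content begins.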