I've been stuck on finding a regular expression to split an HTML element into two sections. The first would be the price and the second the number of downloads. Here is my HTML and the regular expressions I tried. I'm using a scraper program, so I can't use JavaScript or jQuery.
HTML:
<h2>$850 / 3Downlaods - Software Name</h2>
Regular expression used for Marker Before:
/$\/\s*/
Regular expression used for Marker After:
/\/\
This should return 850 only, without the dollar sign. I'm stuck on how to mark the start and end of the number of downloads. I need another set of Before and After regexes to pull the number of downloads, excluding the word "downloads".
The program I'm using is OutWit Hub Scraper (link to docs).
If there will be no other nested tags inside the <h2> (which are more complicated to account for), two () capture groups separated by / should do it.

The pattern breaks down as: <h2>, optional whitespace \s*, a literal $, some number of digits (\d+) to capture, more optional whitespace on either side of /, a second group of digits to capture, more optional whitespace before Downloads, then any characters (non-greedy) up to the closing </h2>.

If the price part may also include , or . the (\d+) group can be replaced by ([0-9.,]+) (or be even more specific, for example to make sure it doesn't start with a comma, if necessary).

The usual warnings about using regular expressions to parse HTML apply here. This will only work if your HTML input is rather predictable, with no nesting of tags inside the <h2>.
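
As a rough illustration, the breakdown above can be turned into a pattern and tried out in Python. This is a reconstruction from the prose description, not necessarily the exact pattern the answer had in mind, and the names PATTERN, PRICE_PATTERN and parse_heading are made up for this sketch. It also assumes the heading spells "Downloads" correctly; the HTML as typed in the question ("Downlaods") would not match. OutWit Hub's own marker syntax may differ from Python's re module.

```python
import re

# Assumed reconstruction of the described pattern:
# <h2>, optional whitespace, literal $, captured digits, "/" with optional
# whitespace on both sides, captured digits, optional whitespace,
# "Downloads", then anything (non-greedy) up to </h2>.
PATTERN = re.compile(r"<h2>\s*\$(\d+)\s*/\s*(\d+)\s*Downloads.*?</h2>")

# Variant for prices that may contain "," or ".", as suggested in the answer.
PRICE_PATTERN = re.compile(r"<h2>\s*\$([0-9.,]+)\s*/\s*(\d+)\s*Downloads.*?</h2>")


def parse_heading(html, pattern=PATTERN):
    """Return (price, downloads) as strings, or None if nothing matches."""
    match = pattern.search(html)
    if match is None:
        return None
    return match.group(1), match.group(2)


sample = "<h2>$850 / 3Downloads - Software Name</h2>"
print(parse_heading(sample))  # ('850', '3')

fancy = "<h2>$1,850.50 / 12 Downloads - Software Name</h2>"
print(parse_heading(fancy, PRICE_PATTERN))  # ('1,850.50', '12')
```

The first capture group holds the price without the dollar sign and the second holds the download count without the word "Downloads", which is what the question asks for.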