I am trying to screen-scrape an HTML page so I can extract the data I need from it and write it to a text file. It was going well until I came across this within the HTML page:
<td> <b>In inventory</b>: 0.3 kg<br /><b>Equipped</b>: -4.5 kg
The content of the above line in the page's HTML often varies. So I need to figure out a way to scan the line (regardless of what it contains) for the weights (in this case 0.3 and -4.5) and store them in two separate doubles, like so:
double inventoryWeight = 0.3;
double equippedWeight = -4.5;
I would like this to be done using pure Java; if need be, do not hesitate to point me to any third-party programs which can be executed within my Java application to achieve this (but please explain clearly if so).
Thank you a bunch!
Regular expressions are usually a good fit for pulling values out of text with a predictable shape like this. Parentheses denote “capturing groups”, whose matched text is stored and can then be read using Matcher.group(). [-.\d]+ matches one or more characters that are each a digit (0-9), a period, or a hyphen. .* matches any run of characters, except that . does not match line terminators unless the Pattern.DOTALL flag is set. Here it’s just used to essentially “throw away” everything you don’t care about.
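As a rough illustration of the pieces described above, here is a minimal sketch using only java.util.regex. The exact pattern, the class name WeightParser, and the method parse are assumptions made up for this example. The pattern assumes the labels "In inventory" and "Equipped" always appear in that order, as they do in the sample line.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name and structure are illustrative only.
final class WeightParser {

    // Two capturing groups of the form ([-.\d]+), one per weight.
    // The lazy .*? in the middle skips the markup between the two labels.
    // DOTALL lets '.' cross line breaks in case the two labels are on separate lines.
    private static final Pattern WEIGHTS = Pattern.compile(
            "In inventory</b>:\\s*([-.\\d]+).*?Equipped</b>:\\s*([-.\\d]+)",
            Pattern.DOTALL);

    // Returns {inventoryWeight, equippedWeight}.
    // Throws IllegalArgumentException if the labels are missing; a captured
    // group that is not a valid number (e.g. "1.2.3") makes parseDouble throw
    // NumberFormatException, which is a subclass of IllegalArgumentException.
    static double[] parse(String html) {
        Matcher m = WEIGHTS.matcher(html);
        if (!m.find()) {
            throw new IllegalArgumentException("weights not found in: " + html);
        }
        return new double[] {
            Double.parseDouble(m.group(1)),
            Double.parseDouble(m.group(2))
        };
    }

    public static void main(String[] args) {
        String line = "<td> <b>In inventory</b>: 0.3 kg<br /><b>Equipped</b>: -4.5 kg";
        double[] weights = parse(line);
        double inventoryWeight = weights[0];
        double equippedWeight = weights[1];
        System.out.println("inventoryWeight = " + inventoryWeight);
        System.out.println("equippedWeight = " + equippedWeight);
    }
}
```

Because the sketch uses find() rather than matches(), no leading or trailing .* is needed to discard the rest of the line. If the page layout changes a lot, a real HTML parser may end up more robust than a regex, but for a line with this fixed shape the regex is probably enough.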