I’m trying to pull in an element from an external website using PHP and cURL.
The link to the website I’m trying to pull content from is:
http://www.stayclassy.org/fundraise?fcid=231864
The element I’m targeting is the number value under the list item
“Raised So Far” in the right column at the top (right now the value is at $10).
Here is the code I’m using to extract the data:
    define("TARGET", "http://www.stayclassy.org/fundraise?fcid=231864");

    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, TARGET);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);

    if (!($results = curl_exec($curl))) {
        print("{ \"total\": \"$0.00\" }");
        return;
    }

    $pattern = '/\<li class="goalTitle"\> \$(.+?) \<\/li\>\<\/a\>/';
    preg_match_all($pattern, $results, $matches);
    $total = $matches[1][0];
    $total = str_replace(",", "", $total);
    printf("{ \"total\": \"$%s\" }", formatMoney($total, true));

    function formatMoney($number, $fractional=false) {
        if ($fractional) {
            $number = sprintf('%.2f', $number);
        }
        while (true) {
            $replaced = preg_replace('/(-?\d+)(\d\d\d)/', '$1,$2', $number);
            if ($replaced != $number) {
                $number = $replaced;
            } else {
                break;
            }
        }
        return $number;
    }
The issue I’m having is that the list item/element I’m targeting doesn’t have a unique ID or class. In fact, the dollar amount is located in a separate list item without a class.
I was wondering how to target a specific list item in an unordered list using the code above, particularly when it doesn’t have a class. Any ideas?
Targeting the specific item requires that you identify a unique string around it. To do this you just expand further and further out until you find a string you can identify that only occurs once. So, the line you want is:
but this is not unique at all. So we expand the string by adding the previous line as well:
and bingo, this string is unique for your needs. The string is fairly constant except for your amount, so it will be easy to use. So you need a regular expression that finds this string. I’d use something like this:
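As a minimal sketch of that idea: the markup below is a hypothetical stand-in for the page (the real HTML may differ), and it assumes the “Raised So Far” label item is immediately followed by an unclassed item holding the amount. The variable names are illustrative only.

```php
<?php
// Hypothetical markup; an assumption, not copied from the target page.
$html = <<<'HTML'
<ul>
  <li class="goalTitle">Goal</li>
  <li>$500</li>
  <li class="goalTitle">Raised So Far</li>
  <li>$10</li>
</ul>
HTML;

// Anchor on the label text, which occurs once, then capture the amount
// in the list item that follows it. "~" is used as the delimiter so the
// slashes in the closing tags need no escaping.
$pattern = '~Raised So Far</li>\s*<li>\s*\$(\d+(?:\.\d{2})?)~';

if (preg_match($pattern, $html, $match)) {
    $total = $match[1];   // first capture group: the digits after "$"
} else {
    $total = '0.00';      // fall back when the label is not found
}

echo json_encode(['total' => '$' . number_format((float) $total, 2)]);
```

Anchoring on the label rather than on the bare list item is what makes the match unique: the plain amount item looks like every other unclassed item in the list.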
You don’t need to use preg_match_all because you only expect to get one match:

Your other options include loading the page with a DOMDocument, and then using XPath or getElementById to parse the DOM. But that may be a little too much effort for this task.

Also, I’d use file_get_contents to fetch the contents of the remote site. But that’s just me.

UPDATE: To handle thousands separators as well, modify your pattern as follows:
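One pattern along these lines, offered as a sketch rather than a definitive version: the helper name extractRaisedTotal and the sample markup are assumptions made for illustration, and the real page may be structured differently. The capture group accepts comma-separated digit groups, and the commas are stripped before the value is converted to a number.

```php
<?php
// Hypothetical helper; the markup it expects is an assumption about the page.
function extractRaisedTotal($html)
{
    // \d+(?:,\d{3})* matches plain digits or comma-grouped thousands,
    // followed by an optional decimal part.
    $pattern = '~Raised So Far</li>\s*<li>\s*\$(\d+(?:,\d{3})*(?:\.\d+)?)~';

    if (!preg_match($pattern, $html, $match)) {
        return null; // label or amount not found
    }

    // Remove the separators so the string converts cleanly to a float.
    return (float) str_replace(',', '', $match[1]);
}

// Illustrative input only.
$sample = '<li class="goalTitle">Raised So Far</li> <li>$1,234.50</li>';
$amount = extractRaisedTotal($sample);

// number_format() puts the separators back for display.
echo $amount === null ? 'not found' : '$' . number_format($amount, 2);
```

Returning null when nothing matches lets the caller decide on the fallback, such as the "$0.00" response used in the question's code.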