I’m using PHP and cURL to fetch the content of various websites.
This is a Google Maps info window example: https://google-developers.appspot.com/maps/documentation/javascript/examples/infowindow-simple
Now I want to get the content that is shown in the info window. Is there a way to do it?
In this particular case, the info window's data is embedded in a script tag in the HTML itself. You can download the HTML from the URL and then use a regular expression to extract the content of the info window (here, the variable named contentString) fairly easily.
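As a rough illustration of that approach, here is a minimal sketch in PHP. It assumes contentString is assigned as one or more single-quoted JavaScript string literals joined with +, and that the statement ends with a semicolon; if the page builds the string differently, the pattern would need adjusting. The function names fetchHtml and extractContentString are made up for this example.

```php
<?php
// Hypothetical helper: download a page with cURL and return its body, or null on failure.
function fetchHtml(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow HTTP redirects
        CURLOPT_TIMEOUT        => 15,   // give up after 15 seconds
    ]);
    $body = curl_exec($ch);
    return is_string($body) ? $body : null;
}

// Hypothetical helper: pull the value assigned to contentString out of the page source.
// Assumes single-quoted literals concatenated with +, ending in a semicolon.
function extractContentString(string $html): ?string
{
    // Matches: contentString = '...' + '...' + '...';
    $statementPattern = <<<'REGEX'
/\bcontentString\s*=\s*((?:'(?:[^'\\]|\\.)*'\s*\+?\s*)+);/s
REGEX;
    // Matches one single-quoted literal and captures its inner text.
    $literalPattern = <<<'REGEX'
/'((?:[^'\\]|\\.)*)'/s
REGEX;

    if (!preg_match($statementPattern, $html, $statement)) {
        return null; // variable not found, or not in the assumed form
    }
    preg_match_all($literalPattern, $statement[1], $literals);

    // Join the pieces; JavaScript escape sequences (e.g. \') are left as-is.
    return implode('', $literals[1]);
}

// Example usage (performs a network request):
// $html = fetchHtml('https://google-developers.appspot.com/maps/documentation/javascript/examples/infowindow-simple');
// if ($html !== null) {
//     echo extractContentString($html) ?? 'contentString not found';
// }
```

The result is the raw HTML of the info window. If you only want the text, you could pass it through strip_tags() afterwards.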
Not every website will be this straightforward, though, and you would need a variety of approaches to collect the information. Dynamic websites may populate the info window through an Ajax call, or the content might live in a separate script or JSON file. If you are determined to scrape each of these sites, you will likely have to write custom code for each one.
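For the Ajax/JSON case, one possible sketch is to find the request URL in your browser's network tab, fetch it directly with cURL, and decode the response. Everything site-specific here is an assumption: the endpoint URL is a placeholder, the field name "content" is hypothetical, and fetchJson and extractMarkerContent are names invented for this example. Real sites will use their own structure.

```php
<?php
// Hypothetical helper: fetch a JSON endpoint the way a browser's Ajax call might.
function fetchJson(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow HTTP redirects
        CURLOPT_TIMEOUT        => 15,   // give up after 15 seconds
        CURLOPT_HTTPHEADER     => [
            'Accept: application/json',
            'X-Requested-With: XMLHttpRequest', // some sites check for this header
        ],
    ]);
    $body = curl_exec($ch);
    return is_string($body) ? $body : null;
}

// Hypothetical helper: collect every string stored under a given key,
// at any depth of the decoded JSON. The key name "content" is an assumption.
function extractMarkerContent(string $json, string $field = 'content'): array
{
    $data = json_decode($json, true); // decode objects as associative arrays
    if (!is_array($data)) {
        return []; // invalid JSON, or a bare scalar value
    }

    $found = [];
    array_walk_recursive($data, function ($value, $key) use (&$found, $field) {
        if ($key === $field && is_string($value)) {
            $found[] = $value;
        }
    });
    return $found;
}

// Example usage (placeholder URL, performs a network request):
// $json = fetchJson('https://example.com/markers.json');
// if ($json !== null) {
//     print_r(extractMarkerContent($json));
// }
```

Searching by key name at any depth keeps the sketch independent of how a particular site nests its marker data, at the cost of possibly picking up unrelated fields with the same name.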