I’m using open-uri and Nokogiri with Ruby to do some simple web crawling.
There’s one problem: sometimes the HTML is read before the page has fully loaded. In such cases, I cannot fetch any content other than the loading icon and the nav bar.
What is the best way to tell open-uri or nokogiri to wait until the page is fully loaded?
Currently my script looks like:
require 'nokogiri'
require 'open-uri'
require 'openssl'
url = "https://www.the-page-i-wanna-crawl.com"
# URI.open: on Ruby 3.0+, Kernel#open no longer fetches URLs.
doc = Nokogiri::HTML(URI.open(url, ssl_verify_mode: OpenSSL::SSL::VERIFY_NONE))
puts doc.at_css("h2").text
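What you describe is not possible. The result of open is only passed to HTML after the open method has returned the full response body, so Nokogiri always parses the complete document the server sent.
I suspect that the page itself uses AJAX to load its content, as has been suggested in the comments. open-uri does not run JavaScript, so anything those requests add to the page never appears in the HTML it returns. In this case you can use Watir to fetch the page using a real browser.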
This might open a browser window though.