I’m getting different results when running Nokogiri locally versus on my server. On my local machine the entire document is parsed and available, but on the server I only get the doctype tag and a couple of comment nodes.
To rule out open-uri, I first checked the raw download on both machines. The results are not identical in size, but both contain the correct markup.
Local:
ruby-1.8.7-p352 :005 > s = open('http://www.pennstateind.com/store/PK2WAY.html')
=> #<File:/var/folders/G8/G8bsAGBk1o82Eyks3ZmFtq-+3Y6/-Tmp-/open-uri20120626-5891-10y2ncr-0>
ruby-1.8.7-p352 :006 > s.length
=> 88408
Server:
irb(main):008:0> s = open('http://www.pennstateind.com/store/PK2WAY.html')
=> #<File:/tmp/open-uri20120626-22167-1td2l72-0>
irb(main):009:0> s.length
=> 98184
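A length alone only shows that the two downloads differ, not by how much they have in common. As a rough sketch (the `fingerprint` helper is a hypothetical name, not part of open-uri), the body could be reduced to a byte count plus a digest so the two machines can be compared directly:

```ruby
require "digest/md5"

# Hypothetical helper: summarize a response body so that downloads taken
# on two machines can be compared by size and by content hash.
def fingerprint(body)
  { :bytes => body.bytesize, :md5 => Digest::MD5.hexdigest(body) }
end

# Possible usage (needs network access, so it is left commented out):
#   require "open-uri"
#   body = open("http://www.pennstateind.com/store/PK2WAY.html").read
#   p fingerprint(body)
# On Ruby 3.0 and later, URI.open replaces the bare open call for URLs.
```

If the digests differ, the site is serving different bytes to each machine (for example session-specific markup), which is a separate question from how the parser handles them.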
When I parse the page with Nokogiri on my local machine I get this:
ruby-1.8.7-p352 :003 > d = Nokogiri::HTML(open('http://www.pennstateind.com/store/PK2WAY.html'))
=> [ OUTPUT OMITTED FOR BREVITY - CAN SUPPLY ON REQUEST ]
ruby-1.8.7-p352 :004 > d.to_s.length
=> 85212
But when I run this on the server I get this:
irb(main):006:0> d = Nokogiri::HTML(open('http://www.pennstateind.com/store/PK2WAY.html'))
=> #<Nokogiri::HTML::Document:0x36620e14b580 name="document" children= [#<Nokogiri::XML::DTD:0x36620e14b1c0 name="html">, #<Nokogiri::XML::Comment:0x36620e14b170 " Open Graph Tags ">, #<Nokogiri::XML::Comment:0x36620e14a98c " Customer_Session_Verified: 0 ">]>
irb(main):007:0> d.to_s.length
=> 172
The only apparent gem difference is the platform build of the JS compiler; all other gems are exactly the same version on local and server:
Local => libv8 (3.3.10.4 x86-darwin-10)
Server => libv8 (3.3.10.4 x86_64-linux)
Any ideas how to figure out what is going on and/or fix this?
Update: to isolate where the problem actually is, I saved the downloaded file from the server and from localhost, then parsed both files on each machine. The results below show that the problem lies in Nokogiri on the server, since both files parse fine locally and both fail on the server. What exactly is going wrong still perplexes me.
Running locally:
# FILE ORIGINALLY PULLED FROM SERVER
ruby-1.8.7-p352 :015 > server_file = File.open("/Users/jmcdonald/Desktop/files/SERVER.txt", "r")
=> #<File:/Users/jmcdonald/Desktop/files/SERVER.txt>
ruby-1.8.7-p352 :016 > server_file.read.length
=> 93071
ruby-1.8.7-p352 :022 > Nokogiri::HTML(server_file).to_s.length
=> 98793
# FILE ORIGINALLY PULLED FROM LOCALHOST
ruby-1.8.7-p352 :017 > local_file = File.open("/Users/jmcdonald/Desktop/files/LOCAL.txt", "r")
=> #<File:/Users/jmcdonald/Desktop/files/LOCAL.txt>
ruby-1.8.7-p352 :018 > local_file.read.length
=> 89622
ruby-1.8.7-p352 :026 > Nokogiri::HTML(local_file).to_html.length
=> 94632
Running on server:
# FILE ORIGINALLY PULLED FROM SERVER
irb(main):001:0> sf = File.open('/home/charlest/public_html/files/nokogiri_issue/SERVER.txt', 'r')
=> #<File:/home/charlest/public_html/files/nokogiri_issue/SERVER.txt>
irb(main):002:0> sf.read.length
=> 93071
irb(main):004:0> Nokogiri::HTML(sf).to_s.length
=> 896 # <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< WRONG
# FILE ORIGINALLY PULLED FROM LOCALHOST
irb(main):008:0> lf = File.open('/home/charlest/public_html/files/nokogiri_issue/LOCAL.txt', 'r')
=> #<File:/home/charlest/public_html/files/nokogiri_issue/LOCAL.txt>
irb(main):009:0> lf.read.length
=> 89622
irb(main):011:0> Nokogiri::HTML(lf).to_s.length
=> 896 # <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< WRONG
It looks like your server and local environment are using different versions of libxml2, the C library Nokogiri uses for parsing. Older versions are known to have strange parsing bugs, so updating the server to the latest version you can (or at least to the same version you’re using for development) should fix the problem.
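To confirm the mismatch before upgrading anything, the libxml2 version can be read from each machine. The sketch below assumes the `Nokogiri::VERSION_INFO` hash exposes a "libxml" entry with "compiled" and "loaded" keys, as it does on the C-based (non-JRuby) builds; `libxml_versions` and `same_libxml?` are hypothetical helper names, not part of Nokogiri.

```ruby
require "rubygems" # needed on Ruby 1.8; harmless on newer Rubies

# Pull the libxml2 versions out of a Nokogiri::VERSION_INFO-style hash.
# "compiled" is the version Nokogiri was built against, "loaded" is the
# version actually in use at runtime.
def libxml_versions(info)
  libxml = info["libxml"] || {}
  { :compiled => libxml["compiled"], :loaded => libxml["loaded"] }
end

# True when the versions reported by two machines match exactly.
def same_libxml?(a, b)
  libxml_versions(a) == libxml_versions(b)
end

begin
  require "nokogiri"
  p libxml_versions(Nokogiri::VERSION_INFO)
rescue LoadError
  warn "nokogiri is not installed in this environment"
end
```

Running this on both machines and comparing the output shows whether the versions differ. The same information is printed by `nokogiri -v` on the command line.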