I have a crawl set up:
```ruby
require 'anemone'

Anemone.crawl("http://www.website.co.uk", :depth_limit => 1) do |anemone|
  anemone.on_every_page do |page|
    puts page.url
  end
end
```
However, I want the spider to add a Google Analytics anti-tracking tag to every URL it visits, without necessarily clicking the links.
I could run the spider once, store all of the URLs, and then use WATIR to visit each one with the tag added, but I want to avoid this because it is slow and I like Anemone's `skip_links_like` and depth-limit features.
How could I implement this?
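You want to add something to the URL before you load it, correct? You can use `focus_crawl` for that. Anemone calls the `focus_crawl` block with each page and only considers following the links the block returns. It is intended for filtering the URL list, but because you control what the block returns, you can use it to rewrite URLs as well.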
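Since the block just turns the page's links into a new array, its logic can be sketched and tested without running a crawl. This is a minimal sketch, assuming `page.links` is an Array of `URI` objects (which is how Anemone represents links); `tag_links` is a hypothetical helper name, not part of Anemone.

```ruby
require 'uri'

# Hypothetical helper: returns copies of +links+ (URI objects, as in
# Anemone's page.links) with +extra_query+ appended to each query string.
# Copies are returned so the page's own link list is left untouched.
def tag_links(links, extra_query)
  links.map do |uri|
    tagged = uri.dup
    # Keep any existing query (e.g. "page=2") and add ours after an "&";
    # nil or empty queries are dropped so no stray "&" or "?" appears.
    tagged.query = [tagged.query, extra_query]
                   .reject { |q| q.nil? || q.empty? }
                   .join('&')
    tagged
  end
end

# Wiring inside the crawl (requires the anemone gem; not executed here):
#   anemone.focus_crawl do |page|
#     tag_links(page.links, 'atm_source=SiteCon&atm_medium=Mycampaign')
#   end
```

Note that if you enable Anemone's `:skip_query_strings` option, links that carry a query string are skipped, which would include every tagged link.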
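For example, if you wanted to add `atm_source=SiteCon&atm_medium=Mycampaign` to all the links, the `page.links.map` inside your `focus_crawl` block would append that string to each link's query string (after an `&` if the link already has a query).
If your `atm_source` or `atm_medium` values contain characters that are not URL-safe, URI-encode them.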
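One way to do that encoding is Ruby's standard `URI.encode_www_form`, which builds the whole query string from a Hash and escapes each key and value. A short sketch; `tracking_query` is a hypothetical helper name, and the `'My campaign & co'` value is made up to show the escaping.

```ruby
require 'uri'

# Hypothetical helper: builds the tracking query string from a Hash,
# encoding keys and values so characters such as spaces, "&" or "=" inside
# a value cannot break the query. Spaces become "+" (form encoding).
def tracking_query(params)
  URI.encode_www_form(params)
end

tracking_query('atm_source' => 'SiteCon', 'atm_medium' => 'My campaign & co')
# => "atm_source=SiteCon&atm_medium=My+campaign+%26+co"
```

The resulting string is what you would append to each link's query inside the `focus_crawl` block.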