I’m trying to use Mechanize login to Google Docs so that I can scrape

Question

Asked: May 21, 2026 (2026-05-21T00:22:04+00:00)

I’m trying to use Mechanize to log in to Google Docs so that I can scrape something (not possible from the API), but I keep getting a 404 when trying to follow the meta redirect:

require 'rubygems'
require 'mechanize'

USERNAME = "..."
PASSWORD = "..."

LOGIN_URL = "https://www.google.com/accounts/Login?hl=en&continue=http://docs.google.com/"

agent = Mechanize.new
login_page = agent.get(LOGIN_URL)
login_form = login_page.forms.first
login_form.Email = USERNAME
login_form.Passwd = PASSWORD
login_response_page = agent.submit(login_form)

redirect = login_response_page.meta[0].uri.to_s

puts "redirect: #{redirect}"

followed_page = agent.get(redirect) # throws a HTTPNotFound exception

pp followed_page

Can anyone see why this isn’t working?


1 Answer

Editorial Team · Answer 1 · 2026-05-21T00:22:05+00:00

Andy, you’re awesome!!
Your code helped me get my script working and log in to a Google account. I found your error after a couple of hours: it was about HTML escaping. As far as I can tell, Mechanize automatically escapes the URI it receives as a parameter to the ‘get’ method. So my solution is:

require 'rubygems'
require 'mechanize'
require 'logger'

EMAIL  = ".."
PASSWD = ".."
LOGIN_URL = "https://www.google.com/accounts/Login?hl=en"

# Log all requests to mech.log to make debugging easier
agent = Mechanize.new { |a| a.log = Logger.new("mech.log") }
agent.user_agent_alias = 'Linux Mozilla'
agent.open_timeout = 3
agent.read_timeout = 4
agent.keep_alive   = true
agent.redirect_ok  = true

login_page = agent.get(LOGIN_URL)
login_form = login_page.forms.first
login_form.Email = EMAIL
login_form.Passwd = PASSWD
login_response_page = agent.submit(login_form)

redirect = login_response_page.meta[0].uri.to_s

# Drop the last query parameter (the already-escaped continue) and append a fresh one
target = redirect.split('&')[0..-2].join('&') + "&continue=https://www.google.com/adplanner"
puts target
followed_page = agent.get(target)
pp followed_page

This works just fine for me. I replaced the continue parameter from the meta tag (which is already escaped) with a new one.
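The split('&')[0..-2] trick assumes that continue is the last query parameter. Here is a minimal sketch of the same idea that parses the query string instead, using only Ruby's standard uri library. The helper name replace_continue and the example.com URL are made up for illustration, and the "&amp;" handling assumes the meta tag value is HTML-escaped in that way; adjust it to whatever your redirect actually contains.

```ruby
require 'uri'

# Hypothetical helper (not part of Mechanize): replace the `continue`
# query parameter of a redirect URL with a new target.
def replace_continue(redirect_url, new_target)
  # Assumption: the URL copied from the meta tag may still contain
  # HTML-escaped ampersands, so undo those first.
  uri = URI.parse(redirect_url.gsub('&amp;', '&'))

  # Split the query into [key, value] pairs and drop any existing `continue`.
  params = URI.decode_www_form(uri.query || '')
  params.reject! { |key, _value| key == 'continue' }

  # Append the new target; encode_www_form escapes it exactly once.
  params << ['continue', new_target]
  uri.query = URI.encode_www_form(params)
  uri.to_s
end

# Made-up example URL, standing in for the value read from the meta tag.
example = 'https://example.com/check?chtml=LoginDoneHtml&amp;continue=http%3A%2F%2Fdocs.example.com%2F'
puts replace_continue(example, 'https://www.google.com/adplanner')
```

The result can then be passed to agent.get in place of the hand-built string. Whether Mechanize escapes it again depends on the Mechanize version, so check the URL in mech.log if you still get a 404.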
