Is it impossible to run a web crawler on GAE alongside my app, considering that I am running the free startup version?
As long as Google has not exposed scheduling, queue, and background-task APIs, you can only do processing in response to an external HTTP request. You'd need some heartbeat service that hits your app repeatedly, with each request processing one item from the crawler's queue (so as not to hit GAE limits).
To do crawling from GAE, you have to split your application into three parts: a queue (which stores its data in the Datastore), a queue processor that reacts to the external HTTP heartbeat, and your actual crawling logic.
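Here is a rough sketch of how that three-way split could look. It is plain Python 3 with an in-memory queue standing in for the Datastore, and every name in it (`CrawlQueue`, `crawl_one`, `handle_heartbeat`, the `fetch` callable) is made up for illustration, not part of any GAE API. On GAE, `handle_heartbeat` would be the body of a request handler, and `fetch` would be whatever call your app uses to download a page.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects the href values of all <a> tags in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


class CrawlQueue:
    """Hypothetical queue; in-memory here, Datastore-backed in a real app."""

    def __init__(self):
        self._pending = deque()
        self._seen = set()

    def push(self, url):
        # Skip URLs that were already queued so the crawl terminates.
        if url not in self._seen:
            self._seen.add(url)
            self._pending.append(url)

    def pop(self):
        return self._pending.popleft() if self._pending else None

    def __len__(self):
        return len(self._pending)


def crawl_one(url, fetch):
    """Crawling logic: fetch one page and return its links as absolute URLs."""
    parser = LinkParser()
    parser.feed(fetch(url))
    return [urljoin(url, link) for link in parser.links]


def handle_heartbeat(queue, fetch):
    """Queue processor: handle exactly one queued URL per heartbeat request."""
    url = queue.pop()
    if url is None:
        return None  # nothing left to crawl
    for link in crawl_one(url, fetch):
        queue.push(link)
    return url
```

Processing a single URL per call keeps each request short, which is the point of the heartbeat approach: the amount of work per request stays small and predictable.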
You'd have to watch your quota usage manually, starting the heartbeat when you have spare quota and stopping it once the quota is used up.
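The heartbeat itself can run anywhere outside GAE. A minimal sketch of such a driver follows, assuming you pick a request budget yourself from what you see in the quota dashboard. The `run_heartbeat` function, its parameters, and the `ping` callable (which should issue one HTTP request to your app and return whether work remains) are all hypothetical.

```python
import time


def run_heartbeat(ping, budget, interval_seconds=0.0, sleep=time.sleep):
    """Send heartbeats until the self-imposed budget is spent or no work is left.

    ping   -- callable that triggers one crawl step; returns True if work remains
    budget -- maximum number of heartbeats to send (an assumed, manual limit)
    Returns the number of heartbeats actually sent.
    """
    sent = 0
    while sent < budget:
        more_work = ping()
        sent += 1
        if not more_work:
            break  # queue is empty, stop early
        sleep(interval_seconds)  # space requests out to stay under rate limits
    return sent
```

This only caps the number of requests; it does not read the real quota from GAE, so the budget still has to be adjusted by hand.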
When Google introduces the APIs I mentioned at the beginning, you'd have to rewrite the parts that can be implemented more effectively with them.
UPDATE: Google introduced the Task Queue API some time ago. See the Task Queue docs for Python and Java.