I’m trying to persist roughly 28,000 “rows” of a single entity kind, e.g. EMPLOYEE.
Basically, my goal is to avoid the request being terminated or timing out because the PUTs take longer than 30 seconds, which is what might happen if I just do all 28,000 PUTs inside the doPost() of a servlet.
So I’m thinking of using tasks, as described in the Google App Engine documentation.
Essentially, I would like to upload a CSV file containing the 28,000 “employees” into the war directory, then create a task that asynchronously PUTs these 28,000 employee rows into the EMPLOYEE entity.
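To make the idea concrete, here is a minimal sketch of how the rows could be split into batches so that each batch is small enough for one task to finish within the deadline. It is plain Java with no App Engine calls; the class name EmployeeBatcher, the method batchRanges and the batch size of 500 are assumptions for illustration only, and the actual enqueueing of one task per range is left as a comment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of any App Engine API): splits a row count
// into fixed-size index ranges, one range per task.
class EmployeeBatcher {

    // Assumed batch size; tune it so one batch finishes well inside the deadline.
    static final int BATCH_SIZE = 500;

    // Returns [start, end) index ranges that together cover rows 0..totalRows-1.
    static List<int[]> batchRanges(int totalRows, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<int[]> ranges = new ArrayList<>();
        for (int start = 0; start < totalRows; start += batchSize) {
            int end = Math.min(start + batchSize, totalRows);
            ranges.add(new int[] {start, end});
        }
        return ranges;
    }

    public static void main(String[] args) {
        List<int[]> ranges = batchRanges(28000, BATCH_SIZE);
        for (int[] r : ranges) {
            // In the real app, one task per range would be enqueued here,
            // passing start and end as task parameters so the task servlet
            // reads only those CSV lines and PUTs only those employees.
            System.out.println("would enqueue task for rows " + r[0] + ".." + (r[1] - 1));
        }
    }
}
```

With 28,000 rows and a batch size of 500 this yields 56 ranges, so no single task has to write more than 500 employees.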
-
Q1: Is this a viable solution, or is there a better way? Again, the goal is to perform the PUTs without being terminated due to the 30-second limit.
-
Q2: Also, what queue.xml configuration should I use to ensure these PUTs run as fast as possible?
-
Q3: Now, I’ve tried it, following an approach similar to this blog entry: http://gaejexperiments.wordpress.com/2009/11/24/episode-10-using-the-task-queue-service/ but I’m getting the following error after 23 or so seconds:
SEVERE: Job default.task1 threw an unhandled Exception:
com.google.apphosting.api.ApiProxy$ApplicationException: ApplicationError: 5: http method POST against URL http://127.0.0.1:8888/dotaskservlet timed out.
    at com.google.appengine.api.urlfetch.dev.LocalURLFetchService.fetch(LocalURLFetchService.java:236)
    at com.google.appengine.api.taskqueue.dev.LocalTaskQueue$UrlFetchServiceLocalTaskQueueCallback.execute(LocalTaskQueue.java:471)
    at com.google.appengine.api.taskqueue.dev.UrlFetchJob.execute(UrlFetchJob.java:77)
    at org.quartz.core.JobRunShell.run(JobRunShell.java:203)
    at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:520)
16/02/2011 12:12:55 PM org.quartz.core.ErrorLogger schedulerError
SEVERE: Job (default.task1 threw an exception.
org.quartz.SchedulerException: Job threw an unhandled exception. [See nested exception: com.google.apphosting.api.ApiProxy$ApplicationException: ApplicationError: 5: http method POST against URL http://127.0.0.1:8888/dotaskservlet timed out.]
    at org.quartz.core.JobRunShell.run(JobRunShell.java:214)
    at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:520)
* Nested Exception (Underlying Cause) ---------------
com.google.apphosting.api.ApiProxy$ApplicationException: ApplicationError: 5: http method POST against URL http://127.0.0.1:8888/dotaskservlet timed out.
    at com.google.appengine.api.urlfetch.dev.LocalURLFetchService.fetch(LocalURLFetchService.java:236)
    at com.google.appengine.api.taskqueue.dev.LocalTaskQueue$UrlFetchServiceLocalTaskQueueCallback.execute(LocalTaskQueue.java:471)
    at com.google.appengine.api.taskqueue.dev.UrlFetchJob.execute(UrlFetchJob.java:77)
    at org.quartz.core.JobRunShell.run(JobRunShell.java:203)
    at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:520)
-
Q4: I’ve also checked the Datastore Viewer at http://localhost:8888/_ah/admin and it seems to have only created 1000 results in that entity. Is 1000 the limit?
-
Q5: How do I get rid of the above error?
-
Q6: Can anyone confirm that the maximum allowed time for a task is 10 minutes? Or is it still 30 seconds? I did come across this: http://code.google.com/appengine/docs/java/taskqueue/overview.html#Task_Execution
If your goal is only to upload a bunch of data yourself, and not to allow your users to do so, I think an easier tool would be the bulk uploader. You can just run a Python program from your local machine that takes care of request limits and failure recovery for you.
http://ikaisays.com/2010/06/10/using-the-bulkloader-with-java-app-engine/