This is a general design question about how to make a web application that will receive a large amount of uploaded data, process it, and return a result, all without the dreaded spinning beach-ball for 5 minutes or a possible HTTP timeout.
Here are the requirements:
- make a web form where you can upload a CSV file containing a list of URLs
- when the user clicks “submit”, the server fetches the file and checks each URL to see whether it’s alive and what the title tag of the page is.
- the result is a downloadable CSV file containing each URL and its resulting HTTP status code
- the input CSV can be very large ( > 100000 rows), so the fetch process might take 5-30 minutes.
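As a rough illustration of the per-URL check described in the requirements, a minimal Python sketch might look like the following. The function names `check_url` and `extract_title`, the 10-second timeout, the 64 KB read limit, and the use of status 0 for "no HTTP response" are all assumptions made for illustration, not part of the original setup.

```python
import re
import urllib.error
import urllib.request
from html import unescape

# Simple pattern for the first <title> element; a real HTML parser would be
# more robust, but this is enough for a sketch.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def extract_title(html_text):
    """Return the page title with whitespace collapsed, or "" if none is found."""
    match = TITLE_RE.search(html_text)
    if not match:
        return ""
    return unescape(" ".join(match.group(1).split()))


def check_url(url, timeout=10):
    """Return (status_code, title); status 0 (an assumed convention) means
    no HTTP response was received at all."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # Read only the first 64 KB; the title is normally near the top.
            body = resp.read(65536).decode("utf-8", errors="replace")
            return resp.status, extract_title(body)
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code, ""
    except (urllib.error.URLError, OSError, ValueError):
        # DNS failure, refused connection, timeout, or a malformed URL.
        return 0, ""
```

At even one second per URL, 100000 rows processed one at a time would take more than a day, so a 5-30 minute run implies that many of these checks happen concurrently.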
My solution so far is a JavaScript polling loop on the client side that queries the server every second to determine the overall progress of the job. This seems kludgy to me, and I’m hesitant to accept it as the best solution.
I’m using Perl, Template Toolkit, and jQuery, but a solution using any web technology would be acceptable.
edit:
An example of a possible solution is in this question: How do I implement basic "Long Polling"?
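The server side of long polling can be sketched roughly as follows, in Python: instead of answering a progress request at once, the handler blocks until the progress value changes or a timeout expires, and the client then issues the next request. The class name `ProgressChannel`, its methods, and the 25-second default are hypothetical choices for illustration and are not taken from the linked question.

```python
import threading


class ProgressChannel:
    """Holds a progress counter that waiting requests can block on."""

    def __init__(self):
        self._cond = threading.Condition()
        self._done = 0

    def publish(self, done):
        """Called by the worker whenever progress changes."""
        with self._cond:
            self._done = done
            self._cond.notify_all()  # wake every request that is waiting

    def wait_for_change(self, last_seen, timeout=25.0):
        """Called by a request handler: block until the counter differs from
        last_seen or the timeout elapses, then return the current value."""
        with self._cond:
            self._cond.wait_for(lambda: self._done != last_seen, timeout=timeout)
            return self._done
```

The timeout should stay below any proxy or server request timeout, so that an idle long poll returns an unchanged value rather than failing.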
You can do this with AJAX, but you may get better real-time results with a COMET-like implementation. I believe that COMET implementations are specifically designed to get around some timeout limitations, but I haven’t used any, so I can’t offer a direct guide.
Either way my recommendation is to hand off the work to another process once it gets to the server.
I’ve worked on a number of different solutions for batch tasks of this nature, and the one I like best is to hand off the batch work to another process. In such a system, the upload page hands off the work to a separate processor and returns immediately with instructions for the user to monitor the process.
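A minimal sketch of that hand-off, in Python, might look like this. It uses a background thread and an in-memory dictionary as stand-ins for the separate processor and its status store, purely to keep the example self-contained; the names `submit_job`, `run_job`, `job_status`, and `JOBS`, and the placeholder `check_url`, are hypothetical.

```python
import csv
import io
import threading
import uuid

# Hypothetical in-memory job table. With a truly separate process, this
# would more likely be a database table or a status file.
JOBS = {}
JOBS_LOCK = threading.Lock()


def check_url(url):
    """Placeholder check (assumption): a real worker would make an HTTP request."""
    return 200 if url.startswith("http") else 0


def run_job(job_id, urls, checker=check_url):
    """Worker: process every URL and record progress after each one."""
    for url in urls:
        code = checker(url)
        with JOBS_LOCK:
            job = JOBS[job_id]
            job["results"].append((url, code))
            job["done"] += 1
    with JOBS_LOCK:
        JOBS[job_id]["state"] = "finished"


def submit_job(csv_text):
    """Upload handler: register the job, start the worker, return immediately."""
    urls = [row[0] for row in csv.reader(io.StringIO(csv_text)) if row]
    job_id = uuid.uuid4().hex
    with JOBS_LOCK:
        JOBS[job_id] = {"state": "running", "total": len(urls),
                        "done": 0, "results": []}
    worker = threading.Thread(target=run_job, args=(job_id, urls), daemon=True)
    worker.start()
    return job_id, worker


def job_status(job_id):
    """Status handler: a cheap lookup that the monitoring page can call."""
    with JOBS_LOCK:
        job = JOBS[job_id]
        return {"state": job["state"], "done": job["done"], "total": job["total"]}
```

The upload request only pays for parsing the CSV and starting the worker. The job ID it returns is what the user (or a script on the page) later passes to the status lookup.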
The batch processor can be implemented in a couple of ways:
You can then offer the user multiple ways to monitor the process:
The batch processor can communicate its status via a number of methods:
There are a number of benefits to handing the code off to another process: