I’m building an HTTP API which does heavy site scraping in the background.
The “site scraping” part is trivial – I’m using HttpUnit’s WebConversation object which represents a browser.
But I need to persist this WebConversation object between API requests.
Unfortunately, WebConversation is not Serializable. It is also quite large.
So, how can I reliably keep a large, non-serializable object like this between requests?
Can I simply create a static list somewhere and manage it myself? This object must also be accessible from the Play! background jobs.
PS High availability is not a concern here – I can stick sessions to the server.
If high availability and scaling are not requirements, then there is no reason why you couldn’t keep a singleton class that contains a map of WebConversation objects keyed by an id that you store in the session cookie.
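A minimal sketch of that idea, under some assumptions: the class name `ConversationRegistry` and its methods are hypothetical, and the class is generic so that it compiles without HttpUnit on the classpath. In the real application the type parameter would presumably be `WebConversation`, and the returned id is what would go into the session cookie.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory registry of large, non-serializable objects keyed by id. */
final class ConversationRegistry<T> {
    // ConcurrentHashMap so that request threads and background jobs can share it safely.
    private final Map<String, T> conversations = new ConcurrentHashMap<>();

    /** Stores the object and returns a random id to keep in the session cookie. */
    String register(T conversation) {
        String id = UUID.randomUUID().toString();
        conversations.put(id, conversation);
        return id;
    }

    /** Returns the object for this id, or null if the id is null or unknown. */
    T get(String id) {
        return id == null ? null : conversations.get(id);
    }

    /** Removes the object so it can be garbage collected; returns it, or null. */
    T remove(String id) {
        return id == null ? null : conversations.remove(id);
    }

    /** Number of objects currently held, useful for spotting leaks. */
    int size() {
        return conversations.size();
    }
}
```

One way to use it would be a single shared instance, for example `static final ConversationRegistry<WebConversation> CONVERSATIONS = new ConversationRegistry<>();`, which both controllers and background jobs can reach. Because the objects are large and nothing expires them automatically in this sketch, you would need to call `remove` when a conversation is finished, or add your own timeout-based cleanup.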
The major downside of this is that it breaks the stateless nature of Play: if you ever need to scale your application beyond a single server, you will have to radically change its design.
You could take a look at this application written in Play 2 by one of the developers at Zenexity. It is a screen-scraping web service using some cool technologies that may be more appropriate for your application.