Let’s start with an example. There’s a website, say a clone of Twitter named Tlitter. Tlitter, like Twitter, is constantly updated with new content (most of it is just litter, hence the name). Unlike Twitter, it has no JSON/XML API for fetching content conveniently. To get data from it, you have to fetch good old HTML and parse it; that is the only way to get the content.
Tlitter’s admins sometimes change their minds. They may change the site’s look and its HTML in a way that breaks the code extracting the data. You cannot predict when a change will be made: it could be once a week, once a month or maybe… never.
You created an Android application that uses content from Tlitter to complement content from another source (say it’s Twitter). Twitter is crucial and causes no problems, since it has a nice API, but Tlitter may give you a headache when it stops working. Let’s say that Twitter gives you the prices in shops and Tlitter gives you the discounts. The app is still functional without Tlitter, but with Tlitter it’s just better and more complete.
You didn’t want to publish a new release every time Tlitter-related functionality broke, so you built an application on Google Appspot that acts as a proxy between your app and Tlitter. If Tlitter changes, you only have to update the proxy app and everything works again for all users.
But your application gained popularity, and Google changed its pricing policy, introducing “Instance Hours” for Appspot. Those two things made your app use almost all of the free quota. You don’t want to pay for Appspot, so you have to solve this problem somehow.
There’s more than one solution, and probably no perfect one. I’m asking you: how would you solve this problem? My ideas are as follows:
- Drop the idea of a proxy app and process everything inside the mobile app
  - Pros: No problem with Appspot
  - Cons: The app must be updated whenever Tlitter changes; more network traffic on the user’s side
- Cache data inside the proxy app and try to optimize it, or find a better cloud service
  - Pros: No problem with updates, probably faster response times
  - Cons: If the app continues to gain popularity, it will eventually use up all of the free resources, regardless of the optimizations made
- Combine the two solutions. Have the application maintain a ‘Tlitter structure definition file’, hosted online. Content from Tlitter is extracted according to the rules specified in the file, and the application checks (daily or hourly) for an update of that file.
  - Pros: No need to update the app each time Tlitter changes
  - Cons: A very sophisticated solution; I currently have no idea how to implement it, and there are possible security risks, etc.
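To make the caching part of the second idea more concrete, here is a minimal sketch of a time-based cache the proxy could keep in front of its Tlitter parsing. The class name `TtlCache`, the expiry time and the injected clock are all assumptions made for illustration, not anything Appspot provides, and it is written in Java only to match the Android side.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical time-based cache for parsed Tlitter responses held by the proxy. */
class TtlCache {
    /** A cached value together with the time it was stored. */
    private static final class Entry {
        final String value;
        final long storedAtMillis;

        Entry(String value, long storedAtMillis) {
            this.value = value;
            this.storedAtMillis = storedAtMillis;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // injected so expiry can be tested without sleeping

    TtlCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /**
     * Returns the cached value for the key. The fetcher (the expensive
     * fetch-and-parse step) runs only when the entry is missing or expired.
     * Two concurrent misses for the same key may both call the fetcher.
     */
    String get(String key, Function<String, String> fetcher) {
        long now = clock.getAsLong();
        Entry entry = entries.get(key);
        if (entry == null || now - entry.storedAtMillis >= ttlMillis) {
            entry = new Entry(fetcher.apply(key), now);
            entries.put(key, entry);
        }
        return entry.value;
    }
}
```

With something like `new TtlCache(60_000L, System::currentTimeMillis)`, repeated requests for the same page within a minute would hit Tlitter only once. This lowers the work done per request, but it does not remove the quota ceiling mentioned in the cons.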
The example may seem quite generic, but it models my problem almost perfectly. How would you solve it? I would go with solution 1, or with solution 3 if I could find a good way to implement it.
For solution 3, you would want to look for some DSL or scripting language whose rules you can update without releasing a new version of the app. jsoup may be a good base: you would load a file containing the selector strings used to retrieve the data. The example on the jsoup webpage hard-codes such a selector string (#mp-itn b a); you would essentially load strings like that from a web service instead.
Using jsoup on Android is straightforward.
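The rule-file half of this could look roughly like the sketch below. It assumes a rule file of `field=selector` lines parsed with `java.util.Properties`; the class name `SelectorRules`, the field names and the file format are hypothetical, and the actual download of the file is left out. Selectors bundled with the app act as a fallback when an update is missing, empty or malformed.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

/** Hypothetical holder for CSS selectors that can be refreshed from a hosted rule file. */
class SelectorRules {
    private final Properties bundled;     // rules shipped inside the app
    private volatile Properties current;  // rules currently in effect

    SelectorRules(String bundledRules) {
        this.bundled = parse(bundledRules, null);
        this.current = this.bundled;
    }

    /**
     * Applies freshly downloaded rule text. Returns false and keeps the
     * previous rules when the text is null, empty or cannot be parsed.
     */
    boolean update(String downloadedRules) {
        if (downloadedRules == null) {
            return false;
        }
        try {
            Properties parsed = parse(downloadedRules, bundled);
            if (parsed.isEmpty()) {
                return false;
            }
            current = parsed;
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    /** Returns the selector for a field, falling back to the bundled rule; null if unknown. */
    String selectorFor(String field) {
        return current.getProperty(field);
    }

    /** Parses "field=selector" lines; fields absent from the text resolve through defaults. */
    private static Properties parse(String text, Properties defaults) {
        Properties parsed = new Properties(defaults);
        try {
            parsed.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalArgumentException("Unreadable rule file", e);
        }
        return parsed;
    }
}
```

On the device, the string returned by `selectorFor("headline")` would then be passed to jsoup, for example `doc.select(rules.selectorFor("headline"))`. Because only selector strings are downloaded, never executable code, the security exposure is limited to extracting the wrong elements.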