Please bear with me while I provide a short background:
- My application retrieves user data from Facebook and LinkedIn.
- Both have very strict terms of use. Specifically, they do not
allow saving user data in my application (well, kind of:
Facebook allows caching; LinkedIn does not allow even caching).
- The naive solution is to call Facebook/LinkedIn whenever I need the
data from them. The problem is that this becomes too slow if I need
lots of data (e.g. profiles of 100 users). Batch calls make things
better, but they have limits and I’m not sure this approach can
scale.
So the question is how to make my application run fast while using data from Facebook/LinkedIn?
If you can share from your experience, or have an example of a site that uses lots of data from Facebook/LinkedIn, I’d love to hear about it.
When you talk about making your application “fast”, please note “fast” can mean either “high throughput” or “low latency”, and there is a big difference between the two. It would be good to set performance goals for both latency (how quickly each individual user should be served) and throughput (how many users you should be able to serve per unit time).
If getting data from FB/LinkedIn is a bottleneck for throughput,
If getting data from FB/LinkedIn is a bottleneck for latency,
If you absolutely must have data XYZ from FB/LinkedIn to serve a user, and the latency of a single API request is N seconds, and your target maximum time to serve each user is < N seconds, the only possible way you can reach your goal is by prefetching data. Maybe when you see the very first page request come in for a user (say for the home page), you can start loading all the data which will be needed for that user into the cache (if it’s not already there).
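As a rough illustration of the prefetching idea, here is a minimal Python sketch. Everything named in it is an assumption made for the example: `fetch_profile_from_api` is a hypothetical stand-in for whatever FB/LinkedIn client call you actually use, and `Prefetcher` is not part of any library. The first page request calls `prefetch()` to start the slow fetches in background threads, and later code calls `get()`, which only blocks if the data has not arrived yet.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_profile_from_api(user_id):
    """Hypothetical stand-in for one slow FB/LinkedIn API request."""
    time.sleep(0.05)  # simulated network latency
    return {"id": user_id, "name": f"user-{user_id}"}


class Prefetcher:
    """Starts fetches early so the data is (hopefully) ready when needed."""

    def __init__(self, fetch, max_workers=8):
        self._fetch = fetch
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._pending = {}  # user_id -> Future holding the profile
        self._lock = threading.Lock()

    def prefetch(self, user_ids):
        """Kick off background fetches and return immediately."""
        with self._lock:
            for uid in user_ids:
                if uid not in self._pending:
                    self._pending[uid] = self._pool.submit(self._fetch, uid)

    def get(self, user_id, timeout=None):
        """Return the profile, waiting only if its fetch is still running."""
        self.prefetch([user_id])  # no-op if a fetch was already started
        return self._pending[user_id].result(timeout=timeout)


if __name__ == "__main__":
    prefetcher = Prefetcher(fetch_profile_from_api)
    # On the first page request: start loading what later pages will need.
    prefetcher.prefetch(["1", "2", "3"])
    # Later, when the data is actually needed:
    print(prefetcher.get("2"))
```

Note that this sketch keeps results in memory for as long as the `Prefetcher` lives, with no expiry. Whether that is acceptable depends on each provider's terms of use, so a real version would need an eviction policy that matches them.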
Whatever you do, I recommend that you encapsulate your FB/LinkedIn data access code inside a “data access layer”. Caching should happen strictly inside the data access layer; the application code doesn’t need to know about the cache. Whether you use batched calls or not, and whether you issue multiple calls in parallel, are also implementation details which should be kept strictly inside the data access layer.
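To make the data access layer idea concrete, here is one possible shape for it in Python. This is only a sketch under stated assumptions: `ProfileDataAccess`, `fetch_batch`, `ttl_seconds` and `batch_size` are names invented for the example, and `fetch_batch` stands for whatever batched API call you have, taking a list of user IDs and returning a dict of profiles keyed by ID. Application code calls `get_profiles()` and never sees the cache or the batching.

```python
import time


class ProfileDataAccess:
    """Hypothetical data access layer hiding caching and batching."""

    def __init__(self, fetch_batch, ttl_seconds=300.0, batch_size=50,
                 clock=time.monotonic):
        self._fetch_batch = fetch_batch  # assumed: list of ids -> {id: profile}
        self._ttl = ttl_seconds          # 0 disables caching entirely
        self._batch_size = batch_size    # assumed per-request limit of the API
        self._clock = clock
        self._cache = {}                 # user_id -> (expiry_time, profile)

    def get_profiles(self, user_ids):
        """Return {user_id: profile}, fetching only what is not cached."""
        now = self._clock()
        unique_ids = list(dict.fromkeys(user_ids))  # dedupe, keep order
        result, missing = {}, []
        for uid in unique_ids:
            entry = self._cache.get(uid)
            if entry is not None and entry[0] > now:
                result[uid] = entry[1]   # fresh cache hit
            else:
                missing.append(uid)      # absent or expired
        # Fetch the misses in chunks that respect the batch limit.
        for start in range(0, len(missing), self._batch_size):
            chunk = missing[start:start + self._batch_size]
            for uid, profile in self._fetch_batch(chunk).items():
                result[uid] = profile
                if self._ttl > 0:
                    self._cache[uid] = (now + self._ttl, profile)
        return result
```

Because the caching policy lives in one place, you could create one instance per provider, for example with a nonzero `ttl_seconds` where caching is permitted and `ttl_seconds=0` where it is not, without touching any calling code. Issuing the chunks in parallel would likewise be a change confined to this class.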