I have been trying to reverse engineer Twitter's live search, and maybe we could discuss it here. I am talking about the feature where search results include tweets as recent as “1 sec ago”. I am trying to understand how the following might happen:
- There must be some layer between the moment a user tweets and the moment the index is updated. Is this layer MySQL or some other store or caching layer (memcached, Cassandra)? Maybe…
- Indexing – How might the index updates be happening? Surely they can’t be rebuilding the index from scratch each time?
- Indexing – There must be a distributed index here. How do they update all the indexes without serving stale data from one index and the latest data from another?
- Indexing – Or does it matter if something like this happens? Honestly I don’t think so 🙂 Which user would notice…
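
To make the indexing question concrete, here is a rough sketch of what I imagine an incremental update could look like: each new tweet is appended to per-term posting lists, so nothing is rebuilt from scratch. This is only my guess at the general idea, not Twitter's actual design; the class `LiveIndex` and its methods `add` and `search` are names I made up, and a real system would need tokenization, persistence, and distribution on top of this.

```python
import time
from collections import defaultdict


class LiveIndex:
    """Toy in-memory inverted index (hypothetical, not Twitter's real design)."""

    def __init__(self):
        self.tweets = []                   # tweet id is the position in this list
        self.postings = defaultdict(list)  # term -> tweet ids, oldest first

    def add(self, text, timestamp=None):
        """Index one tweet by appending to posting lists; nothing is rebuilt."""
        if timestamp is None:
            timestamp = time.time()
        tweet_id = len(self.tweets)
        self.tweets.append((timestamp, text))
        # Naive whitespace tokenization; set() avoids duplicate ids per term.
        for term in set(text.lower().split()):
            self.postings[term].append(tweet_id)
        return tweet_id

    def search(self, term, limit=10):
        """Return the texts of matching tweets, newest first."""
        ids = self.postings.get(term.lower(), [])
        newest_first = ids[::-1][:limit]
        return [self.tweets[i][1] for i in newest_first]


index = LiveIndex()
index.add("hello world", timestamp=1.0)
index.add("Hello again", timestamp=2.0)
results = index.search("hello")
```

Because posting lists only ever grow at the tail, a tweet is searchable as soon as `add` returns, which is roughly the behaviour I am asking about.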
Does anybody have anything interesting to add or discuss? I am just trying to understand…
Interesting indeed, but I guess it’s more of an “architecture” question, and not really a programming question.
But FYI there’s a lot of information at high scalability: posts tagged with twitter
Do they keep all tweets? My guess is they just throw them away after a while, and surely they don’t need ACID properties?
And I wouldn’t trust those timestamps if I were you 🙂