I currently have a cron job that calls the Twitter search API every minute, but that limits me to 100 results per request. I’d like to start using the streaming API, but I worry this will increase the server load further (I’m on shared hosting at the moment, and my cron job has already raised a few red flags).
My question is: what are the minimum server specs I need in order to capture the streaming API data without building up a backlog?
A design that I’ve used, and have seen others use as well, is a message queue. A thread dedicated to that purpose loads tweets from the stream onto the queue. Another thread on the other side of the queue then reads the tweets and processes them as necessary. Here’s a nice example of what I’m talking about:
http://www.laurentluce.com/posts/python-twitter-statistics-and-the-2012-french-presidential-election/
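To make the queue idea concrete, here is a minimal sketch using only Python's standard library. It is not taken from the linked post and it does not talk to Twitter: `fake_stream` is a made-up stand-in for the streaming connection, and the "processing" is just a placeholder, so treat every name here as an assumption to be swapped for your own client and logic.

```python
import queue
import threading

SENTINEL = None  # placed on the queue to tell the consumer the stream has ended


def fake_stream(n):
    """Hypothetical stand-in for the streaming API: yields n tweet-like dicts."""
    for i in range(n):
        yield {"id": i, "text": f"tweet number {i}"}


def producer(stream, q):
    # Dedicated thread: its only job is to move tweets from the stream
    # onto the queue, so slow processing never delays reading.
    for tweet in stream:
        q.put(tweet)
    q.put(SENTINEL)


def consumer(q, results):
    # Other side of the queue: take tweets off and process them.
    while True:
        tweet = q.get()
        if tweet is SENTINEL:
            break
        results.append(len(tweet["text"]))  # placeholder processing step


def run(n):
    q = queue.Queue(maxsize=1000)  # bounded, so memory use stays capped
    results = []
    reader = threading.Thread(target=producer, args=(fake_stream(n), q))
    worker = threading.Thread(target=consumer, args=(q, results))
    reader.start()
    worker.start()
    reader.join()
    worker.join()
    return results


if __name__ == "__main__":
    print(len(run(50)), "tweets processed")
```

The bounded queue is a design choice: if the consumer falls behind, `q.put` blocks instead of letting memory grow without limit. With a real stream you may prefer to drop or spill tweets to disk instead, since a reader that stalls for too long can be disconnected.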
No one can tell you what your specs should be without sufficient analysis. The closest answer you’ll get, as suggested by one of the comments on your question, is to try it. Build a quick prototype of what you want to do, see what effect it has, and measure as necessary. Again, there are many architectural principles in play here, so it wouldn’t be wise for anyone to tell you exactly what you should do.
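As a rough illustration of the "prototype and measure" advice, the sketch below times a processing function over a batch of sample items and reports how many items per second it handles and how much CPU time it used. The helper name `measure` and the sample data are hypothetical; you would pass in your own processing function and a batch of tweets you have captured.

```python
import time


def measure(process, items):
    """Run process(item) for each item; return (count, items_per_second, cpu_seconds)."""
    wall_start = time.perf_counter()   # wall-clock timer
    cpu_start = time.process_time()    # CPU time used by this process
    count = 0
    for item in items:
        process(item)
        count += 1
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    rate = count / wall if wall > 0 else float("inf")
    return count, rate, cpu


if __name__ == "__main__":
    sample = ["some tweet text"] * 10000  # made-up sample data
    count, rate, cpu = measure(len, sample)
    print(f"{count} items, {rate:.0f} items/s, {cpu:.3f}s CPU")
```

If the measured rate is comfortably higher than the rate at which tweets arrive for your filter, a backlog is unlikely on that hardware; if it is lower, you have a concrete number to size the server against.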
Joe