I’d like to build a simple Twitter client that learns my tastes and automatically finds friends and interesting tweets, so that it gives me relevant information.
To get started, I need a good stream of random Twitter messages so I can test a few machine learning algorithms on them.
Which API methods should I use for this? Do I have to poll regularly to get messages, or is there a way to get Twitter to push messages as they are published?
I’d also be interested in hearing about any similar projects.
I use tweepy to access the Twitter API and listen to the public stream it provides, which should be a one-percent sample of all tweets. Here is the sample code that I use myself. You can still use the basic auth mechanism for streaming, though they may change that soon. Change the USERNAME and PASSWORD variables accordingly, and make sure you respect the error codes that Twitter returns (this sample code might not respect the exponential backoff mechanism that Twitter wants in some cases).
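Separately from the sample code mentioned above, here is a minimal sketch of what exponential backoff could look like around a reconnecting client. It does not call tweepy or Twitter at all: `backoff_delays`, `run_with_backoff`, and the `connect` callable are hypothetical names, and the delays and cap are illustrative assumptions rather than values Twitter specifies.

```python
import time


def backoff_delays(initial=1.0, factor=2.0, cap=60.0):
    """Yield wait times that grow by `factor` until they reach `cap`.

    The default numbers are assumptions for illustration only.
    """
    delay = initial
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


def run_with_backoff(connect, max_attempts=5, sleep=time.sleep):
    """Call `connect()` until it succeeds, waiting longer after each failure.

    `connect` is a hypothetical callable that opens the stream and raises
    OSError on a network-level failure. The last failure is re-raised.
    """
    delays = backoff_delays()
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except OSError:
            if attempt == max_attempts:
                raise
            sleep(next(delays))
```

In a real client you would probably also want to choose the delay based on which error code came back, since the answer above only says that backoff is wanted "in some cases".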
I also set the timeout of the socket module. I believe I had some problems with the default timeout behavior in Python, so be careful.
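Setting the timeout on the socket module can be done with the standard library alone. This is only a sketch: the 60-second value and the `STREAM_TIMEOUT_SECONDS` name are assumptions, so pick whatever suits your connection.

```python
import socket

# Assumed value for illustration; not a number recommended by Twitter.
STREAM_TIMEOUT_SECONDS = 60.0

# Sockets created after this call inherit this timeout unless they set
# their own, so a stalled connection raises an error instead of blocking
# forever.
socket.setdefaulttimeout(STREAM_TIMEOUT_SECONDS)
```

Note that this is a process-wide default, so it also affects any other library in the same program that opens sockets without setting an explicit timeout.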