I installed and tried out tweepy. Right now I am using the following function, quoted from the API Reference:
API.public_timeline()
Returns the 20 most recent statuses from non-protected users who have set a custom user icon. The public timeline is cached for 60 seconds so requesting it more often than that is a waste of resources.
However, I want to extract all tweets that match a certain regular expression from the complete live stream. I could put public_timeline() inside a while True loop, but that would probably run into rate limiting, and since the timeline is cached for 60 seconds, polling faster would only return the same 20 statuses. Either way, I don’t think it can cover all current tweets.
How could that be done? If I can't get all tweets, then I want to extract as many tweets as possible that match a certain keyword.
The streaming API is what you want. I use a library called tweetstream. Here’s my basic listening function:
I haven’t looked in a while, but I’m pretty sure that this library is just accessing the sample stream (as opposed to the firehose). HTH.
Edit to add: you say you want the “complete live stream”, aka the firehose. That’s fiscally and technically expensive and only very large companies are allowed to have it. Look at the docs and you’ll see that the sample is basically representative.
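The regular-expression matching from the question can be layered on top of whatever iterator a streaming library hands back. Here is a minimal sketch, assuming each item behaves like a dict with a "text" field (as in Twitter's JSON payloads) and that the stream may also carry non-status messages such as delete notices. The name `matching_tweets` and the sample data are hypothetical, for illustration only; they are not part of tweetstream or tweepy.

```python
import re


def matching_tweets(stream, pattern):
    """Yield items from `stream` whose "text" field matches `pattern`.

    `stream` is any iterable of dict-like tweets (an assumption; a real
    streaming client would supply this iterator).
    """
    regex = re.compile(pattern, re.IGNORECASE)
    for tweet in stream:
        # Skip messages without text, e.g. delete notices.
        text = tweet.get("text")
        if text and regex.search(text):
            yield tweet


# Stand-in for a live stream, so the sketch runs without network access.
sample = [
    {"text": "Learning Python today"},
    {"delete": {"status": {"id": 1}}},
    {"text": "Nothing to see here"},
    {"text": "python regex is handy"},
]

matches = [t["text"] for t in matching_tweets(sample, r"\bpython\b")]
print(matches)
```

Because the function is a generator, it processes one tweet at a time and never holds the whole stream in memory, which matters for a connection that stays open indefinitely.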