I’m going to be running through live twitter data and attempting to pull out

Question

Asked: May 25, 2026 (2026-05-25T20:17:43+00:00)

I’m going to be running through live Twitter data and attempting to pull out tweets that mention, for example, movie titles. Assuming I have a list of ~7000 hard-coded movie titles I’d like to match against, what’s the best way to select the relevant tweets? This project is in its infancy, so I’m open to looking into any solution (i.e. language-agnostic). Any help would be greatly appreciated.

Update: I’d be curious whether anyone has any insight into how the Yahoo! Placemaker API solves this problem. It can take a text string and return a geocoded JSON result of all the locations mentioned in it.

1 Answer

Editorial Team · Answer 1 · 2026-05-25T20:17:44+00:00

You could try Wu and Manber’s “A Fast Algorithm for Multi-Pattern Searching”.

The multi-pattern matching problem lies at the heart of virus scanning, so you might look to scanner implementations for inspiration. ClamAV, for example, is open source and some papers have been published describing its algorithms:

Lin, Lin and Lai: A Hybrid Algorithm of Backward Hashing and Automaton Tracking for Virus Scanning (a variant of Wu-Manber; the paper is behind the IEEE paywall).

Cha, Moraru, et al: SplitScreen: Enabling Efficient, Distributed Malware Detection
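
To make the Wu-Manber suggestion a bit more concrete, here is a minimal Python sketch of the core idea: a shift table built from short character blocks lets the scan skip ahead, and a hash table of candidate patterns is only consulted when the shift is zero. This is a simplified illustration, not the tuned implementation from the paper. The function names `build_tables` and `find_titles`, the block size of 2, and the case-insensitive matching are all assumptions made for the example.

```python
from collections import defaultdict


def build_tables(titles, max_block=2):
    """Precompute Wu-Manber-style SHIFT and HASH tables (simplified sketch)."""
    # Map lower-cased pattern -> original title; empty strings are skipped.
    patterns = {t.lower(): t for t in titles if t}
    if not patterns:
        return None
    m = min(len(p) for p in patterns)   # only the first m chars drive the scan
    b = min(max_block, m)               # block size cannot exceed m
    default_shift = m - b + 1           # safe skip for blocks seen in no pattern
    shift = {}
    buckets = defaultdict(list)
    for p in patterns:
        for end in range(b, m + 1):
            block = p[end - b:end]
            # Distance from this block's end to the end of the m-char prefix.
            shift[block] = min(shift.get(block, default_shift), m - end)
        # Patterns are bucketed by the last block of their m-char prefix.
        buckets[p[m - b:m]].append(p)
    return patterns, m, b, default_shift, shift, buckets


def find_titles(text, tables):
    """Return (start_index, original_title) for every occurrence in text."""
    if tables is None:
        return []
    patterns, m, b, default_shift, shift, buckets = tables
    haystack = text.lower()
    hits = []
    pos = m - 1                         # index of the last char of the window
    while pos < len(haystack):
        block = haystack[pos - b + 1:pos + 1]
        step = shift.get(block, default_shift)
        if step > 0:
            pos += step                 # no pattern prefix can end here
            continue
        start = pos - m + 1
        for p in buckets[block]:        # verify each candidate in full
            if haystack.startswith(p, start):
                hits.append((start, patterns[p]))
        pos += 1
    return hits


tables = build_tables(["Jaws", "Alien", "Heat"])
print(find_titles("just watched jaws and alien again", tables))
```

Note that this matches raw substrings, so a title such as "Heat" would also match inside "theater". For tweets you would probably want to add a word-boundary check on each hit. Very short titles also shrink the minimum pattern length and therefore the achievable skip distance, so it may be worth handling them separately.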