Every few minutes, around 500 paragraphs are supposed to be submitted to the database

Asked: June 10, 2026

Every few minutes, around 500 paragraphs are supposed to be submitted to the database, into a table called “Content” (this number will go to over 2,500 in a few months).
There is another table called “Keywords” which has over 4,000 rows (and is expected to grow to over 10,000).

Keywords
+------------+-------------------+
| Keyword_id | keyword           |
+------------+-------------------+
|          1 | "Venture Capital" |
|          2 | "Financing"       |
+------------+-------------------+

The question is: what is the best way to scale a solution where each keyword is checked against every incoming paragraph of text to see if there is a match?

I’m not concerned about where in the paragraph a match occurs (my only concern is that there IS a match).
if (preg_match()) {} could possibly work, but even at the low end (500 paragraphs × 4,000 keywords) that’s 2,000,000 times you’re running over a paragraph searching for a keyword.
Plus, correct me if I’m wrong, preg_match is pretty expensive.
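
One way to avoid running a regular expression once per keyword is to turn the problem around: split each paragraph into words and look up every run of consecutive words in an in-memory set of keywords. The following is a minimal Python sketch, not a definitive implementation. It assumes keywords should match as whole words and case-insensitively; the names `normalize`, `build_index` and `find_keywords` are hypothetical.

```python
import re

WORD_RE = re.compile(r"\w+")


def normalize(text):
    """Case-fold the text and split it into a list of word tokens."""
    return WORD_RE.findall(text.casefold())


def build_index(keywords):
    """Map each normalized keyword (a tuple of words) to its id.

    `keywords` is an iterable of (keyword_id, keyword) pairs, as in the
    Keywords table. Returns the mapping and the length, in words, of the
    longest keyword.
    """
    index = {}
    max_len = 0
    for keyword_id, keyword in keywords:
        words = tuple(normalize(keyword))
        if words:
            index[words] = keyword_id
            max_len = max(max_len, len(words))
    return index, max_len


def find_keywords(paragraph, index, max_len):
    """Return the set of keyword ids that occur in the paragraph."""
    words = normalize(paragraph)
    found = set()
    for start in range(len(words)):
        # Try every run of words beginning at `start`, up to max_len words.
        for length in range(1, max_len + 1):
            if start + length > len(words):
                break
            keyword_id = index.get(tuple(words[start:start + length]))
            if keyword_id is not None:
                found.add(keyword_id)
    return found


keywords = [(1, "Venture Capital"), (2, "Financing")]
index, max_len = build_index(keywords)
matches = find_keywords("The startup raised venture capital last year.", index, max_len)
print(matches)  # {1}
```

With this layout the work per paragraph is roughly (words in the paragraph) × (longest keyword in words) dictionary lookups, so it does not grow with the number of keywords. It will not find a keyword that appears only as part of a longer word.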

One of the possibilities that crossed my mind was to keep an array of the keywords in a cache instead of having to query the DB for every row.
I think that would definitely help speed things up.
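
The caching idea could be sketched as follows in Python: load the keyword list once, keep it in memory, and reload it only after a time-to-live expires so newly added keywords are eventually picked up. The class `KeywordCache`, the 300-second TTL, and the loader `load_keywords_from_db` are all hypothetical; the loader stands in for a real query against the Keywords table.

```python
import time


class KeywordCache:
    """Keep the keyword list in memory and reload it only when stale."""

    def __init__(self, loader, ttl_seconds=300.0, clock=time.monotonic):
        self._loader = loader        # callable returning (id, keyword) pairs
        self._ttl = ttl_seconds
        self._clock = clock
        self._keywords = None
        self._loaded_at = 0.0

    def get(self):
        now = self._clock()
        # Reload on first use, or once the cached copy is older than the TTL.
        if self._keywords is None or now - self._loaded_at >= self._ttl:
            self._keywords = list(self._loader())
            self._loaded_at = now
        return self._keywords


load_count = 0


def load_keywords_from_db():
    # Hypothetical stand-in for "SELECT Keyword_id, keyword FROM Keywords".
    global load_count
    load_count += 1
    return [(1, "Venture Capital"), (2, "Financing")]


cache = KeywordCache(load_keywords_from_db, ttl_seconds=300.0)
for _ in range(500):           # one batch of incoming paragraphs
    keywords = cache.get()     # only the first call invokes the loader
print(load_count)  # 1
```

The TTL is a trade-off: a longer value means fewer database reads, but new keywords take longer to show up in matching.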

I’m not concerned with this being only in PHP.
If this section of the application needs to be in Python (correct me if I’m wrong, but I hear Python is a lot less expensive at parsing text), then I’m all ears.

1 Answer

Editorial Team · Answer 1 · 2026-06-10T03:46:13+00:00

With MySQL:

Search query: Vent Capit

Using MATCH ... AGAINST (this requires a FULLTEXT index on the keyword column):

SELECT * FROM keywords WHERE MATCH (keyword) AGAINST ('+Vent* +Capit*' IN BOOLEAN MODE);

If you're using a _ci collation (ci stands for case-insensitive), the matching will ignore capitalization 🙂
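
To show how the search query "Vent Capit" becomes the boolean-mode string '+Vent* +Capit*' used above, here is a small Python sketch. The function name `to_boolean_prefix_query` is hypothetical, and the `%s` placeholder assumes a MySQL driver that uses that parameter style; the code only builds the query string and parameters and does not connect to a database.

```python
import re

# Characters that act as operators in MySQL boolean-mode full-text searches.
_OPERATORS = re.compile(r'[+\-<>()~*"@]')


def to_boolean_prefix_query(search):
    """Require every term ('+') and match it as a prefix ('*')."""
    terms = []
    for raw in search.split():
        term = _OPERATORS.sub("", raw)   # drop operator characters from input
        if term:
            terms.append("+" + term + "*")
    return " ".join(terms)


query = to_boolean_prefix_query("Vent Capit")
print(query)  # +Vent* +Capit*

# Pass the string as a bound parameter rather than concatenating it into SQL.
sql = "SELECT * FROM keywords WHERE MATCH (keyword) AGAINST (%s IN BOOLEAN MODE)"
params = (query,)
```

Stripping the operator characters keeps user input from changing the meaning of the boolean expression; binding the value as a parameter leaves quoting to the driver.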
