I am currently writing a script to filter some e-mail contents I’ve received.
These e-mails are replies to my newsletter campaign, and I would like to know who wants to be unsubscribed. To do that, I look for specific words such as “please” or “remove”. I also have to check whether the e-mail was sent by a mailer-daemon, so I can alert the user to change his e-mail address the next time he logs in.
First, I retrieve the dictionary from a MySQL database (it contains 77 words at the moment). Then, for each mail I process, I call preg_match_all twice for each word.
The first call checks whether the whole word, \bplease\b, appears in the e-mail content. Since some people make mistakes and write the word as “pleease”, the second call uses \bp+l+e+a+s+e+\b.
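To make the two checks concrete, here is a minimal sketch of how the tolerant pattern could be generated from a dictionary word. The helper name tolerantPattern and the i and u flags are assumptions for illustration; they are not part of the original script. Note that \bp+l+e+a+s+e+\b also matches the exact word “please” (each + means one or more), so the tolerant pattern alone appears to cover both cases.

```php
<?php
// Hypothetical helper (not from the original script): turns a word such as
// "please" into a pattern that tolerates repeated letters.
function tolerantPattern(string $word): string
{
    // Split the word into single characters (UTF-8 aware).
    $letters = preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY);

    // Escape each character and allow it to repeat one or more times.
    $parts = array_map(function (string $ch): string {
        return preg_quote($ch, '/') . '+';
    }, $letters);

    // Assumption: case-insensitive (i) and UTF-8 (u) matching are wanted.
    return '/\b' . implode('', $parts) . '\b/iu';
}

$pattern = tolerantPattern('please');

// Both the exact and the misspelled form match the same pattern.
$matchesExact = preg_match($pattern, 'Please remove me') === 1;
$matchesTypo  = preg_match($pattern, 'Pleease remove me') === 1;
```

If the exact-word check is not needed for anything else, dropping it would halve the number of preg_match_all calls per word.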
But the dictionary keeps growing as more and more words are added, and the script is getting slower; it currently processes at least 4 e-mails per second.
Is there a way to process the mails faster than I do now?
Would it be faster to build a single regex matching all 77 words instead of executing 77 preg_match_all calls?
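For illustration, a single combined pattern could be built roughly as follows. This is only a sketch: the word list and the function name buildCombinedPattern are hypothetical, and whether it is actually faster than 77 separate calls would have to be measured on real mails.

```php
<?php
// Hypothetical word list; in the real script it is loaded from MySQL.
$words = ['please', 'remove', 'unsubscribe'];

// Build one alternation such as /\b(?:please|remove|unsubscribe)\b/i.
function buildCombinedPattern(array $words): string
{
    $alternatives = array_map(function (string $word): string {
        return preg_quote($word, '/'); // escape regex metacharacters
    }, $words);

    return '/\b(?:' . implode('|', $alternatives) . ')\b/i';
}

// Build the pattern once, outside the per-mail loop.
$pattern = buildCombinedPattern($words);

$mail = 'Hello, PLEASE remove me from this list.';
preg_match_all($pattern, $mail, $matches);

// Distinct dictionary words found in this mail, lower-cased.
$found = array_unique(array_map('strtolower', $matches[0]));
```

The mail is then scanned once per message rather than once per word, and $found still tells which words were seen.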
To your question, “Would it be faster to build a regex matching 77 words instead of executing 77 preg_match_all commands?”: I recommend not using a regex at all. I could be wrong, but I think string functions are faster than regex. Read this: http://www.webdeveloper.com/forum/showthread.php?190485-performance-doubts-regex-vs-string-functions-using-this-and-1-more-doubt!-plz!-) Also see the question “Which is more efficient, PHP string functions or regex in PHP?”
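A minimal sketch of the string-function approach, using stripos for a case-insensitive substring search. The function name findWords and the sample data are assumptions for illustration only.

```php
<?php
// Hypothetical helper: returns the dictionary words that occur in the mail.
function findWords(string $mail, array $words): array
{
    $found = [];
    foreach ($words as $word) {
        // stripos returns false when the word is absent; position 0 is a hit,
        // so the comparison must be strict.
        if (stripos($mail, $word) !== false) {
            $found[] = $word;
        }
    }
    return $found;
}

$found = findWords('Please REMOVE me', ['please', 'remove', 'stop']);
```

Keep in mind that this is a plain substring search: it has no notion of word boundaries (so “please” is also found inside “displeased”) and it does not catch misspellings such as “pleease”. Whether it is faster in practice is worth measuring on real mails before switching.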