I have a $text from which I want to strip all non-alphanumeric characters, replace runs of whitespace and newlines with a single space, and remove leading and trailing spaces.
This is my solution so far.
$text = '
some- text!!
for testing?
'; // $text to format
//strip off all non-alphanumeric chars
$text = preg_replace("/[^a-zA-Z0-9\s]/", "", $text);
//Replace multiple white spaces by single space
$text = preg_replace('/\s+/', ' ', $text);
//eliminate beginning and ending space
$finalText = trim($text);
/* result: $finalText ="some text for testing";
without non-alphanumeric chars, newline, extra spaces and trim()med */
Is it possible to achieve all of this with one regular expression, so that I get the desired result in one line, as below?
$finalText = preg_replace(some_reg_expression, $replaceby, $text);
thanks
Edit: clarified with a test string
Of course you can. That is very easy.
The re will look like:
I have no PHP at hand, so I used Perl just to test the RE and show that it works (you can play with my code here):
For PHP it will be the same.
What does the RE do?
The only tricky part here is ((?<= )\s*), a lookbehind assertion. You remove whitespace if and only if the run of whitespace has a space before it. If you want to know how lookahead/lookbehind assertions work, please take a look at http://www.regular-expressions.info/lookaround.html.
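To make the lookbehind concrete, here is a minimal PHP sketch of that one fragment in isolation. The sample string and variable names are made up for illustration, and the quantifier is written as \s+ rather than \s* so the pattern never matches an empty string; this is not the full RE from the answer.

```php
<?php
// Hypothetical sample input: three spaces between the two letters.
$sample = "a   b";

// (?<= ) succeeds only at positions where the previous character is a space,
// so the first space of the run is kept and the remaining ones are deleted.
$collapsed = preg_replace('/(?<= )\s+/', '', $sample);

var_dump($collapsed); // string(3) "a b"
```

Note that the lookbehind inspects the original subject string, not the partially replaced result.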
Update from the discussion:
What happens when $text = 'some ? ! ? text';? Then the resulting string contains multiple spaces between “some” and “text”.
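The effect can be reproduced with a small PHP sketch. The combined pattern below is an assumption put together from the pieces discussed here (remove non-alphanumeric characters, remove whitespace preceded by a space); it is not necessarily the exact RE from the answer.

```php
<?php
$text = 'some ? ! ? text';

// Assumed single-pass pattern: delete any character that is neither
// alphanumeric nor whitespace, or whitespace that directly follows a space.
$result = preg_replace('/[^a-zA-Z0-9\s]|(?<= )\s+/', '', $text);

// Every space here follows a letter or a punctuation mark in the ORIGINAL
// string, so the lookbehind never fires and all four spaces survive.
var_dump($result); // string(12) "some    text"
```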
It is not so easy to solve this problem, because one needs positive lookbehind assertions with variable length, and that is not possible at the moment. One cannot simply check for spaces, because the preceding character may be not a space but a non-alphanumeric character that will be removed anyway (for example, in " !" the "!" sign will be removed, but the RE knows nothing about that). One needs something like (?<=[^a-zA-Z0-9\s]* )\s*, but unfortunately that will not work because PCRE does not support variable-length lookbehind assertions.