I am working on some free text that needs some data cleaning. I have a question (one of many, which I am sure I will ask later):
I need to replace the following combinations:
[ ; ] (space before and after the punctuation)
[;] (no space before and after the punctuation)
[ ;] (only space before the punctuation)
to
[; ] (only space after the punctuation)
…where the punctuation can be one of [;:,.]. How can I do this with a regex?
A possible expression would be:
and depending on the programming language or tool you are using, you have to use $1, \\1 or \1 for the backreference, and the replacement would be e.g. "$1 " (there is a space after the 1).
Explanation:
References: character class, capture group, quantifier
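As a minimal sketch of the approach described above, here is how it could look in Python. The pattern `\s*([;:,.])\s*` is an assumption built from the three ingredients listed (a character class, a capture group and a quantifier), not necessarily the exact expression meant above, and the function name is made up for illustration. Python uses `\1` for the backreference.

```python
import re

# Assumed pattern: optional whitespace, one of ; : , . (captured as group 1),
# then optional whitespace again.
PUNCT_SPACING = re.compile(r"\s*([;:,.])\s*")


def normalize_punctuation_spacing(text: str) -> str:
    """Put no whitespace before and exactly one space after ; : , ."""
    # r"\1 " is the captured punctuation mark followed by a single space.
    return PUNCT_SPACING.sub(r"\1 ", text)


if __name__ == "__main__":
    for sample in ["a ; b", "a;b", "a ;b"]:
        print(repr(normalize_punctuation_spacing(sample)))
```

All three sample inputs come out as `'a; b'`. Two things to watch for with this sketch: a mark at the very end of the text gets a trailing space, and a dot inside a number is treated like any other dot, so `3.14` becomes `3. 14`. Since `\s` also matches line breaks, replacing it with a literal space in the pattern may be safer for multi-line text.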
But again: the expression can differ depending on the tool/language you are using. E.g. a similar expression for sed would look like:
but this would also trim the spaces around the punctuation (there is probably a better way, but I'm not so familiar with sed).