I’ve got a bunch of strings like:
'Hello, here&#39;s a test colon&#58;. Here&#39;s a test semi-colon&#59;'
I would like to replace that with
'Hello, here's a test colon:. Here's a test semi-colon;'
And so on for all printable ASCII values.
At present I’m using boost::regex_search to match &#(\d+);, building up a string as I process each match in turn (including appending the substring containing no matches since the last match I found).
Can anyone think of a better way of doing it? I’m open to non-regex methods, but regex seemed a reasonably sensible approach in this case.
Thanks,
Dom
The big advantage of using a regex is that it deals with tricky cases like &#38;#38;. Entity replacement isn’t iterative, it’s a single step. The regex is also going to be fairly efficient: the two lead characters are fixed, so it will quickly skip anything not starting with &#. Finally, the regex solution is one without a lot of surprises for future maintainers.
I’d say a regex was the right choice.
Is it the best regex, though? You know you need two digits, and if you have three digits, the first one will be a 1. Printable ASCII is after all &#32;-&#126;. For that reason, you could consider &#1?\d\d;.
As for replacing the content, I’d use the basic algorithm described for boost::regex_replace:
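A minimal sketch of that single-pass approach, under a few assumptions: it uses std::regex (C++11) instead of Boost so that it compiles on its own, the function name decode_printable_ascii is made up for illustration, and anything outside 32 to 126 is left untouched. Boost.Regex offers the same kind of match iterator, so the loop should translate with little change.

```cpp
#include <regex>
#include <string>

// Replace decimal character references for printable ASCII (&#32; to &#126;)
// with the characters they name, in one left-to-right pass.
// Hypothetical helper name; not part of Boost or the standard library.
std::string decode_printable_ascii(const std::string& input)
{
    // Two digits, optionally preceded by a 1. This still admits values
    // outside 32..126 (for example 00 or 199), so the range is checked below.
    static const std::regex entity("&#(1?\\d\\d);");

    std::string output;
    auto last = input.cbegin();  // end of the previous match

    for (std::sregex_iterator it(input.cbegin(), input.cend(), entity), end;
         it != end; ++it)
    {
        const std::smatch& m = *it;
        output.append(last, m[0].first);  // text since the previous match

        const int code = std::stoi(m[1].str());
        if (code >= 32 && code <= 126)
            output.push_back(static_cast<char>(code));  // decoded character
        else
            output.append(m[0].first, m[0].second);     // not printable: keep as-is

        last = m[0].second;
    }

    output.append(last, input.cend());  // text after the final match
    return output;
}
```

Because the output is never rescanned, an input of &#38;#38; comes out as &#38; and is not decoded a second time.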