So I’m pretty good with regular expressions, but I’m having some trouble with them on unix. Here are two things I’d love to know how to do:
1) Replace all text except letters, numbers, and underscore
In PHP I’d do this: (works great)
preg_replace('#[^a-zA-Z0-9_]#','',$text).
In bash I tried this (with limited success); it seems like it doesn’t allow you to use the full set of regex:
text="my #1 example!"
${text/[^a-zA-Z0-9_]/''}
I tried it with sed but it still seems to have problems with the full regex set:
echo "my #1 example!" | sed s/[^a-zA-Z0-9\_]//
I’m sure there is a way to do it with grep, too, but it was breaking the output into multiple lines when I tried:
echo abc\!\@\#\$\%\^\&\*\(222 | grep -Eos '[a-zA-Z0-9\_]+'
And finally I also tried using expr but it seemed like that had really limited support for extended regex…
2) Capture (multiple) parts of text
In PHP I could just do something like this:
preg_match('#(word1).*(word2)#',$text,$matches);
I’m not sure how that would be possible in *nix…
Part 1
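You are almost there with sed: just add the g modifier so that the replacement happens globally. Without the g, the replacement happens only once, on the first match. You made the same mistake with your bash pattern replacement: a single slash, as in ${text/pattern/}, replaces only the first match, while a double slash, as in ${text//pattern/}, replaces all of them.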
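The fix described above can be sketched as follows, using the sample string from the question. This assumes bash (the // expansion is not available in plain POSIX sh); the variable names cleaned_sed and cleaned_bash are made up for illustration.

```shell
#!/usr/bin/env bash
text='my #1 example!'

# sed: the trailing g replaces every match on the line, not just the first.
# Quoting the script keeps the shell from interpreting [ ] and ^.
cleaned_sed=$(printf '%s\n' "$text" | sed 's/[^a-zA-Z0-9_]//g')

# bash parameter expansion: a double slash (//) replaces all matches,
# a single slash (/) would replace only the first one.
cleaned_bash=${text//[^a-zA-Z0-9_]/}

echo "$cleaned_sed"   # my1example
echo "$cleaned_bash"  # my1example
```

Note that the bash form uses a glob-style pattern rather than a regular expression; a bracket expression like this one happens to be written the same way in both.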
Part 2
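Capturing works the same in sed as it did in PHP’s regex: enclosing part of the pattern in parentheses captures it, and \1, \2, and so on refer back to the captured groups in the replacement. With plain sed the parentheses must be escaped as \( and \); with sed -E they are written unescaped.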
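As a rough sketch of capturing, here is the pattern from the question applied to a made-up input string (the text and the variable names swapped, first, and second are assumptions for illustration). It shows sed -E back-references, plus bash's own [[ ... =~ ... ]] operator as an alternative that is closer to preg_match, since it fills the BASH_REMATCH array.

```shell
#!/usr/bin/env bash
text='foo word1 bar word2 baz'

# sed -E: \1 and \2 refer to the first and second parenthesized groups.
# The surrounding .* consume the rest of the line so only the groups remain.
swapped=$(printf '%s\n' "$text" | sed -E 's/.*(word1).*(word2).*/\2 \1/')
echo "$swapped"   # word2 word1

# bash: [[ string =~ regex ]] stores the whole match in BASH_REMATCH[0]
# and the captured groups in BASH_REMATCH[1], BASH_REMATCH[2], ...
re='(word1).*(word2)'
if [[ $text =~ $re ]]; then
  first=${BASH_REMATCH[1]}
  second=${BASH_REMATCH[2]}
  echo "$first $second"   # word1 word2
fi
```

Keeping the regex in a variable (re) and leaving it unquoted on the right-hand side of =~ avoids bash treating it as a literal string.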