I often find myself needing a tool that would allow me to:
search for multiple multi-line regex patterns in a large file and replace them using back-referencing.
Should I:
- take the 2 hours it’ll require to build such a tool myself
- use something someone has already built (please suggest)
- learn to use a language that’s particularly good at this type of thing (Perl?)
Example
I have an XML document containing thousands of entries. About 100 of them have a known value field and need to be removed. I can build a regular expression for each entry; the expression is the same for all 100 entries except for the value string. The tool would need either to loop through the file once per value, or to run once with 100 OR terms (|) in the expression (which would be huge). In this case I’m replacing the matches with a blank, but in other cases I’d reformat the text and re-insert the value field.
I reckon you should write the thing in Python. Python's re module is great for this sort of job.
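Something along these lines would do it. This is only a rough sketch: the entry/name/value element names, the sample data, and the values to remove are all made up for illustration, and it assumes the whole file fits in memory. It builds one pattern with an OR term per value (escaped with re.escape), uses re.DOTALL so a match can span several lines, and shows both a plain removal and a replacement that re-inserts the value through a back-reference.

```python
import re

# Hypothetical sample data; the <entry>/<name>/<value> layout is an assumption.
xml_text = """<entries>
<entry>
  <name>alpha</name>
  <value>keep-me</value>
</entry>
<entry>
  <name>beta</name>
  <value>drop-1</value>
</entry>
<entry>
  <name>gamma</name>
  <value>drop-2</value>
</entry>
</entries>
"""

# In the real case this list would hold the ~100 known values.
values_to_remove = ["drop-1", "drop-2"]


def build_pattern(values):
    """Compile one regex that matches any entry whose value is in `values`."""
    # Escape each value so regex metacharacters in the data are taken literally.
    alternation = "|".join(re.escape(v) for v in values)
    # (?:(?!</entry>).)*? is a lazy match that refuses to cross an entry
    # boundary, so a match can never start in one entry and end in another.
    inside = r"(?:(?!</entry>).)*?"
    return re.compile(
        r"<entry>" + inside
        + r"<value>(" + alternation + r")</value>"
        + inside + r"</entry>\n?",
        re.DOTALL,  # let "." match newlines so the pattern spans lines
    )


pattern = build_pattern(values_to_remove)

# Case 1: replace every matching entry with nothing.
cleaned = pattern.sub("", xml_text)

# Case 2: reformat instead, re-inserting the captured value via \1.
reformatted = pattern.sub(r"<!-- removed \1 -->\n", xml_text)

print(cleaned)
print(reformatted)
```

A single compiled alternation means one pass over the text rather than 100. If the pattern gets unwieldy, looping over the values and calling re.sub once per value works too, just more slowly.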
N.B. You could also use Python's XML parsing libraries to strip out the elements you don’t want. Using an XML parser sidesteps some of the complexity a regex approach has to deal with (entries spanning multiple lines, etc.). For more on parsing XML in Python, this question has some good answers.
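For comparison, here is a rough sketch of the parser route using the standard library's xml.etree.ElementTree. Again, the element names and sample data are invented for illustration, and it assumes the entries are direct children of the root element (Element.remove only removes direct children).

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the element names are assumptions.
xml_text = """<entries>
<entry><name>alpha</name><value>keep-me</value></entry>
<entry><name>beta</name><value>drop-1</value></entry>
<entry><name>gamma</name><value>drop-2</value></entry>
</entries>"""

# A set gives fast membership tests when there are ~100 values.
values_to_remove = {"drop-1", "drop-2"}

root = ET.fromstring(xml_text)

# findall returns a list, so removing from root while looping is safe.
for entry in root.findall("entry"):
    if entry.findtext("value") in values_to_remove:
        root.remove(entry)

result = ET.tostring(root, encoding="unicode")
print(result)
```

For a file on disk, ET.parse(path) followed by tree.write(path) would take the place of fromstring and tostring.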