I am looking for a solution to this problem and suspect awk could handle it more simply than my clumsy shell script.
I have an XML file consisting of multiple sections as shown below. I also have a list of values.
For each section <top_tag> ... </top_tag> where value_x is in my list, delete (i.e. do not print) the whole section <top_tag> ... </top_tag>
<xml>
<outer_tag>
<top_tag>
<tag>value_1</tag>
<other_tags></other_tags>
</top_tag>
<top_tag>
<tag>value_2</tag>
<other_tags></other_tags>
</top_tag>
...
<top_tag>
<tag>value_n</tag>
<other_tags></other_tags>
</top_tag>
</outer_tag>
</xml>
Your suggestions are most appreciated.
What you need here is not awk but XSLT, which was created specifically for this kind of task. It lets you transform an XML document into a different XML document.
For an input much like yours:
The following XSLT removes all top_tag elements with value_3 by simply not copying them and ignoring their contents.

Every major programming language has at least a couple of libraries that can process an XML input according to an XSLT. Command line tools and UI-based applications (IDEs but not only those) can do it as well. Finally, web browsers can transform files using XSLT if you include the xsl file with a processing instruction like this:
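
A sketch of what such an instruction might look like, assuming the stylesheet is saved as remove.xsl (a hypothetical file name) in the same directory as the XML file. It goes in the prolog, after the XML declaration and before the root element:

```xml
<?xml-stylesheet type="text/xsl" href="remove.xsl"?>
```

For reference, a stylesheet of the kind described above could look roughly like the following. This is a minimal sketch, assuming XSLT 1.0 and the element names from the question (top_tag, tag); the value value_3 is just the example value to drop.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copy every node and attribute unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Empty template: a top_tag whose tag child equals value_3
       produces no output, so the whole section is dropped -->
  <xsl:template match="top_tag[tag='value_3']"/>

</xsl:stylesheet>
```

The empty template wins over the identity template because its match pattern is more specific. To drop several values from a list, the predicate can be extended, for example top_tag[tag='value_3' or tag='value_5'] (value_5 again being a made-up value).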