I have an XML file with the following data format:
<net NetName="abc" attr1="123" attr2="234" attr3="345".../>
<net NetName="cde" attr1="456" attr2="567" attr3="678".../>
....
Can anyone tell me how I could extract data from the XML file using an awk one-liner? For example, I would like to look up attr3 of abc, which should return 345.
In general, you don’t. XML/HTML parsing is hard enough without trying to do it concisely, and while you may be able to hack together a solution that succeeds with a limited subset of XML, eventually it will break: for instance, as soon as an attribute uses single quotes or an element is split across several lines.
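To make the fragility concrete, here is a minimal Python sketch, using the standard-library xml.etree.ElementTree module in place of whatever parser you prefer. `naive_lookup` and `parsed_lookup` are hypothetical helper names: the first mimics what a line-oriented awk one-liner would do (find the line containing `NetName="abc"` and regex out `attr3="..."`), the second uses a real parser. The `<nets>` root element in the sample inputs is also an illustrative assumption. Both inputs carry the same data as far as XML is concerned.

```python
import re
import xml.etree.ElementTree as ET


def naive_lookup(text, net, attr):
    """Line/regex approach, roughly what an awk one-liner does."""
    for line in text.splitlines():
        if f'NetName="{net}"' in line:
            m = re.search(attr + r'="([^"]*)"', line)
            return m.group(1) if m else None
    return None


def parsed_lookup(text, net, attr):
    """Let a real XML parser handle quoting, whitespace and line breaks."""
    root = ET.fromstring(text)
    for el in root.iter("net"):
        if el.get("NetName") == net:
            return el.get(attr)
    return None


simple = '<nets><net NetName="abc" attr1="123" attr3="345"/></nets>'
# Equally valid XML: single-quoted attributes, element split across lines.
tricky = """<nets>
  <net NetName='abc'
       attr1='123' attr3='345'/>
</nets>"""

print(naive_lookup(simple, "abc", "attr3"))   # 345
print(naive_lookup(tricky, "abc", "attr3"))   # None
print(parsed_lookup(tricky, "abc", "attr3"))  # 345
```

The naive version silently returns nothing on the second input rather than failing loudly, which is exactly the kind of breakage that is hard to notice.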
Besides, many languages already have mature XML parsers, so why not use one of them and make your life easier?
I don’t know whether or not there’s an XML parser built for awk, but I’m afraid that if you want to parse XML with awk you’re going to get a lot of “hammers are for nails, screwdrivers are for screws” answers. I’m sure it can be done, but it’s probably going to be easier for you to write something quick in Perl that uses XML::Simple (my personal favorite) or some other XML parsing module.
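Just for completeness, I’d like to note that if your snippet is an example of the entire file, it is not well-formed XML. A well-formed XML document must have exactly one root element whose start and end tags enclose everything else, for example by wrapping all the `<net .../>` lines in a `<nets>` ... `</nets>` element (the root name is up to you).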
I’m sure invalid XML has its uses, but a conforming XML parser will refuse to read it, so unless you’re dead set on using an awk one-liner to try to half-ass “parse” your “XML,” you may want to consider making your XML valid.
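A quick Python check (again using the standard-library xml.etree.ElementTree) shows the difference: `fromstring` rejects the question's format, with several top-level `<net>` elements, by raising `ParseError`, but accepts the same lines once they are wrapped in a single root. The root name `nets` is an arbitrary, hypothetical choice, and the elided `...` attributes from the question are dropped from the sample.

```python
import xml.etree.ElementTree as ET

# The question's layout: several <net> elements and no enclosing root.
fragment = """<net NetName="abc" attr1="123" attr2="234" attr3="345"/>
<net NetName="cde" attr1="456" attr2="567" attr3="678"/>"""

try:
    ET.fromstring(fragment)
    well_formed = True
except ET.ParseError:
    # The parser stops at the second top-level element.
    well_formed = False
print(well_formed)  # False

# Wrapping the same lines in one root element makes them well-formed.
root = ET.fromstring("<nets>" + fragment + "</nets>")
print([net.get("NetName") for net in root.iter("net")])  # ['abc', 'cde']
```

If you cannot change whatever produces the file, wrapping the text in memory like this is a cheap workaround, though it would need adjusting if the file starts with an XML declaration.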
In response to your edits, I still won’t do it as a one-liner, but here’s a Perl script that you can use:
Run this script from the command line with one or two arguments. The first argument is the 'NetName' you want to look up, and the second is the attribute you want. If no attribute is given, it should just list all the attributes for that 'NetName'.
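For readers who would rather not use Perl, here is a minimal Python sketch of a script with the same command-line behaviour. It is not the Perl script from this answer: it swaps in the standard-library xml.etree.ElementTree, and the file name `nets.xml`, the `lookup` helper and the `name=value` listing format are all hypothetical choices. It assumes the file is well-formed, i.e. the `<net>` elements sit inside one root element.

```python
import sys
import xml.etree.ElementTree as ET

# Hypothetical file name; point this at your data. The file is assumed
# to be well-formed, with all <net> elements inside one root element.
XML_FILE = "nets.xml"


def lookup(root, net_name, attr=None):
    """Find the <net> whose NetName matches.

    Returns that attribute's value if `attr` is given, otherwise a dict of
    all its attributes. Returns None if the net or attribute is missing.
    """
    for net in root.iter("net"):
        if net.get("NetName") == net_name:
            return net.get(attr) if attr is not None else dict(net.attrib)
    return None


def main(args):
    root = ET.parse(XML_FILE).getroot()
    result = lookup(root, *args)
    if result is None:
        print("not found", file=sys.stderr)
    elif isinstance(result, dict):
        # No attribute requested: list every attribute of the net.
        for name, value in result.items():
            print(f"{name}={value}")
    else:
        print(result)


if __name__ == "__main__":
    if len(sys.argv) in (2, 3):
        main(sys.argv[1:])
    else:
        print(f"usage: {sys.argv[0]} NetName [attribute]", file=sys.stderr)
```

Saved as, say, `nets_lookup.py` and run against the question's data, `python nets_lookup.py abc attr3` prints `345`, while `python nets_lookup.py abc` lists every attribute of `abc`, one per line.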