I need to convert a text file into a specific XML format. The file is huge (millions of lines) and looks like this:
ABC-DATA-FILE-VERSION: 2.0
OBJFILE: /home/abc/src/solaris/abc.o
TIMESTAMP: 1348314377 727216
SRCFILE: /home/abc/src/solaris/abc.C
167 7
170 7
174 0
179 0
174 0
192 7
196 7
199 7
215 0
OBJFILE: /home/abcd/src/solaris/abcd.o
TIMESTAMP: 1348314377 727216
SRCFILE: /home/abcd/src/solaris/abcd.C
58 7
65 7
66 7
67 7
69 0
79 0
84 0
97 14
100 7
108 14
110 7
115 14
OBJFILE: /home/abcd/src/solaris/xyz.o
TIMESTAMP: 1348314377 727216
SRCFILE: /home/abcd/src/solaris/xyz.C
978 0
979 1
993 0
996 0
997 0
1011 0
1003 0
1004 0
1011 0
I want to convert it to a specific XML format, like this:
<packages>
<package name="com" line-rate="0.45161290322580644" branch-rate="0.4915254237288136" complexity="3.391891891891892">
<classes>
<class branch-rate="0" complexity="0" filename="/home/abcd/src/solaris/abcd.C" line-rate="0.25" name="TestRunnerModel">
<methods/>
<lines>
<line number="13" hits="1" branch="true"/>
<line number="14" hits="1" branch="true"/>
<line number="15" hits="1" branch="false"/>
<line number="12" hits="0" branch="false"/>
</lines>
</class>
<class branch-rate="0" complexity="0" filename="/home/abcd/src/solaris/abcd.C" line-rate="0.25" name="TestRunnerModel">
<methods/>
<lines>
<line number="13" hits="1" branch="true"/>
<line number="14" hits="1" branch="true"/>
<line number="15" hits="1" branch="false"/>
<line number="12" hits="0" branch="false"/>
</lines>
</class>
<class branch-rate="0" complexity="0" filename="/home/abcd/src/solaris/xyz.C" line-rate="0.25" name="TestRunnerModel">
<methods/>
<lines>
<line number="13" hits="1" branch="true"/>
<line number="14" hits="0" branch="true"/>
<line number="15" hits="1" branch="false"/>
<line number="12" hits="0" branch="false"/>
</lines>
</class>
</classes>
</package>
</packages>
Most of the XML attributes are constant; I only need to populate a few of them, for example
the filename attribute, read from SRCFILE: /home/abcd/src/solaris/xyz.C
and
line number="978" hits="0" branch="true"
line number="979" hits="1" branch="false"
and so on. Please help.
In principle, it’s very simple. You have input in a given input format, and you’d like to produce output in a given output format. You need a parser for the input format to identify its structure and build data structures that represent that structure. And you need a serializer for your data structures that produces the XML you want.
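As a rough illustration of the parsing half, here is a minimal Python sketch. It assumes the format is exactly what the sample shows: each OBJFILE: line starts a new group, SRCFILE: gives the source path, and every line that begins with a digit is a pair of integers. The names SourceRecord and parse_records are made up for this example, and the meaning of the second column is not stated in the question, so it is kept as a plain integer. It is a generator, so only one group is held in memory at a time, which matters for a file with millions of lines.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Optional, Tuple


@dataclass
class SourceRecord:
    """One OBJFILE/TIMESTAMP/SRCFILE group plus its numeric lines (hypothetical type)."""
    objfile: str
    srcfile: str = ""
    # (line number, second column); the second column's meaning is an assumption
    lines: List[Tuple[int, int]] = field(default_factory=list)


def parse_records(text_lines: Iterable[str]) -> Iterator[SourceRecord]:
    """Yield one SourceRecord per OBJFILE group, reading the input lazily."""
    current: Optional[SourceRecord] = None
    for raw in text_lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("OBJFILE:"):
            # A new group begins; hand back the previous one first.
            if current is not None:
                yield current
            current = SourceRecord(objfile=line.split(":", 1)[1].strip())
        elif line.startswith("SRCFILE:") and current is not None:
            current.srcfile = line.split(":", 1)[1].strip()
        elif line[0].isdigit() and current is not None:
            number, value = line.split()
            current.lines.append((int(number), int(value)))
        # The version header and TIMESTAMP lines are ignored here.
    if current is not None:
        yield current
```

Because an open file object is itself an iterable of lines, you could call it as parse_records(open(path)) and loop over the result without loading the whole file.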
Parsing libraries may exist for your input format, in which case you may want to use them instead of writing your own parser from scratch. Your language may also (and probably does) have libraries for serializing things as XML; you may want to use them.
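For the serializing half, here is one possible sketch in Python using the standard library's xml.sax.saxutils.XMLGenerator, which writes elements as you go instead of building the whole tree in memory. The function name write_report and the CLASS_DEFAULTS table are invented for this example. The constant attribute values are simply copied from the sample in the question, branch is hard-coded to "false", and the second input column is assumed to be the hits value; none of that is confirmed by the question, so adjust it to whatever your target format really requires.

```python
import io
from typing import Iterable, TextIO, Tuple
from xml.sax.saxutils import XMLGenerator

# Placeholder constants taken from the question's sample output (assumption).
CLASS_DEFAULTS = {
    "branch-rate": "0",
    "complexity": "0",
    "line-rate": "0.25",
    "name": "TestRunnerModel",
}


def write_report(
    out: TextIO,
    records: Iterable[Tuple[str, Iterable[Tuple[int, int]]]],
) -> None:
    """Write (srcfile, [(line number, hits), ...]) records as XML, one at a time."""
    gen = XMLGenerator(out, encoding="utf-8", short_empty_elements=True)
    gen.startDocument()
    gen.startElement("packages", {})
    # Only the name is written here; the other package attributes are omitted.
    gen.startElement("package", {"name": "com"})
    gen.startElement("classes", {})
    for srcfile, lines in records:
        gen.startElement("class", dict(CLASS_DEFAULTS, filename=srcfile))
        gen.startElement("methods", {})
        gen.endElement("methods")
        gen.startElement("lines", {})
        for number, hits in lines:
            # branch="false" is a placeholder; the input has no branch column.
            gen.startElement(
                "line",
                {"number": str(number), "hits": str(hits), "branch": "false"},
            )
            gen.endElement("line")
        gen.endElement("lines")
        gen.endElement("class")
    gen.endElement("classes")
    gen.endElement("package")
    gen.endElement("packages")
    gen.endDocument()


# Small in-memory demonstration; a real run would pass an open output file.
buf = io.StringIO()
write_report(buf, [("/home/abcd/src/solaris/xyz.C", [(978, 0), (979, 1)])])
xml_text = buf.getvalue()
```

The output has no line breaks or indentation. If you want it pretty-printed, you can emit whitespace with gen.characters("\n") between elements. The library takes care of escaping attribute values, which is the main reason to prefer it over pasting strings together.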
If you know how to write a parser for a defined format, you now know what you need to do. If you don’t, you may be able to fake it with sed, awk, perl, or the batch editor of your choice, but your life as a programmer will be a lot more fun if you spend some time learning about parsing.