I have to write a script that will count the number of xml tags(say

Asked: May 27, 2026 (2026-05-27T17:07:30+00:00)

I have to write a shell script that counts the occurrences of an XML tag (say, Code) in an XML file. The XML file can be in any of the following formats:

Format #1: 
<Code>value1</Code> <Code>value2</Code>

 Format #2: 
<Code Attr1=va>value1</Code> <Code Attr1=va
Attr2=va>value1</Code>

Format #3: 
<Code>value1</Code><Code>value2</Code> (All Codes can be in
a single line or multiple lines)

Format #4 
   <Code Attr1=va>value1</Code><Code Attr2=va>value1</Code>

Format #5: 
<Cod 
e>Value1</Code
<Code Attr=1> </C
ode>

In short, the XML file can be in any format and can have newlines anywhere.
Please help me; I need to do this soon.

Thanks in advance.

1 Answer

Editorial Team · Answer 1 · 2026-05-27T17:07:31+00:00

Regular expressions are a bad way to parse XML; using a real XML parser is better.
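To illustrate the parser route, here is a minimal sketch that assumes xmllint (part of libxml2) is installed and that the file is well-formed XML. The function name count_with_parser is just a placeholder. Note that the snippets in the question (unquoted attribute values, tag names broken across lines) are not well-formed, so a parser would reject them; this only helps if the real files are valid XML.

```shell
#!/bin/sh
# count_with_parser FILE (hypothetical helper name):
# count every <Code> element in FILE with an XPath query.
# Assumes xmllint is on PATH and FILE is well-formed XML.
count_with_parser() {
    xmllint --xpath 'count(//Code)' "$1"
}
```

Line breaks inside a well-formed tag (for example between the tag name and an attribute) do not matter to the parser, so no line joining is needed here.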

If you really want to use sed/awk/shell/grep, etc., the first thing I can think of is:

 cat tst | xargs | grep -o '<\s*C\s*o\s*d\s*e[^>]*>' | wc -l

I don’t know awk very well, but I’m sure there are awk ninjas out there who can do it more elegantly than this.

It only counts occurrences of the opening <Code> tag (and its variations), not the closing tag, so if your file has, for example, 10 <Code> but only 9 </Code>, it will return 10, not 9.

Basically, the pipeline works like this:

cat tst | xargs prints the contents of tst on a single line (so I don’t have to worry about newlines);
grep -o '<\s*C\s*o\s*d\s*e[^>]*>' prints every match of <Code{optional other stuff}>, allowing whitespace between the letters of Code (-o prints only the text that matched the regex, one match per line);
wc -l counts the lines.

Try each bit successively to see what I mean.
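One thing to watch with the xargs step: xargs gives quotes and backslashes special meaning, so a file containing quoted attribute values or a stray apostrophe may be altered or rejected, and very long input may be split across several output lines. A possible variant, sketched here with tr doing the line joining, avoids that. The function name count_code_tags is a placeholder, [[:space:]] stands in for \s, and grep -o is assumed to be available (it is in GNU and BSD grep).

```shell
#!/bin/sh
# count_code_tags FILE (hypothetical helper name):
# count opening <Code ...> tags, tolerating whitespace or line breaks
# between the letters of the tag name.
count_code_tags() {
    # tr turns each newline into a space without touching quotes or backslashes.
    tr '\n' ' ' < "$1" |
        grep -o '<[[:space:]]*C[[:space:]]*o[[:space:]]*d[[:space:]]*e[^>]*>' |
        wc -l |
        tr -d ' '   # some wc implementations pad the number with spaces
}
```

Usage would be count_code_tags tst. Like the original regex, this also matches longer tag names that merely start with Code (such as <Codec>), so treat the result as an approximation.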

For me, tst was just a copy-paste of what you have above.

[foo@bar ~]$ cat tst
Format #1: 
<Code>value1</Code> <Code>value2</Code>

 Format #2: 
<Code Attr1=va>value1</Code> <Code Attr1=va
Attr2=va>value1</Code>

Format #3: 
<Code>value1</Code><Code>value2</Code> (All Codes can be in
a single line or multiple lines)

Format #4 
   <Code Attr1=va>value1</Code><Code Attr2=va>value1</Code>

Format #5: 
<Cod 
e>Value1</Code
<Code Attr=1> </C
ode>

[foo@bar ~]$ cat tst | xargs
Format #1: <Code>value1</Code> <Code>value2</Code> Format #2: <Code Attr1=va>value1</Code> <Code Attr1=va Attr2=va>value1</Code> Format #3: <Code>value1</Code><Code>value2</Code> (All Codes can be in a single line or multiple lines) Format #4 <Code Attr1=va>value1</Code><Code Attr2=va>value1</Code> Format #5: <Cod e>Value1</Code <Code Attr=1> </C ode>

[foo@bar ~]$ cat tst | xargs | grep -o '<\s*C\s*o\s*d\s*e[^>]*>'
<Code>
<Code>
<Code Attr1=va>
<Code Attr1=va Attr2=va>
<Code>
<Code>
<Code Attr1=va>
<Code Attr2=va>
<Cod e>
<Code Attr=1>

[foo@bar ~]$ cat tst | xargs | grep -o '<\s*C\s*o\s*d\s*e[^>]*>' | wc -l
10
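Since the count above covers opening tags only, it can help to count closing tags the same way and compare the two numbers. The following is a rough sketch, not a validator: the function name check_tag_balance is a placeholder, lines are joined with tr, and a closing tag is assumed to look like </Code> with optional whitespace between its characters.

```shell
#!/bin/sh
# check_tag_balance FILE (hypothetical helper name):
# print the number of opening and closing Code tags and
# return success only when the two counts are equal.
check_tag_balance() {
    # Join all lines so tags split across lines can still match.
    flat=$(tr '\n' ' ' < "$1")
    open=$(printf '%s\n' "$flat" |
        grep -o '<[[:space:]]*C[[:space:]]*o[[:space:]]*d[[:space:]]*e[^>]*>' |
        wc -l | tr -d ' ')
    close=$(printf '%s\n' "$flat" |
        grep -o '<[[:space:]]*/[[:space:]]*C[[:space:]]*o[[:space:]]*d[[:space:]]*e[[:space:]]*>' |
        wc -l | tr -d ' ')
    echo "open=$open close=$close"
    [ "$open" -eq "$close" ]
}
```

A closing tag that is missing its > is not counted, so a mismatch between the two numbers points at malformed or unbalanced tags somewhere in the file.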
