I have a file, “frequencies.xml”, that contains lines of this form:
<?xml version="1.0"?>
<!DOCTYPE stationlist PUBLIC "-//xxxxx//DTD stationlist 1.0//EN" "http://xxxxxxxxx/DTD/xxxxxxxx.dtd">
<frequencies xmlns="http://xxxxxxxxxxxxxxxx/DTD/">
<list norm="PAL" frequencies="Custom" audio="bg">
..............................................................
<station name="A" active="1" channel="48.25MHz" norm="PAL"/>
<station name="B" active="1" channel="55.25MHz" norm="PAL"/>
<station name="C" active="1" channel="62.25MHz" norm="PAL"/>
<station name="D" active="1" channel="112.25MHz" norm="PAL"/>
..............................................................
<station name="E" active="1" channel="119.25MHz" norm="PAL"/>
<station name="F" active="0" channel="48.25MHz" norm="PAL"/>
..............................................................
<station name="G" active="1" channel="55.25MHz" norm="PAL"/>
<station name="H" active="0" channel="62.25MHz" norm="PAL"/>
..............................................................
</list>
</frequencies>
I want to remove duplicate lines, where a line counts as a duplicate if it has the same frequency (the channel attribute) as an earlier line.
Expected output:
<station name="A" active="1" channel="48.25MHz" norm="PAL"/>
<station name="B" active="1" channel="55.25MHz" norm="PAL"/>
<station name="C" active="1" channel="62.25MHz" norm="PAL"/>
<station name="D" active="1" channel="112.25MHz" norm="PAL"/>
<station name="E" active="1" channel="119.25MHz" norm="PAL"/>
I wrote this script to do it:
for i in `cat frequencies.xml | sed 's/.*channel="\([^"]*\)".*/\1/; /</ d' | grep MHz`; do
    cat frequencies.xml | awk -v i="channel=\"$i" '
        BEGIN { a=0 }
        $0 ~ i { if ( a == "1" ) { print i"\" - duplicate" > "/dev/stderr" ; next ;} ; a=1 }
        { print $0 }' > frequencies.xml.tmp && \
    mv frequencies.xml.tmp frequencies.xml
done
How can I translate this into Perl?
Thanks
Update: I want to keep XML structure.
My code:
use strict;
use warnings;

# Open for read/write so the file can be rewritten in place.
open(my $fh, '+<', 'frequencies.xml') or die "Opening: $!";
my $out = '';
my %seen;    # channel value => number of times seen
while ( my $line = <$fh> ) {
    if ( $line =~ m/<station/ and $line =~ m/channel="([^"]+)"/ ) {
        # Keep only the first station line for each frequency.
        $out .= $line unless $seen{$1}++;
    } else {
        $out .= $line;
    }
}
seek($fh, 0, 0) or die "Seeking: $!";
print $fh $out or die "Printing: $!";
truncate($fh, tell($fh)) or die "Truncating: $!";
close($fh) or die "Closing: $!";
Keep a hash to track which frequencies you’ve seen, and if you’ve already seen one, don’t emit the line:
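A minimal sketch of that idea, assuming each station sits on its own line and carries its frequency in a channel="..." attribute. The sub name unique_stations and the three sample lines are only illustrative. Lines without a channel attribute are dropped in this simple version.

```perl
use strict;
use warnings;

# Return only the lines whose channel value has not appeared before.
# Lines with no channel="..." attribute are skipped entirely.
sub unique_stations {
    my @lines = @_;
    my %seen;    # channel value => number of times seen so far
    my @kept;
    for my $line (@lines) {
        my ($freq) = $line =~ /channel="([^"]+)"/;
        next unless defined $freq;
        push @kept, $line unless $seen{$freq}++;
    }
    return @kept;
}

# Sample lines taken from the question (illustrative subset).
my @sample = (
    qq{<station name="A" active="1" channel="48.25MHz" norm="PAL"/>\n},
    qq{<station name="B" active="1" channel="55.25MHz" norm="PAL"/>\n},
    qq{<station name="F" active="0" channel="48.25MHz" norm="PAL"/>\n},
);
my @kept = unique_stations(@sample);
print @kept;    # prints the A and B lines; F repeats 48.25MHz and is dropped
```

The post-increment in $seen{$freq}++ returns the old count, so the test is false (keep the line) the first time a frequency appears and true (skip it) every time after that.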
Update:
If there are other lines to keep, you just need to print them. The easiest way is probably to apply the test only to <station> elements and print everything else unchanged. Once you get more complex than this, though, you may want to use one of the true XML parsers. So, using Zaid’s suggestion:
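The exact code that followed is not preserved here, so what follows is only a sketch of how the structure-preserving variant could look. It assumes one element per line; the sub name dedupe_stations and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Drop <station> lines whose channel was already seen. Every other line
# (XML declaration, <list>, closing tags, ...) passes through untouched.
sub dedupe_stations {
    my @lines = @_;
    my %seen;    # channel value => number of times seen so far
    my @out;
    for my $line (@lines) {
        if ( $line =~ /<station\b/ and $line =~ /channel="([^"]+)"/ ) {
            next if $seen{$1}++;    # duplicate frequency: skip this line
        }
        push @out, $line;
    }
    return @out;
}

# Illustrative input: a wrapper element around two stations on one frequency.
my @sample = (
    qq{<list norm="PAL" frequencies="Custom" audio="bg">\n},
    qq{<station name="A" active="1" channel="48.25MHz" norm="PAL"/>\n},
    qq{<station name="F" active="0" channel="48.25MHz" norm="PAL"/>\n},
    qq{</list>\n},
);
my @result = dedupe_stations(@sample);
print @result;    # <list>, station A and </list> remain; station F is dropped
```

This line-based test breaks down as soon as an element spans several lines or several elements share one line, which is the point at which a real XML parser becomes the safer choice.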