I am downloading an XML file using wget, but sometime the file has text

Question

Editorial Team

Asked: June 14, 2026 (2026-06-14T23:15:04+00:00)

I am downloading an XML file using wget, but sometimes the file has text on the first line that I need to get rid of.

It currently has “131” on the first line and “0” on the last line.

I need a way of removing these lines only if they contain this information. I can’t do a blind Perl find and replace, in case the extra line is not there but the proper first line happens to contain “131”.

Does this make sense?

Any ideas?

Thanks

Example, sometimes it is this:

131
<element>
<example>content</example>
<example>content</example>
<example>content</example>
<example>content</example>
</element>
0

It is sometimes like this (correct)

<element>
<example>content</example>
<example>content</example>
<example>content</example>
<example>content</example>
</element>

1 Answer

Editorial Team · Answer 1 · 2026-06-14T23:15:05+00:00

That’s a job for sed! You won’t find anything quicker or simpler.

If you’re sure of the two values, you could simply run:

sed -e '1{/^131$/d};${/^0$/d}' -i mybrokenfile

But with the following command, sed will remove any first and/or last line containing only a number:

sed -e '1{/^[0-9]\+$/d};${/^[0-9]\+$/d}'

This could be run with files as parameters, generating backup files automatically:

sed -e '1{/^[0-9]\+$/d};${/^[0-9]\+$/d}' -i.bak files*

Explained:

There are two parts: 1 and $ are addresses, 1 for the first line and $ for the last line.
The block that follows uses another form of address, by condition: /^[0-9]\+$/ means lines that consist of one or more characters between 0 and 9 and nothing else (\+ is a GNU extension to basic regular expressions).
On these matching lines (first or last line only), the command to execute is d, which deletes the line.
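To see the explanation above in action, here is a small sketch that feeds the two samples from the question through the same command as a filter. The helper name strip_numeric_edges is made up for illustration, and GNU sed is assumed because of \+ and the semicolon after }.

```shell
# strip_numeric_edges: hypothetical helper name; reads stdin, writes stdout.
# Assumes GNU sed (\+ and "};" are not accepted by every sed).
strip_numeric_edges() {
    sed -e '1{/^[0-9]\+$/d};${/^[0-9]\+$/d}'
}

# Broken download: numeric-only first and last lines are dropped.
printf '131\n<element>\n<example>content</example>\n</element>\n0\n' |
    strip_numeric_edges

# Correct download: nothing matches, so the input passes through unchanged.
printf '<element>\n<example>content</example>\n</element>\n' |
    strip_numeric_edges
```

Both invocations print the same three XML lines. A first line that merely contains 131 somewhere (for instance inside an attribute) is left alone, because the regex is anchored with ^ and $.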

The same command could be written:

sed -e '1{
            /^[0-9]\+$/d
        }
        ${
            /^[0-9]\+$/d
        }' -i.bak files*

as well.
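If you want to try the -i.bak form before pointing it at real downloads, a scratch directory is enough. This is only a sketch: the file names are invented, and GNU sed is assumed. With -i, GNU sed processes each file separately, so 1 and $ refer to the first and last line of every file rather than of the whole list.

```shell
# Hypothetical scratch files; the names are illustrative only.
workdir=$(mktemp -d)
printf '131\n<element>\n</element>\n0\n' > "$workdir/broken.xml"
printf '<element>\n</element>\n' > "$workdir/clean.xml"

# Edit in place, keeping the originals as *.xml.bak (GNU sed assumed).
sed -i.bak -e '1{/^[0-9]\+$/d};${/^[0-9]\+$/d}' "$workdir"/*.xml

cat "$workdir/broken.xml"      # cleaned copy
cat "$workdir/broken.xml.bak"  # untouched original, still with 131 and 0
```

A .bak file is written for every file given, including those that needed no change.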

Edit:

As I hate writing approximately anything more than once:

There is a slightly tricky way to write the operation once and still apply it only to the first and last lines.

First, the same sample could be written:

sed -e '1ba;$ba;bb;:a;/^[0-9]\+$/d;:b;' -i.bak files*

So this is 1 byte shorter! More importantly, the operation is written only once.

Explained:

:a and :b are labels to branch (jump) to.
ba and bb are branch instructions to :a and :b respectively; bb has no address, so every line that is neither first nor last skips over the operation.
1 and $ are addresses, as previously described.
/.../d is also as previously described: it deletes lines matching the regex.

And it could be written:

sed -e '
    1ba;
    $ba;
    bb;
   :a;
    /^[0-9]\+$/d;
   :b;
  ' -i.bak files*
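The semicolon after a label and \+ both lean on GNU sed. As a hedged sketch of a more portable spelling, each command can go in its own -e and \+ can be replaced by [0-9][0-9]*. The helper name strip_edges_branching is hypothetical.

```shell
# strip_edges_branching: hypothetical helper; reads stdin, writes stdout.
# One command per -e, so labels end at the end of their own fragment.
strip_edges_branching() {
    sed -e '1ba' -e '$ba' -e 'bb' \
        -e ':a' -e '/^[0-9][0-9]*$/d' \
        -e ':b'
}

# The numeric-only line in the middle (42) is neither first nor last,
# so it takes the bb branch and is kept.
printf '131\n<a>\n42\n</a>\n0\n' | strip_edges_branching
```

This prints the three lines `<a>`, `42` and `</a>`: only the first and last lines ever reach the delete.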

Sample application, using s/../../ instead of only d:
Modify version info only if present on the first or last line:

 sed -e '1ba;$ba;bb;:a;s/\(Id: .*,v\).*\(Exp\)/\1'"$(
             date +" $VER %F %T $USER ")"'\2/;:b;' -i files*
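To make the substitution variant easier to check, here is a sketch that takes the replacement text as an argument instead of building it from date, $VER and $USER. The helper name stamp_edges and the sample Id lines are invented for illustration, and the stamp is assumed to contain no /, &, backslash or newline, since those are special in a sed replacement.

```shell
# stamp_edges: hypothetical helper; $1 is the new text placed between
# "Id: <name>,v" and "Exp". Input on stdin, output on stdout.
# Assumes $1 has no "/", "&", backslash or newline.
stamp_edges() {
    sed -e '1ba' -e '$ba' -e 'bb' -e ':a' \
        -e 's/\(Id: .*,v\).*\(Exp\)/\1 '"$1"' \2/' \
        -e ':b'
}

# Only the first and last lines are rewritten; the middle one is kept
# even though it also matches the pattern.
printf '%s\n' \
    '$Id: demo.c,v 1.1 2001-01-01 bob Exp $' \
    'body Id: keep.c,v 0.0 untouched Exp' \
    '$Id: demo.c,v 1.1 2001-01-01 bob Exp $' |
    stamp_edges '2.0 2026-06-14 alice'
```

The first and last lines come out as `$Id: demo.c,v 2.0 2026-06-14 alice Exp $`, while the middle line is printed unchanged.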
