I work with XML files containing book data. When investigating data corruption issues I

Asked: May 27, 2026

I work with XML files containing book data. When investigating data corruption issues I often have to extract the whole records which include a particular string.

I am struggling to do this with my very limited knowledge of bash scripting and total lack of knowledge of other programming languages such as Perl.

I have standard user access to a Linux box (RHEL 4) with no prospect of getting permission to install anything not already present.

Using standard tools/languages available on this box, can anyone explain how I might look for a particular string and extract any whole records from the file which might contain it?

For example, to extract the whole records which contain ‘Smith’ from the following file.

Example data:

<File>
<Product>
<Ref>1</Ref>
<Title>My Life</Title>
<Series>Life Stories</Series>
<Author>John Smith</Author>
<Price>5.99</Price>
</Product>
<Product>
<Ref>2</Ref>
<Title>A Story</Title>
<Author>Fred Bloggs</Author>
<Price>16.99</Price>
</Product>
<Product>
<Ref>3</Ref>
<Title>Book 1</Title>
<Author>Jane Smith</Author>
<Price>10.99</Price>
</Product>
</File>

Required output:

<Product>
<Ref>1</Ref>
<Title>My Life</Title>
<Series>Life Stories</Series>
<Author>John Smith</Author>
<Price>5.99</Price>
</Product>
<Product>
<Ref>3</Ref>
<Title>Book 1</Title>
<Author>Jane Smith</Author>
<Price>10.99</Price>
</Product>

That is to say everything between the <Product> </Product> tags for the records containing the string ‘Smith’.

The records may contain different numbers of tags but will always be enclosed in <Product> </Product> tags.

I appreciate the perfect result may not be possible every time without using more specialist tools but I simply don’t have them available to me. Anything which gets me close would be great.

I’m thinking the script would read each record in the file, look for the string within each record in turn and redirect those records which match to an output. However, I am struggling to find the answer anywhere.

Many thanks for any help you can offer.

1 Answer

Editorial Team · Answer 1 · 2026-05-27T16:06:03+00:00

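This should work for your example. It relies on gawk (normally what awk is on RHEL) treating a multi-character RS as a regular expression, so each record is the text between the Product tags; printf puts back the tags that the record separator consumed:

 awk 'BEGIN{RS="</?Product>"} /Smith/{printf "<Product>%s</Product>\n", $0}' file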
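The one-liner above depends on awk accepting a regular expression as the record separator, which is a gawk feature rather than something every awk guarantees. If that is ever a concern, here is a rough sketch of the approach described in the question (read each record, test it, print the ones that match) using only line-by-line buffering. The function name extract_records is made up for illustration, and the sketch assumes that <Product> and </Product> each sit on their own line, as in the example data, and that the search text contains no backslashes.

```shell
# Hypothetical helper (not a standard tool): print every
# <Product>...</Product> record that contains a fixed string.
# Usage: extract_records STRING FILE
extract_records() {
    awk -v pat="$1" '
        # An opening tag starts a fresh buffer for the record.
        /<Product>/ { buf = ""; inrec = 1 }

        # While inside a record, append each line (tags included).
        inrec { buf = buf $0 "\n" }

        # A closing tag ends the record; print it only if the
        # buffer contains the search text (plain text, not a regex).
        /<\/Product>/ {
            if (inrec && index(buf, pat)) printf "%s", buf
            inrec = 0
        }
    ' "$2"
}
```

It could then be called as, say, extract_records Smith file > matches.xml. Because index() does a plain substring test, characters such as . or * in the search text are taken literally. Like the one-liner, this matches the string anywhere in the record, including inside tag names.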
