I’m extracting the content between XML tags using the following: – perl -lne ‘BEGIN{undef

Question

Asked: June 10, 2026 (2026-06-10T02:47:21+00:00)

I’m extracting the content between XML tags using the following:

perl -lne 'BEGIN{undef $/} while (/<tagname>(.*?)<\/tagname>/sg){print $1}' input.txt > output.txt

Unfortunately, I’m running out of memory. I know I can split the file, process each piece, and concatenate the results, but I wondered if there is another way, be it a modification to the above or a tool such as awk or sed.

The input.txt file size varies between 17GB and 70GB.

EDIT:

The input file can be any XML file. One point to note is that it contains no newlines, e.g.:

<body><a></a><b></b><c></c></body><foo></foo><bar><z></z></bar>

1 Answer

Editorial Team · Answer 1 · 2026-06-10T02:47:23+00:00

This one-liner reads the entire file into memory as one gigantic “line”. Of course you’ll run out of memory stuffing 17GB or more into it! Read and process the file line by line, or use read to get chunks of a suitable size instead. Since your input contains no newlines, chunks (or a record separator other than newline) are the practical choice.

Either way, search for <tagname>, note its position in the line, and search for the closing tag starting from there. If you don’t find it, stuff the current line/chunk into a buffer and repeat until you find it further along in the file. When found, print out the buffer and empty it. Repeat until the end of the file.

Note that if you use arbitrarily sized chunks, you’ll have to account for the possibility of a tag being split across a chunk boundary: cut the incomplete tag from the end of the chunk and stuff it into the “to process” buffer.
