For example, given a line a11b12c22d322 e… the fields of break are the numbers

Asked: June 18, 2026 (2026-06-18T00:16:21+00:00)

For example, given a line a11b12c22d322 e... where the field separators are numbers or spaces, we want to transform it into

a
b
c
d
e
...

sed needs to read the whole line into memory. For a line that is gigabytes long this would be inefficient, and the job could not be done at all without sufficient memory.

EDIT:

Could anyone please explain how grep, tr, awk, perl, and python manage memory when reading a large file? What, and how much, do they read into memory at a time?

1 Answer

Editorial Team · Answer 1 · 2026-06-18T00:16:22+00:00

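If you use gawk (which I believe is the default awk on many Linux systems), you can set the RS variable to a regular expression so that runs of digits or whitespace are treated as record terminators instead of a newline. Putting digits and whitespace in a single bracket expression means a number followed by a space counts as one separator, so no empty lines are printed, and the trailing newline of the file is consumed as a separator too.

awk '{print}' RS='[[:digit:][:space:]]+' file.txt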

As to your second question, each of these programs reads a fixed number of bytes into an internal buffer and searches that buffer for its idea of a line separator, which simulates reading a single line at a time. To prevent a program from reading too much data while searching for the end of the line, you need to change the program's idea of what terminates a line.

Most languages allow you to do this, but only let you specify a single character. gawk makes it easy by allowing a regular expression as the record separator. This saves you from having to implement the fixed-size buffer and end-of-line search yourself.
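For comparison, here is a minimal sketch of what implementing the fixed-size buffer and separator search yourself might look like in Python. The function name split_fields, the 64 KiB default chunk size, and the choice of ASCII digits and whitespace as separators are assumptions made for illustration; this is not how any of the tools above are actually implemented.

```python
import io
import re

# Assumption: fields are separated by runs of ASCII digits and/or whitespace.
SEPARATOR = re.compile(rb"[0-9\s]+")


def split_fields(stream, chunk_size=64 * 1024):
    """Yield fields from a binary stream, reading chunk_size bytes at a time."""
    pending = b""  # tail of the previous chunk that may be an unfinished field
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        pieces = SEPARATOR.split(pending + chunk)
        # The last piece may be cut off at the chunk boundary, so hold it back
        # until the next chunk (or end of input) shows where it really ends.
        pending = pieces.pop()
        for piece in pieces:
            if piece:  # skip empty pieces produced by leading separators
                yield piece
    if pending:
        yield pending


# Tiny demonstration with a deliberately small chunk size.
demo = io.BytesIO(b"a11b12c22d322 e\n")
print([field.decode() for field in split_fields(demo, chunk_size=3)])
# prints ['a', 'b', 'c', 'd', 'e']
```

Memory use is bounded by the chunk size plus the length of the longest single field, not by the length of the line. For a real file you would pass a file object opened in binary mode, or sys.stdin.buffer, instead of the in-memory demo stream.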
