I’m creating test samples of text of varying length, where each sample is separated

Asked: June 3, 2026

I’m creating test samples of text of varying length, where each sample is separated by a line break. Currently I have 3 MB+ files of text with no line breaks, only spaces. I was hoping for help with the proper regular expression to make sure no line breaks cut words in half.

I’m very new to regular expressions, but I assumed that for lines of, e.g., 300 characters, it would be somewhere in the ballpark of:

/.{300,}\s+/&\n/g

(Apologies, I know this doesn’t work!)

Note: I know there are similar posts about this subject, but I’m relatively sure there’s nothing out there that specifically addresses this scenario.

Update: Solved! Worked with this command: perl -lpe's/\b(.{80,300})\b/\1\n/g' file

1 Answer

Editorial Team · Answer 1 · 2026-06-03T01:12:00+00:00

Are you sure there are no newlines already in the data? If there are, the dot (.) will not match them. If there are no newlines, something as simple as this might work:

s/(.{80,300})\s/$1\n/g

The lower bound of 80 is an arbitrary choice that will rarely affect the outcome if there are no newlines present. Lower the 300 if you want shorter lines.

Edit: changed \b to \s, which may be a better choice because it avoids unexpected line breaks around non-word characters, as pointed out by @tchrist. tchrist also changed \1 to $1, which makes more sense in Perl: \1 is meant for use inside the pattern, while $1 is the form to use in the replacement.
