I have a text file in the format of: aaa: bcd;bcd;bcddd;aaa:bcd;bcd;bcd; Where bcd can

Question

Asked: June 1, 2026 (2026-06-01T03:45:53+00:00)

I have a text file in the format of:

aaa: bcd;bcd;bcddd;aaa:bcd;bcd;bcd;

Where “bcd” can be any length of any characters, excluding ; or :

What I want to do is print the text file in the format of:

aaa: bcd;bcd;bcddd;
aaa: bcd;bcd;bcd;

-etc-

My method of approach to this problem was to isolate a pattern of “;...:” and then reprint this pattern without the initial ;

I concluded I would have to use awk’s ‘gsub’ to do this, but have no idea how to replicate the pattern nor how to print the pattern again with this added new line character 1 character into my pattern.

Is this possible?
If not, can you please direct me in a way of tackling it?

1 Answer

Editorial Team · Answer 1 · 2026-06-01T03:45:54+00:00

We can’t quite be sure of the variability in the aaa or bcd parts; presumably, each one could be almost anything.

You should probably be looking for:

a series of one or more non-colon, non-semicolon characters followed by a colon,
with one or more repeats of:
- a series of one or more non-colon, non-semicolon characters followed by a semicolon

That makes up the unit you want to match.

/[^:;]+:([^:;]+;)+/

With that, you can substitute what was found by the same followed by a newline, and then print the result. The only trick is avoiding superfluous newlines.

Example script:

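# Feed two sample lines to awk. gsub() appends a newline after each
# "word: word;word;..." unit; sub() then drops the trailing newline(s).
{
echo "aaa: bcd;bcd;bcddd;aaa:bcd;bcd;bcd;"
echo "aaz: xcd;ycd;bczdd;baa:bed;bid;bud;"
} |
awk '{ gsub(/[^:;]+:([^:;]+;)+/, "&\n"); sub(/\n+$/, ""); print }'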

Example output

aaa: bcd;bcd;bcddd;
aaa:bcd;bcd;bcd;
aaz: xcd;ycd;bczdd;
baa:bed;bid;bud;
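
Since the question is about a text file rather than echoed text, here is a minimal sketch of the same awk program wrapped in a shell function and run against a file. The function name split_units and the temporary file are my own placeholders, not anything from the question; substitute the real file name.

```shell
# split_units is a hypothetical helper name; it runs the same awk program
# as the example script on the files given as arguments (or on stdin).
split_units() {
    awk '{ gsub(/[^:;]+:([^:;]+;)+/, "&\n"); sub(/\n+$/, ""); print }' "$@"
}

# Stand-in for the real text file (assumption: one or more units per line).
tmp=$(mktemp)
printf '%s\n' "aaa: bcd;bcd;bcddd;aaa:bcd;bcd;bcd;" > "$tmp"

split_units "$tmp"

rm -f "$tmp"
```

With no file argument, the function reads standard input, so it can also sit at the end of a pipeline.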

Paraphrasing the question in a comment:

Why does the regular expression not include the characters before a colon (which is what it’s intended to do, but I don’t understand why)? I don’t understand what “breaks” or ends the regex.

As I tried to explain at the top, you’re looking for what we can call ‘words’, meaning sequences of characters that are neither a colon nor a semicolon. In the regex, that is [^:;]+, meaning one or more (+) of the negated character class — one or more non-colon, non-semicolon characters.

Let’s pretend that spaces in a regex are not significant. We can space out the regex like this:

    / [^:;]+ : ( [^:;]+ ; ) + /

The slashes simply mark the ends, of course. The first cluster is a word; then there’s a colon. Then there is a group enclosed in parentheses, tagged with a + at the end. That means that the contents of the group must occur at least once and may occur any number of times more than that. What’s inside the group? Well, a word followed by a semicolon. It doesn’t have to be the same word each time, but there does have to be a word there. If something can occur zero or more times, then you use a * in place of the +, of course.
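
To see the difference between + and * on the group, here is a small sketch that tests whole lines against both forms. The anchors (^ and $) and the helper name compare_quantifiers are additions of mine for the demonstration; they are not part of the regex used in the script.

```shell
# compare_quantifiers is a hypothetical helper name.
# For each input line, report whether the whole line matches the regex
# with the group repeated one-or-more (+) or zero-or-more (*) times.
compare_quantifiers() {
    awk '{
        plus = ($0 ~ /^[^:;]+:([^:;]+;)+$/) ? "yes" : "no"
        star = ($0 ~ /^[^:;]+:([^:;]+;)*$/) ? "yes" : "no"
        print $0, "plus=" plus, "star=" star
    }'
}

printf '%s\n' "aaa:" "aaa:bcd;" | compare_quantifiers
```

A bare "aaa:" with nothing after the colon only matches the * form, because the + form insists on at least one word followed by a semicolon.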

The key to the regex stopping is that the aaa: in the middle of the first line does not consist of a word followed by a semicolon; it is a word followed by a colon. So, the regex has to stop before that because the aaa: doesn’t match the group. The gsub() therefore finds the first sequence, and replaces that text with the same material and a newline (that’s the "&\n", of course). It (gsub()) then resumes its search directly after the end of the replacement material, and — lo and behold — there is a word followed by a colon and some words followed by semicolons, so there’s a second match to be replaced with its original material plus a newline.
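
If you want to see where the first match stops, a sketch using awk's match() function, which sets RSTART and RLENGTH, may help. The helper name first_unit is hypothetical; the regex is the same one as in the script.

```shell
# first_unit is a hypothetical helper name.
# Print only the text covered by the first match of the regex on each line.
first_unit() {
    awk '{
        if (match($0, /[^:;]+:([^:;]+;)+/))
            print substr($0, RSTART, RLENGTH)
    }'
}

echo "aaa: bcd;bcd;bcddd;aaa:bcd;bcd;bcd;" | first_unit
```

This prints aaa: bcd;bcd;bcddd; and nothing more: the match cannot extend into the second aaa because that word is followed by a colon, not a semicolon.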

Note that $0 does not contain the newline at the end of the input line; awk strips the record separator when it reads each record. The extra newline comes from the gsub() itself: the last unit on the line is also a match, so it too gets a newline appended, and $0 then ends in "\n". Without the sub() to remove trailing newlines, the print (implicitly of $0, followed by the output record separator, which is a newline) generated a blank line I didn’t want in the output, so I removed the extraneous newline(s).
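
One way to check where the extra newline comes from is to look at what awk actually holds in $0 and then run the gsub() without the clean-up step. This is only a sketch; record_length and split_no_cleanup are hypothetical helper names.

```shell
# record_length is a hypothetical helper name.
# Print the length of each record as awk sees it. POSIX awk removes the
# record separator, so the newline is not counted.
record_length() {
    awk '{ print length($0) }'
}

printf 'abc\n' | record_length

# split_no_cleanup is a hypothetical helper name: the same gsub() as the
# example script, but without the sub() that strips trailing newlines.
split_no_cleanup() {
    awk '{ gsub(/[^:;]+:([^:;]+;)+/, "&\n"); print }'
}

echo "aaa:bcd;aaa:bcd;" | split_no_cleanup
```

The first command reports 3, not 4, so the input newline is not in $0. The second produces three output lines, the last one blank: the newline that gsub() appended after the final unit is followed by the newline that print adds.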
