Editorial Team
Asked: June 1, 2026

I have a text file in the format of:

aaa: bcd;bcd;bcddd;aaa:bcd;bcd;bcd; 

Where “bcd” can be any length of any characters, excluding ; or :

What I want to do is print the text file in the format of:

aaa: bcd;bcd;bcddd;
aaa: bcd;bcd;bcd;

-etc-

My method of approach to this problem was to isolate a pattern of “;...:” and then reprint this pattern without the initial ;

I concluded I would have to use awk’s ‘gsub’ to do this, but I have no idea how to reproduce the matched text, nor how to print it again with a newline inserted one character into the match.

Is this possible?
If not, can you please direct me in a way of tackling it?

1 Answer

  1. Editorial Team
    Added an answer on June 1, 2026 at 3:45 am

    We can’t quite be sure of the variability in the aaa or bcd parts; presumably, each one could be almost anything.

    You should probably be looking for:

    • a series of one or more non-colon, non-semicolon characters followed by a colon,
    • with one or more repeats of:
      • a series of one or more non-colon, non-semicolon characters followed by a semicolon

    That makes up the unit you want to match.

    /[^:;]+:([^:;]+;)+/
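
As a quick check of what this expression treats as one unit, here is a minimal sketch using awk's match(), which sets RSTART and RLENGTH to the position and length of the first match. The helper name first_unit is made up for this illustration.

```shell
# first_unit is a hypothetical helper: print the first unit found in a line.
first_unit() {
    printf '%s\n' "$1" |
    awk '{
        # match() sets RSTART and RLENGTH for the leftmost match in $0
        if (match($0, /[^:;]+:([^:;]+;)+/))
            print substr($0, RSTART, RLENGTH)
    }'
}

first_unit "aaa: bcd;bcd;bcddd;aaa:bcd;bcd;bcd;"   # prints: aaa: bcd;bcd;bcddd;
```

Only the first unit is printed because match() reports a single match; gsub() is what repeats the search along the line.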
    

    With that, you can substitute what was found by the same followed by a newline, and then print the result. The only trick is avoiding superfluous newlines.

    Example script:

    {
    echo "aaa: bcd;bcd;bcddd;aaa:bcd;bcd;bcd;" 
    echo "aaz: xcd;ycd;bczdd;baa:bed;bid;bud;"
    } |
    awk '{ gsub(/[^:;]+:([^:;]+;)+/, "&\n"); sub(/\n+$/, ""); print }'
    

    Example output

    aaa: bcd;bcd;bcddd;
    aaa:bcd;bcd;bcd;
    aaz: xcd;ycd;bczdd;
    baa:bed;bid;bud;
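
Since the question is about a text file rather than echoed strings, the same command can presumably be pointed at the file directly. This is a sketch only: split_records is a hypothetical wrapper name, and the temporary file stands in for the real input.

```shell
# split_records is a hypothetical wrapper; $1 is the path to the text file.
split_records() {
    awk '{ gsub(/[^:;]+:([^:;]+;)+/, "&\n"); sub(/\n+$/, ""); print }' "$1"
}

# Demo: a temporary file stands in for the real input file.
tmp=$(mktemp)
printf '%s\n' "aaa: bcd;bcd;bcddd;aaa:bcd;bcd;bcd;" > "$tmp"
split_records "$tmp"
rm -f "$tmp"
```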
    

    Paraphrasing the question in a comment:

    Why does the regular expression not include the characters before a colon (which is what it’s intended to do, but I don’t understand why)? I don’t understand what “breaks” or ends the regex.

    As I tried to explain at the top, you’re looking for what we can call ‘words’, meaning sequences of characters that are neither a colon nor a semicolon. In the regex, that is [^:;]+, meaning one or more (+) of the negated character class — one or more non-colon, non-semicolon characters.

    Let’s pretend that spaces in a regex are not significant. We can space out the regex like this:

        / [^:;]+ : ( [^:;]+ ; ) + /
    

    The slashes simply mark the ends, of course. The first cluster is a word; then there’s a colon. Then there is a group enclosed in parentheses, tagged with a + at the end. That means that the contents of the group must occur at least once and may occur any number of times more than that. What’s inside the group? Well, a word followed by a semicolon. It doesn’t have to be the same word each time, but there does have to be a word there. If something can occur zero or more times, then you use a * in place of the +, of course.
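
To make the difference between + and * on the group concrete, this sketch counts matches on an input where the first word-plus-colon is not followed by any word-plus-semicolon. The function names count_plus and count_star are made up for the illustration; gsub() returns the number of substitutions it made, and "&" replaces each match with itself.

```shell
# Hypothetical helpers: count how many units each variant of the regex finds.
count_plus() {
    printf '%s\n' "$1" | awk '{ print gsub(/[^:;]+:([^:;]+;)+/, "&") }'
}
count_star() {
    printf '%s\n' "$1" | awk '{ print gsub(/[^:;]+:([^:;]+;)*/, "&") }'
}

count_plus "aaa:bbb:ccc;"   # prints 1: only "bbb:ccc;" has a word followed by a semicolon
count_star "aaa:bbb:ccc;"   # prints 2: a bare "aaa:" also matches when the group is optional
```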

    The key to the regex stopping is that the aaa: in the middle of the first line does not consist of a word followed by a semicolon; it is a word followed by a colon. So the regex has to stop before it, because aaa: doesn’t match the group. The gsub() therefore finds the first sequence and replaces that text with the same material plus a newline (that’s the "&\n"; the & stands for the matched text). gsub() then resumes its search directly after the end of the text it just matched (the replacement itself is not rescanned) and, lo and behold, there is a word followed by a colon and some words followed by semicolons, so there’s a second match to be replaced with its original material plus a newline.
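
The stop-and-resume behaviour can be imitated by hand with a match() loop. This is only a sketch of the idea, not how gsub() is implemented internally; list_units is a hypothetical name.

```shell
# list_units is a hypothetical helper: show each match in the order it is found.
list_units() {
    printf '%s\n' "$1" |
    awk '{
        rest = $0
        n = 0
        while (match(rest, /[^:;]+:([^:;]+;)+/)) {
            n++
            printf "match %d: %s\n", n, substr(rest, RSTART, RLENGTH)
            # continue searching directly after the text just matched
            rest = substr(rest, RSTART + RLENGTH)
        }
    }'
}

list_units "aaa: bcd;bcd;bcddd;aaa:bcd;bcd;bcd;"
# match 1: aaa: bcd;bcd;bcddd;
# match 2: aaa:bcd;bcd;bcd;
```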

    Note that $0 does not contain the newline at the end of the input line; awk strips the record separator when it reads the record. The extra newline comes from the gsub() itself: the last match on the line is also replaced by its text plus a newline, so $0 ends with a newline after the substitution. Without the sub() to remove trailing newlines, the print (implicitly of $0, followed by a newline) generated a blank line I didn’t want in the output, so I removed the extraneous newline(s).
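
A small experiment, assuming a POSIX-style awk, suggests where the blank line comes from. It compares the line count with and without the sub(), and checks the length of $0 for a three-character input line. The helper names without_sub and with_sub are made up for the illustration.

```shell
# Hypothetical helpers: the same command with and without the trailing-newline cleanup.
without_sub() {
    printf '%s\n' "$1" | awk '{ gsub(/[^:;]+:([^:;]+;)+/, "&\n"); print }'
}
with_sub() {
    printf '%s\n' "$1" | awk '{ gsub(/[^:;]+:([^:;]+;)+/, "&\n"); sub(/\n+$/, ""); print }'
}

without_sub "aaa:b;ccc:d;" | wc -l   # 3 lines: two units plus one blank line
with_sub "aaa:b;ccc:d;" | wc -l      # 2 lines: the blank line is gone

# awk strips the record separator, so $0 here holds only the three letters.
printf 'abc\n' | awk '{ print length($0) }'   # prints 3
```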
