I am attempting to extract data between the nth occurrence of 2 patterns.

Editorial Team

Asked: June 17, 2026 (2026-06-17T15:39:16+00:00)

I am attempting to extract data between the nth occurrence of 2 patterns.

Pattern 1: CardDetail

Pattern 2: ]

The input file, input.txt, has thousands of lines, and their contents vary. The lines I'm interested in always contain CardDetail somewhere in the line. Finding the matching lines is easy enough using awk, but pulling out the data between each pair of matches and placing each result on its own line is where I'm falling short.

input.txt contains data about network gear and any attached/child devices. It looks something like this:

DeviceDetail [baseProductId=router-5000, cardDetail=[CardDetail [baseCardId=router-5000NIC1, cardDescription=Router 5000 NIC, cardSerial=5000NIC1], CardDetail [baseCardId=router-5000NIC2, cardDescription=Router 5000 NIC, cardSerial=5000NIC2]], deviceSerial=5000PRIMARY, deviceDescription=Router 5000 Base Model]
DeviceDetail [baseProductId=router-100, cardDetail=[CardDetail [baseCardId=router-100NIC1, cardDescription=Router 100 NIC, cardSerial=100NIC1], CardDetail [baseCardId=router-100NIC2, cardDescription=Router 100 NIC, cardSerial=100NIC2]], deviceSerial=100PRIMARY, deviceDescription=Router 100 Base Model]

* UPDATE: I forgot to mention in the initial post that I also need the device’s PARENT serials (deviceSerial) listed with them as well. *

What I would like the output.txt to look like is something like this:

"router-5000NIC1","Router 5000 NIC","5000NIC1","5000PRIMARY"
"router-5000NIC2","Router 5000 NIC","5000NIC2","5000PRIMARY"
"router-100NIC1","Router 100 NIC","100NIC1","100PRIMARY"
"router-100NIC2","Router 100 NIC","100NIC2","100PRIMARY"

The number of occurrences of CardDetail on a single line can range from 0 to hundreds, depending on the device. I need to extract every field between each occurrence of CardDetail and the next occurrence of ], and write each card to its own line in CSV format.

1 Answer

Editorial Team · Answer 1 · 2026-06-17T15:39:17+00:00

Here is an example that uses regular expressions, so it tolerates minor variations in the text format, such as extra spaces. The AWK scripts here need gawk, because the three-argument form of match() is a gawk extension. The script collects all the values in an array, so you can do further processing (sort values, remove duplicates, etc.) if you wish.

#!/usr/bin/awk -f

BEGIN {
    i_result = 0
    DQUOTE = "\""
}

{
    line = $0
    for (;;)
    {
        i = match(line, /CardDetail \[ *([^]]*) *\]/, a)
        if (0 == i)
            break
        # a[1] has the text from the parentheses
        s = a[1]
        # replace from this: a, b, c   to this:  "a","b","c"
        gsub(/ *, */, "\",\"", s)
        s = DQUOTE s DQUOTE

        results[i_result++] = s
        line = substr(line, RSTART + RLENGTH - 1)
    }
}

END {
    for (i = 0; i < i_result; ++i)
        print results[i]
}

P.S. Just for fun I made a Python version.

#!/usr/bin/env python3

import re
import sys

DQUOTE = '"'

# Raw strings keep the regex backslashes intact.
pat_card = re.compile(r"CardDetail \[ *([^]]*) *\]")
pat_comma = re.compile(r" *, *")

results = []

def collect_cards(line, results):
    while True:
        m = pat_card.search(line)
        if not m:
            return
        # group(1) has the text from the parentheses
        s = m.group(1)
        s = DQUOTE + pat_comma.sub('","', s) + DQUOTE
        results.append(s)
        # Continue scanning just past the end of this match.
        line = line[m.end():]

if __name__ == "__main__":
    for line in sys.stdin:
        collect_cards(line, results)

    for card in results:
        print(card)
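
The further processing mentioned above (sorting, removing duplicates) could look like the following minimal sketch. The helper name dedupe_and_sort is hypothetical, and it assumes the collected rows are plain strings, as they are in the results list.

```python
def dedupe_and_sort(rows):
    """Return the distinct rows in sorted order."""
    # set() drops exact duplicates; sorted() orders the remaining strings.
    return sorted(set(rows))


# Hypothetical rows, standing in for the collected results.
rows = ['"b","2"', '"a","1"', '"b","2"']
for row in dedupe_and_sort(rows):
    print(row)
```

Sorting compares the quoted strings character by character, so this orders rows by their first field as text, not numerically.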

EDIT: Here’s a new version that also looks for “deviceID” and puts the matched text as the first field.

In AWK you concatenate strings just by putting them next to each other in an expression; there is an implicit concatenation operator when two strings are side by side. So this gets the deviceID text into a variable called s0, using concatenation to put double quotes around it; then later uses concatenation to put s0 at the start of the matched string.

#!/usr/bin/awk -f

BEGIN {
    i_result = 0
    DQUOTE = "\""
    COMMA = ","
}

{
    line = $0
    # Look up deviceID once per input line, before the loop trims "line"
    match($0, /deviceID=([A-Za-z_0-9]*),/, a)
    s0 = DQUOTE a[1] DQUOTE
    for (;;)
    {
        i = match(line, /CardDetail \[ *([^]]*) *\]/, a)
        if (0 == i)
            break
        # a[1] has the text from the parentheses
        s = a[1]
        # strip the keys: foo=a, bar=b, other=c   becomes:  a, b, c
        gsub(/[A-Za-z_][^=,]*=/, "", s)
        # replace from this: a, b, c   to this:  "a","b","c"
        gsub(/ *, */, "\",\"", s)
        s = s0 COMMA DQUOTE s DQUOTE

        results[i_result++] = s
        line = substr(line, RSTART + RLENGTH - 1)
    }
}

END {
    for (i = 0; i < i_result; ++i)
        print results[i]
}
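
The sample input in the question uses the key deviceSerial, and the desired output lists that parent serial as the last field, not the first. Below is a minimal Python sketch of that variant, not a definitive implementation. The names card_rows and cards_to_csv are hypothetical, and it assumes that values never contain a comma or a closing bracket.

```python
import csv
import io
import re

# Key names below are taken from the sample input in the question.
CARD_RE = re.compile(r"CardDetail \[([^\]]*)\]")
SERIAL_RE = re.compile(r"deviceSerial=([^,\]]*)")


def card_rows(line):
    """Yield one list of values per CardDetail block, parent serial last."""
    m = SERIAL_RE.search(line)
    serial = m.group(1).strip() if m else ""
    for card in CARD_RE.finditer(line):
        # Split the "key=value" pairs and keep only the values.
        values = [pair.split("=", 1)[1].strip()
                  for pair in card.group(1).split(",") if "=" in pair]
        yield values + [serial]


def cards_to_csv(lines):
    """Return CSV text with every field quoted, one card per row."""
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for line in lines:
        writer.writerows(card_rows(line))
    return out.getvalue()


# One line of the sample input from the question.
sample = [
    "DeviceDetail [baseProductId=router-100, cardDetail=[CardDetail [baseCardId=router-100NIC1, cardDescription=Router 100 NIC, cardSerial=100NIC1], CardDetail [baseCardId=router-100NIC2, cardDescription=Router 100 NIC, cardSerial=100NIC2]], deviceSerial=100PRIMARY, deviceDescription=Router 100 Base Model]",
]
print(cards_to_csv(sample), end="")
```

To process the real file, pass an open file object instead of the sample list, for example cards_to_csv(open("input.txt")). Lines without any CardDetail block produce no rows.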
