I have the following input. I want to parse it to a CSV delimited

Question


Asked: June 7, 2026 (2026-06-07T18:51:01+00:00)


I have the following input, and I want to parse it into a CSV-delimited string. I can already extract the SKUs with a regex pattern, but I am new to regex parsing and don’t know how to write more complex patterns. It would be nice if anyone could help me with this.

Thanks!

    charset="iso-8859-1"


BODY {


}



TD {



}



TH {


}



H1 {


}

TABLE,IMG,A {


}

**PO Number:** 35102


**Ship To:**  


Georgie Clements



6902 Stonegate Drive

Odessa, TX 79765



432-363-8459


SKU



Product



Qty


JJ-Rug-Zebra-PK



Zebra Pink Rug



1

JJ-Zebra-PK-Twin-4



Zebra Pink 4 Piece Twin Comforter Set



1



JJ-TwinSheets-Zebra-PK



Zebra Pink 3 Piece Twin Sheet Set



1




JJ-Memo-Zebra-PK



Zebra Pink Memory Board



1

I want the output formatted like this:

PONumber, Shipping info, SKU, Product, Qty
'35102', '[ShipToAddress]', 'JJ-Rug-Zebra-PK', 'Zebra Pink Rug', '1'
'35102', '[ShipToAddress]', 'JJ-Zebra-PK-Twin-4', 'Zebra Pink 4 Piece Twin Comforter Set', '1'
'35102', '[ShipToAddress]', 'JJ-TwinSheets-Zebra-PK', 'Zebra Pink 3 Piece Twin Sheet Set', '1'
'35102', '[ShipToAddress]', 'JJ-Memo-Zebra-PK', 'Zebra Pink Memory Board', '1'

The current code is the following:

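import re

# msgStr holds the email text shown above.
pattern = re.compile(r'(\b\w*JJ-\S*)')

pos = 0
while True:
    # Search for the next SKU, starting where the previous match ended.
    match = pattern.search(msgStr, pos)
    if not match:
        break
    a = match.start()
    e = match.end()
    print(' %2d : %2d = %s' % (a, e - 1, msgStr[a:e]))
    pos = e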


1 Answer

Editorial Team · Answer 1 · 2026-06-07T18:51:02+00:00

Here’s another solution, not using regular expressions:

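s = "(your data as a single multiline string)"

def datalines(text):
    # Return the non-empty lines of text, stripped of surrounding whitespace.
    return [ln for ln in (line.strip() for line in text.splitlines()) if ln]

# The two bold labels (**PO Number:** and **Ship To:**) split the text into five parts.
_, _, po_number, _, rem = s.split('**')
# Everything before the first 'SKU' is the address; the rest is the item table.
shipto, data = rem.split('SKU', 1)

po_number = datalines(po_number)[0]
shipto    = '\n'.join(datalines(shipto))
data      = datalines(data)[2:]   # skip the 'Product' and 'Qty' header cells

# Group the remaining lines three at a time: (sku, product, qty).
res = [[po_number, shipto, sku, prod, qty] for sku, prod, qty in zip(*([iter(data)]*3))]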

which gives the final result

[
    ['35102', 'Georgie Clements\n6902 Stonegate Drive\nOdessa, TX 79765\n432-363-8459', 'JJ-Rug-Zebra-PK', 'Zebra Pink Rug', '1'],
    ['35102', 'Georgie Clements\n6902 Stonegate Drive\nOdessa, TX 79765\n432-363-8459', 'JJ-Zebra-PK-Twin-4', 'Zebra Pink 4 Piece Twin Comforter Set', '1'],
    ['35102', 'Georgie Clements\n6902 Stonegate Drive\nOdessa, TX 79765\n432-363-8459', 'JJ-TwinSheets-Zebra-PK', 'Zebra Pink 3 Piece Twin Sheet Set', '1'],
    ['35102', 'Georgie Clements\n6902 Stonegate Drive\nOdessa, TX 79765\n432-363-8459', 'JJ-Memo-Zebra-PK', 'Zebra Pink Memory Board', '1']
]

Edit: second data file returns

[
    ['35104', 'Angelica Alvarado\n669 66th St.\nSpringfield, OR 97478\n5412322525', 'JJ-CribSheet-Cheetah-PK-PRT', 'Cheetah Pink Print Microsuede Crib Sheet', '1']
]

which on inspection appears to be correct?
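
The question asked for a CSV-delimited string rather than a list of lists. A minimal sketch of that last step, using the standard `csv` module, might look like the following. The helper name `rows_to_csv`, the choice to quote every field (header included) with single quotes, and the flattening of the multi-line address into one comma-separated field are all assumptions made for illustration, not part of the original answer.

```python
import csv
import io

def rows_to_csv(rows, header=("PONumber", "Shipping info", "SKU", "Product", "Qty")):
    """Serialize rows of five strings into one CSV string (hypothetical helper)."""
    buf = io.StringIO()
    # Assumption: every field is wrapped in single quotes, as in the question's sample.
    writer = csv.writer(buf, quotechar="'", quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(header)
    for po_number, shipto, sku, prod, qty in rows:
        # Flatten the multi-line address so each record stays on one line.
        writer.writerow([po_number, shipto.replace("\n", ", "), sku, prod, qty])
    return buf.getvalue()

# One row in the same shape as the result list above.
res = [
    ['35102', 'Georgie Clements\n6902 Stonegate Drive\nOdessa, TX 79765\n432-363-8459',
     'JJ-Rug-Zebra-PK', 'Zebra Pink Rug', '1'],
]
print(rows_to_csv(res))
```

Letting `csv.writer` handle the quoting means a field that itself contains a quote or a comma is still escaped correctly, which hand-built string joins tend to get wrong.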

Final Summary: I discovered that he was using html2text to convert the HTML email to text and then trying to parse the result. The solution was to parse the HTML directly with BeautifulSoup instead, taking advantage of the page structure to identify the fields he wanted.
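
The code for that final approach is not shown in the thread. As a rough illustration of reading the fields straight from the markup, here is a sketch that swaps in the standard library's `html.parser` for BeautifulSoup so that it runs without extra dependencies. The class name `CellCollector` and the sample markup are hypothetical: the real email's HTML is not shown, so the assumption that the items sit in ordinary `<tr>`/`<td>` table rows may not hold.

```python
from html.parser import HTMLParser

class CellCollector(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (whitespace between tags, for example) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None

# Hypothetical markup: an assumed layout, not the actual email.
sample_html = """
<table>
  <tr><th>SKU</th><th>Product</th><th>Qty</th></tr>
  <tr><td>JJ-Rug-Zebra-PK</td><td>Zebra Pink Rug</td><td>1</td></tr>
  <tr><td>JJ-Memo-Zebra-PK</td><td>Zebra Pink Memory Board</td><td>1</td></tr>
</table>
"""

parser = CellCollector()
parser.feed(sample_html)
header, *items = parser.rows
```

Working from the table rows keeps each SKU, product, and quantity together, instead of relying on the number of blank lines that html2text happens to emit.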
