Trying to write a python script to extract lines from a file. The file

Question

Asked: May 25, 2026 (2026-05-25T14:57:37+00:00)

I'm trying to write a Python script to extract lines from a file. The file is a text file containing a dump of Python suds output.

I want to:

strip all characters except words and numbers. I don’t want any “\n”, “[“, “]”, “{“, “=”, etc characters.
find a section where it starts with “ArrayOf_xsd_string”
remove the next line “item[] =” from the result
grab the remaining 6 lines and create a dictionary based on the unique number on the fifth line (123456, 234567, 345678) using this number as the key and the remaining lines as the values (pardon my ignorance if I’m not explaining this in pythonic terminology)
output the results to a file

Data in file is a list:

[(ArrayOf_xsd_string){
   item[] = 
      "001",
      "ABCD",
      "1234",
      "wordy type stuff",
      "123456",
      "more stuff, etc",
 }, (ArrayOf_xsd_string){
   item[] = 
      "002",
      "ABCD",
      "1234",
      "wordy type stuff",
      "234567",
      "more stuff, etc",
 }, (ArrayOf_xsd_string){
   item[] = 
      "003",
      "ABCD",
      "1234",
      "wordy type stuff",
      "345678",
      "more stuff, etc",
 }]

I tried doing a re.compile and here is my poor attempt at the code:

import re, string

f = open('data.txt', 'rb')
linelist = []
for line in f:
  line = re.compile('[\W_]+')
 line.sub('', string.printable)
 linelist.append(line)
 print linelist

newlines = []
for line in linelist:
    mylines = line.split()
    if re.search(r'\w+', 'ArrayOf_xsd_string'):
      newlines.append([next(linelist) for _ in range(6)])
      print newlines

I’m a Python newbie and haven’t found anything on Google or Stack Overflow about how to extract a specific number of lines after finding specific text. Any help is most appreciated.

Please ignore my code as I am taking “shots in the dark” 🙂

Here is what I’d like to see as the results:

123456: 001,ABCD,1234,wordy type stuff,more stuff etc
234567: 002,ABCD,1234,wordy type stuff,more stuff etc
345678: 003,ABCD,1234,wordy type stuff,more stuff etc

I hope that helps with trying to interpret my flawed code.

1 Answer

Editorial Team · Answer 1 · 2026-05-25T14:57:38+00:00

Several suggestions on your code:

Stripping all non-alphanumeric characters is unnecessary and wastes time; there is no need to build linelist at all. You can locate the marker with a plain substring search such as line.find("ArrayOf_xsd_string"), or with re.search(...). That simplifies these steps from your question:

strip all characters except words and numbers. I don’t want any “\n”, “[“, “]”, “{“, “=”, etc characters.
find a section where it starts with “ArrayOf_xsd_string”
remove the next line “item[] =” from the result
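
To make the point about finding the marker concrete, here is a minimal sketch of the three usual ways to test a line for it. The sample line is copied from your data; the variable names are only illustrative.

```python
import re

line = ' }, (ArrayOf_xsd_string){\n'

# Plain substring test: no regex and no pre-cleaning needed.
has_marker = 'ArrayOf_xsd_string' in line

# str.find returns the index of the first match, or -1 if absent.
marker_index = line.find('ArrayOf_xsd_string')

# re.search returns a match object, or None if absent.
marker_match = re.search(r'ArrayOf_xsd_string', line)

print(has_marker, marker_index, marker_match is not None)
```

For a fixed piece of text like this one, the substring test is usually enough; a regex only pays off if the marker can vary.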

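As for your regex: _ is a word character, so it is matched by \w, not \W, and [\W_] is the right way to strip underscores as well. The real problems are that the assignment to line overwrites the line you just read with the compiled pattern, and that line.sub('', string.printable) runs on string.printable rather than on your line and then throws the result away: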

for line in f:
  line = re.compile('[\W_]+') # overwrites the line you just read??
  line.sub('', string.printable)
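
If you did want to strip punctuation from each line, the pattern would be compiled once, applied to the line itself, and the returned string kept. A small sketch of that, where non_alnum and clean_line are names I made up for illustration:

```python
import re

# Compile once, outside any loop. The pattern matches runs of characters
# that are neither letters nor digits.
non_alnum = re.compile(r'[\W_]+')

def clean_line(line):
    """Return line with every run of non-alphanumeric characters removed."""
    return non_alnum.sub('', line)

print(clean_line('      "123456",\n'))
print(clean_line('      "wordy type stuff",\n'))
```

Note that this also removes the spaces inside "wordy type stuff", which is one more reason not to clean the lines this way for your data.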

Here’s my version, which reads the file directly, and also handles multiple matches:

with open('data.txt', 'r') as f:
    theDict = {}
    found = -1  # line number of the current marker, or -1 while searching
    for (lineno, line) in enumerate(f):
        if found < 0:
            if line.find('ArrayOf_xsd_string') >= 0:
                found = lineno
                entries = []
            continue
        # Skip the "item[] =" line, then grab the following 6 lines...
        if 2 <= (lineno - found) <= 6 + 1:
            # strip whitespace (including the newline) and punctuation from both ends
            entry = line.strip(' \t\r\n"{}[]=:,')
            entries.append(entry)
        # ...then create a dict entry keyed on the fifth of those lines
        if (lineno - found) == 6 + 1:
            key = entries.pop(4)
            theDict[key] = entries
            print('%s %s' % (key, ','.join(entries)))  # comma-separated, no quotes
            #break # if you want to end on first match
            found = -1 # to process multiple matches

The output is what you asked for, apart from the colon after the key and the comma that stays inside "more stuff, etc" (the comma-separated values come from ','.join(entries)):

123456 001,ABCD,1234,wordy type stuff,more stuff, etc
234567 002,ABCD,1234,wordy type stuff,more stuff, etc
345678 003,ABCD,1234,wordy type stuff,more stuff, etc
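
The question also asked for the results to be written to a file, which the loop above only prints. Here is a minimal sketch of that last step, assuming a dict shaped like theDict. The function name write_results and the file name results.txt are my own placeholders, and the "key: values" layout follows the desired output in the question.

```python
import os
import tempfile

def write_results(the_dict, path):
    """Write one 'key: v1,v2,...' line per entry, sorted by key."""
    with open(path, 'w') as out:
        for key in sorted(the_dict):
            out.write('%s: %s\n' % (key, ','.join(the_dict[key])))

# Sample data in the shape the parsing loop produces.
sample = {
    '234567': ['002', 'ABCD', '1234', 'wordy type stuff', 'more stuff, etc'],
    '123456': ['001', 'ABCD', '1234', 'wordy type stuff', 'more stuff, etc'],
}

# A temporary directory keeps the sketch from touching real files.
out_path = os.path.join(tempfile.mkdtemp(), 'results.txt')
write_results(sample, out_path)

with open(out_path) as f:
    written = f.read()
print(written)
```

In the real script you would call write_results(theDict, 'results.txt') after the parsing loop has finished.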
