The Archive Base

Editorial Team
Asked: May 25, 2026

Trying to write a python script to extract lines from a file

Trying to write a python script to extract lines from a file. The file is a text file which is a dump of python suds output.

I want to:

  1. strip all characters except words and numbers. I don't want any "\n", "[", "]", "{", "=", etc. characters.
  2. find a section where it starts with “ArrayOf_xsd_string”
  3. remove the next line “item[] =” from the result
  4. grab the remaining 6 lines and create a dictionary based on the unique number on the fifth line (123456, 234567, 345678) using this number as the key and the remaining lines as the values (pardon my ignorance if I’m not explaining this in pythonic terminology)
  5. output the results to a file

Data in file is a list:

[(ArrayOf_xsd_string){
   item[] = 
      "001",
      "ABCD",
      "1234",
      "wordy type stuff",
      "123456",
      "more stuff, etc",
 }, (ArrayOf_xsd_string){
   item[] = 
      "002",
      "ABCD",
      "1234",
      "wordy type stuff",
      "234567",
      "more stuff, etc",
 }, (ArrayOf_xsd_string){
   item[] = 
      "003",
      "ABCD",
      "1234",
      "wordy type stuff",
      "345678",
      "more stuff, etc",
 }]

I tried doing a re.compile and here is my poor attempt at the code:

import re, string

f = open('data.txt', 'rb')
linelist = []
for line in f:
  line = re.compile('[\W_]+')
 line.sub('', string.printable)
 linelist.append(line)
 print linelist

newlines = []
for line in linelist:
    mylines = line.split()
    if re.search(r'\w+', 'ArrayOf_xsd_string'):
      newlines.append([next(linelist) for _ in range(6)])
      print newlines

I'm a Python newbie and haven't found any results on Google or Stack Overflow for how to extract a specific number of lines after finding specific text. Any help is most appreciated.

Please ignore my code as I am taking “shots in the dark” 🙂

Here is what I’d like to see as the results:

123456: 001,ABCD,1234,wordy type stuff,more stuff etc
234567: 002,ABCD,1234,wordy type stuff,more stuff etc
345678: 003,ABCD,1234,wordy type stuff,more stuff etc

I hope that helps with trying to interpret my flawed code.

1 Answer

  1. Editorial Team
    Added an answer on May 25, 2026 at 2:57 pm

    Several suggestions on your code:

    Stripping all non-alphanumeric characters is unnecessary and wastes time, and there is no need to build linelist at all. Are you aware you can simply call the plain old string method line.find("ArrayOf_xsd_string") on each line, or use re.search(...)?

    1. strip all characters except words and numbers. I don't want any "\n", "[", "]", "{", "=", etc. characters.
    2. find a section where it starts with “ArrayOf_xsd_string”
    3. remove the next line “item[] =” from the result
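
As a small illustration of the point about finding the marker, here is a sketch (Python 3 syntax; the sample line is copied from the question's data, and `marker` is just a name chosen for this example) showing that a substring test, `str.find`, and `re.search` all locate it on the raw line, without any prior stripping:

```python
import re

marker = "ArrayOf_xsd_string"
line = " }, (ArrayOf_xsd_string){\n"  # a header line as it appears in the dump

# Three equivalent ways to test for the marker on the raw line
found_with_in = marker in line
found_with_find = line.find(marker) >= 0  # find() returns -1 when absent
found_with_regex = re.search(re.escape(marker), line) is not None

print(found_with_in, found_with_find, found_with_regex)
```

For a fixed piece of text like this, the `in` test is usually the simplest choice; a regex only pays off when the marker itself varies.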

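    Then as to your regex: the pattern [\W_]+ itself is fine (the explicit _ is needed, because \w includes the underscore and so \W alone does not match it). The real problem is the loop. Assigning the compiled pattern to line overwrites the line you just read, and the sub() call then runs on string.printable instead of on the line, with its result thrown away: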

    for line in f:
      line = re.compile('[\W_]+') # overwrites the line you just read??
      line.sub('', string.printable)
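
For reference, if you did want to strip non-alphanumeric characters, a corrected form of that loop might look like the sketch below (Python 3 syntax; `sample_lines` stands in for the file and `non_alnum` is a name chosen for this example). The pattern is compiled once, applied to each line, and the result of sub() is kept:

```python
import re

# Compile once, outside the loop. [\W_]+ matches runs of anything that is
# not a letter or digit (the underscore has to be listed explicitly).
non_alnum = re.compile(r'[\W_]+')

sample_lines = ['      "001",\n', '      "wordy type stuff",\n']  # stand-in for the file

cleaned = []
for line in sample_lines:
    # sub() returns a new string; it does not modify line in place
    cleaned.append(non_alnum.sub('', line))

print(cleaned)
```

Note that this also removes the spaces inside "wordy type stuff", which is one more reason not to strip everything.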
    

    Here’s my version, which reads the file directly, and also handles multiple matches:

    with open('data.txt', 'r') as f:
        theDict = {}
        found = -1  # line number of the current header line, or -1 if none
        for (lineno, line) in enumerate(f):
            if found < 0:
                # Not inside a section: look for the next header line
                if line.find('ArrayOf_xsd_string') >= 0:
                    found = lineno
                    entries = []
                continue
            # Skip the "item[] =" line, then grab the following 6 lines...
            if 2 <= (lineno - found) <= 6 + 1:
                # strip whitespace (including the newline) first, then punctuation
                entry = line.strip().strip('"{}[]=:,')
                entries.append(entry)
            # ...then create a dict entry keyed on the fifth value
            if (lineno - found) == 6 + 1:
                key = entries.pop(4)
                theDict[key] = entries
                print('%s %s' % (key, ','.join(entries)))  # comma-separated, no quotes
                #break # if you want to end on first match
                found = -1  # to process multiple matches
    

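    And the output is close to what you wanted (that's what ','.join(entries) was for). The only differences are that the key is followed by a space rather than a colon, and the comma inside "more stuff, etc" is kept: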

    123456 001,ABCD,1234,wordy type stuff,more stuff, etc
    234567 002,ABCD,1234,wordy type stuff,more stuff, etc
    345678 003,ABCD,1234,wordy type stuff,more stuff, etc
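
The question also asked for the results to be written to a file, with a colon after the key. One possible way to do that, using a different technique from the line-counting loop above (a regex that pulls out every quoted string in each section), is sketched below. It uses Python 3 syntax; the function names `parse_dump` and `write_results`, the output file name `results.txt`, and the choice to drop commas inside values are all assumptions made for this example, not part of the original answer.

```python
import re

def parse_dump(text, key_index=4):
    """Map the value at key_index in each section to the remaining values."""
    result = {}
    # Each section follows a marker; the first chunk is whatever precedes it
    for section in text.split('ArrayOf_xsd_string')[1:]:
        values = re.findall(r'"([^"]*)"', section)  # contents of every quoted string
        if len(values) <= key_index:
            continue  # incomplete section, skip it
        key = values.pop(key_index)
        result[key] = values
    return result

def write_results(mapping, path):
    """Write one 'key: v1,v2,...' line per entry."""
    with open(path, 'w') as out:
        for key, values in mapping.items():
            # Drop commas inside values so they do not clash with the separator
            fields = [v.replace(',', '') for v in values]
            out.write('%s: %s\n' % (key, ','.join(fields)))

# Two sections copied from the question's data
sample = '''[(ArrayOf_xsd_string){
   item[] = 
      "001",
      "ABCD",
      "1234",
      "wordy type stuff",
      "123456",
      "more stuff, etc",
 }, (ArrayOf_xsd_string){
   item[] = 
      "002",
      "ABCD",
      "1234",
      "wordy type stuff",
      "234567",
      "more stuff, etc",
 }]'''

parsed = parse_dump(sample)
write_results(parsed, 'results.txt')
```

With a real dump you would read the whole file first, for example `parse_dump(open('data.txt').read())`. This trades the line counting for a dependence on the values being double-quoted, so it would need adjusting if a value could itself contain a double quote.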
    