Been trying to figure this one out all day. I have a large text

Question

Asked: May 23, 2026 (2026-05-23T22:37:21+00:00)

I've been trying to figure this one out all day. I have a large text file (546 MB) that I am trying to parse in Python, pulling out the text between the open tag and the close tag, and I keep running into memory problems. With the help of the good folks on this board, this is what I have so far.

answer = ''
output_file = open('/Users/Desktop/Poetrylist.txt','w')

with open('/Users/Desktop/2e.txt','r') as open_file:
    for each_line in open_file:
        if each_line.find('<A>'):
            start_position = each_line.find('<A>')
            start_position = start_position + 3
            end_position = each_line[start_position:].find('</W>')

            answer = each_line[start_position:end_position] + '\n'
            output_file.write(answer)

output_file.close()

I am getting this error message:

Traceback (most recent call last):
  File "C:\Users\Adam\Desktop\OEDsearch3.py", line 9, in <module>
    end_position = each_line[start_position:].find('</W>')
MemoryError

I have little to no programming experience and I am trying to figure this out for a poetry project I am working on. Any help is greatly appreciated.

1 Answer

Editorial Team · Answer 1 · 2026-05-23T22:37:21+00:00

Your logic is wrong because .find() returns -1 when the string is not found, and -1 is truthy, so your code treats almost every line as containing <A>. (The only result it treats as false is 0, which is what .find() returns when the line starts with <A>.)
You don’t need to build a new substring to find the '</W>', because .find() also takes an optional start argument.
Neither of these explains why you are running out of memory. Do you have a machine with unusually little memory?
Are you sure you’re showing us all the code?
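
To make the two points about .find() concrete, here is a small sketch. The sample strings and the helper name has_open_tag are made up for illustration; only str.find itself comes from the code above.

```python
line = "no tags here"
print(line.find("<A>"))          # -1: not found
print(bool(line.find("<A>")))    # True: -1 counts as true in an if

tagged = "<A>rose</W> and <A>thorn</W>"
print(tagged.find("<A>"))        # 0: found at the very start
print(bool(tagged.find("<A>")))  # False: 0 counts as false in an if


def has_open_tag(text):
    """Hypothetical helper: compare against -1 instead of relying on truthiness."""
    return text.find("<A>") != -1


# The optional second argument tells .find() where to start searching,
# so no sliced copy of the string is needed.
first_close = tagged.find("</W>")
second_open = tagged.find("<A>", first_close)
print(first_close, second_open)
```

Because the search starts at an offset inside the original string, the positions it returns are already positions in that string, with no adjustment needed.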

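EDITED: OK, now I think your file only has one line in it. In that case the for loop reads the whole 546 MB as a single line, and each_line[start_position:] then makes a second copy of nearly all of it, which would explain the MemoryError.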

Try changing your code like this:

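with open('/Users/Desktop/Poetrylist.txt', 'w') as output_file:
    with open('/Users/Desktop/2e.txt', 'r') as open_file:
        # Read the entire file into one string (it appears to be a single line).
        the_whole_file = open_file.read()
        start_position = 0
        while True:
            # Look for the next opening tag, starting where the last match ended.
            start_position = the_whole_file.find('<A>', start_position)
            if start_position < 0:
                break
            start_position += 3  # skip past '<A>'
            end_position = the_whole_file.find('</W>', start_position)
            if end_position < 0:
                break  # no closing tag left; stop instead of restarting from the top
            output_file.write(the_whole_file[start_position:end_position])
            output_file.write("\n")
            start_position = end_position + 4  # skip past '</W>'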
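
If reading the whole file at once still runs out of memory, one possible alternative is to read it in fixed-size chunks and keep only the unfinished part in memory. This is a rough sketch, not something tested on your file: the function name extract_between and the chunk_size parameter are my own inventions, and it assumes the tags are the literal strings '<A>' and '</W>'.

```python
import io


def extract_between(stream, open_tag="<A>", close_tag="</W>", chunk_size=1 << 20):
    """Yield the text between each open_tag and the next close_tag.

    Reads the stream chunk_size characters at a time. An entry that is
    never closed is dropped, and the buffer grows until a close tag is seen.
    """
    buffer = ""
    while True:
        chunk = stream.read(chunk_size)
        buffer += chunk
        pos = 0
        while True:
            start = buffer.find(open_tag, pos)
            if start < 0:
                # No open tag left: keep only a short tail, in case an
                # open tag is split across two chunks.
                keep = len(open_tag) - 1
                buffer = buffer[max(pos, len(buffer) - keep):]
                break
            end = buffer.find(close_tag, start + len(open_tag))
            if end < 0:
                # Entry not finished yet: keep it and wait for more data.
                buffer = buffer[start:]
                break
            yield buffer[start + len(open_tag):end]
            pos = end + len(close_tag)
        if not chunk:  # end of file
            break


# Small in-memory demonstration with a deliberately tiny chunk size.
sample = io.StringIO("x<A>rose</W>y<A>thorn</W>z<A>open")
print(list(extract_between(sample, chunk_size=4)))  # ['rose', 'thorn']

# Possible use with the files from the question (paths copied from above):
# with open('/Users/Desktop/2e.txt', 'r') as src, \
#         open('/Users/Desktop/Poetrylist.txt', 'w') as out:
#     for entry in extract_between(src):
#         out.write(entry + "\n")
```

The trade-off is a bit more bookkeeping in exchange for never holding much more than one chunk plus one unfinished entry in memory.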
