The Archive Base


Editorial Team
Asked: May 25, 2026


Based on the suggestions I received in this forum, I am using the following code (example) to count how many times each phrase occurs in a text.

phrase_words = ['red car', 'no lake', 'newjersey turnpike']
lines = ['i have a red car which i drove on newjersey', 'turnpike. when i took exit 39 there was no', 'lake. i drove my car on muddy roads which turned my red', 'car into brown. driving on newjersey turnpike can be confusing.']
text = " ".join(lines)
# named `counts` rather than `dict` so the built-in dict type is not shadowed
counts = {phrase: text.count(phrase) for phrase in phrase_words}

The desired output and the output of the example code is:

{'newjersey turnpike': 2, 'red car': 2, 'no lake': 1}

This code worked well on a text file smaller than 300MB. When I ran it on a text file larger than 500MB, I received the following memory error:

    y=' '.join(lines)
MemoryError

How do I overcome this? Thanks for your help!


1 Answer

  1. Editorial Team
    Added an answer on May 25, 2026 at 5:49 pm

    This algorithm keeps only two lines in memory at a time. It assumes that no phrase spans more than two lines, that is, a phrase crosses at most one line break:

    from itertools import tee
    from collections import defaultdict

    # recipe from the itertools docs (built in as itertools.pairwise since Python 3.10)
    def pairwise(iterable):
        "s -> (s0,s1), (s1,s2), (s2, s3), ..."
        a, b = tee(iterable)
        next(b, None)
        return zip(a, b)  # Python 3; on Python 2 use itertools.izip instead

    d = defaultdict(int)
    phrase_words = ['red car', 'no lake', 'newjersey turnpike']
    lines = ['i have a red car which i drove on newjersey',
             'turnpike. when i took exit 39 there was no',
             'lake. i drove my car on muddy roads which turned my red',
             'car into brown. driving on newjersey turnpike can be confusing.']

    for line1, line2 in pairwise(lines):
        both_lines = ' '.join((line1, line2))
        for phrase in phrase_words:
            # counts phrases in the first line and those that span to the next
            d[phrase] += both_lines.count(phrase) - line2.count(phrase)
    # line2 now holds the final line, so this requires at least two input lines
    for phrase in phrase_words:
        d[phrase] += line2.count(phrase)  # otherwise the last line is not searched
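
Since the original problem is a file too large to join into one string, the same two-line idea can be applied to any iterable of lines, including an open file object, which Python reads lazily one line at a time. The following is a minimal sketch under that assumption, not a definitive implementation: the function name `count_phrases` and the file name `big_file.txt` are hypothetical, and it keeps the same assumption that no phrase spans more than two lines and that a line break stands for a single space. Unlike the loop above, it also copes with empty and single-line input.

```python
def count_phrases(lines, phrases):
    """Count each phrase in an iterable of lines, holding two lines at a time.

    Hypothetical helper: assumes no phrase spans more than two lines.
    """
    counts = {phrase: 0 for phrase in phrases}
    previous = None
    for raw in lines:
        current = raw.rstrip("\r\n")  # lines read from a file end with a newline
        if previous is not None:
            both = previous + " " + current
            for phrase in phrases:
                # occurrences in `previous` plus those spanning into `current`
                counts[phrase] += both.count(phrase) - current.count(phrase)
        previous = current
    if previous is not None:
        for phrase in phrases:
            counts[phrase] += previous.count(phrase)  # the last line
    return counts


phrase_words = ['red car', 'no lake', 'newjersey turnpike']
lines = ['i have a red car which i drove on newjersey',
         'turnpike. when i took exit 39 there was no',
         'lake. i drove my car on muddy roads which turned my red',
         'car into brown. driving on newjersey turnpike can be confusing.']

print(count_phrases(lines, phrase_words))
# {'red car': 2, 'no lake': 1, 'newjersey turnpike': 2}
```

To run it against a large file, pass the file object directly instead of a list, for example `with open('big_file.txt') as f: counts = count_phrases(f, phrase_words)`, so the whole file is never held in memory at once.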
    