The Archive Base

Editorial Team
Asked: May 14, 2026 (2026-05-14T19:00:19+00:00)


I have a string buffer holding the contents of a huge text file, and I have to search it for a given set of words and phrases. What is an efficient way to do this?

I tried matching with the re module, but because the text corpus is huge, this takes a large amount of time.

I am given a dictionary of words and phrases.

I iterate through each file, read it into a string, search for every word and phrase in the dictionary, and increment the count in the dictionary for each key that is found.

One small optimization we thought of was to sort the dictionary's words and phrases from the most words to the fewest and then, at each word start position in the string buffer, compare the following words against that sorted list. If one phrase is found, we don't search for the other phrases at that position (it matched the longest phrase, which is what we want).
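The longest-phrase-first idea described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `count_longest_phrases` is hypothetical, both the text and the phrases are tokenized with `\w+` (so punctuation is ignored), and matching is case-sensitive.

```python
import re

def count_longest_phrases(text, phrases):
    """Count phrases, preferring the longest match at each word position."""
    # Index phrases by first word so each position checks only a few candidates.
    by_first = {}
    for phrase in phrases:
        words = tuple(re.findall(r"\w+", phrase))
        if words:
            by_first.setdefault(words[0], []).append((words, phrase))
    # Longest candidates first, so the first hit is the longest match.
    for candidates in by_first.values():
        candidates.sort(key=lambda pair: len(pair[0]), reverse=True)

    tokens = re.findall(r"\w+", text)
    counts = {phrase: 0 for phrase in phrases}
    i = 0
    while i < len(tokens):
        for words, phrase in by_first.get(tokens[i], ()):
            if tuple(tokens[i:i + len(words)]) == words:
                counts[phrase] += 1
                i += len(words)  # skip past the matched phrase
                break
        else:
            i += 1  # no phrase starts at this word
    return counts
```

Indexing by first word means most positions cost a single dictionary lookup, rather than a comparison against every phrase.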

Can someone suggest how to go through the string buffer word by word (that is, iterate over the string buffer one word at a time)?

Also, is there any other optimization that can be done here?

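data = str(file_content)
# Count occurrences of each key followed by a space.
# str.count() returns 0 (never -1) when nothing matches, so no check is needed.
for key in dictionary_entity:
    dictionary_entity[key] += data.count(key + " ")
f.close()  # f is the file handle opened earlier (not shown)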
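The loop above scans the whole text once per dictionary key. One possible alternative, sketched here under assumptions, is to compile all keys into a single regular expression and scan the text once. The helper name `count_terms` is hypothetical, and the sketch assumes every term begins and ends with a word character so that `\b` behaves as expected.

```python
import re
from collections import Counter

def count_terms(text, terms):
    """Count whole-word occurrences of all terms in a single pass."""
    if not terms:
        return Counter()
    # Longer terms first, so the alternation prefers the longest phrase.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in ordered) + r")\b"
    )
    return Counter(m.group(0) for m in pattern.finditer(text))
```

The returned `Counter` can be merged into an existing dictionary of counts; terms that never occur simply report 0 when looked up.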
1 Answer

  1. Editorial Team
    Added an answer on May 14, 2026 at 7:00 pm (2026-05-14T19:00:20+00:00)

    Iterating word-by-word through the contents of a file (the Wizard of Oz from Project Gutenberg, in my case), three different ways:

    import re
    import time
    from io import StringIO
    
    def word_iter_std(filename):
        # Read the file line by line and split each line on whitespace.
        start = time.time()
        with open(filename) as f:
            for line in f:
                for word in line.split():
                    yield word
        print('iter_std took %0.6f seconds' % (time.time() - start))
    
    def word_iter_re(filename):
        # Read the whole file, then find runs of word characters.
        start = time.time()
        with open(filename) as f:
            txt = f.read()
        for match in re.finditer(r'\w+', txt):
            yield match.group()  # the matched text, not the match object
        print('iter_re took %0.6f seconds' % (time.time() - start))
    
    def word_iter_stringio(filename):
        # Read the whole file into an in-memory buffer, then iterate its lines.
        start = time.time()
        with open(filename) as f:
            io = StringIO(f.read())
        for line in io:
            for word in line.split():
                yield word
        print('iter_io took %0.6f seconds' % (time.time() - start))
    
    woo = '/tmp/woo.txt'
    
    # Exhaust each generator so that its timing line is printed.
    for word in word_iter_std(woo): pass
    for word in word_iter_re(woo): pass
    for word in word_iter_stringio(woo): pass
    

    Resulting in:

    % python /tmp/junk.py
    iter_std took 0.016321 seconds
    iter_re took 0.028345 seconds
    iter_io took 0.016230 seconds
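To connect word-by-word iteration back to the counting problem in the question, a minimal sketch might look like the following. It assumes a dictionary of counts like the `dictionary_entity` from the question, handles single-word keys only (multi-word phrases need a separate pass), and splits on whitespace, so punctuation stays attached to words. The function name `count_dictionary_words` is hypothetical.

```python
from collections import Counter

def count_dictionary_words(lines, dictionary_entity):
    """Add per-word counts from an iterable of lines to dictionary_entity."""
    counts = Counter()
    for line in lines:
        for word in line.split():
            if word in dictionary_entity:  # constant-time dict lookup
                counts[word] += 1
    # Merge the new counts into the existing totals.
    for word, n in counts.items():
        dictionary_entity[word] += n
    return dictionary_entity
```

Because an open file object is itself an iterable of lines, it can be passed directly as `lines`, which avoids reading the whole file into memory.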
    