The Archive Base


Editorial Team
Asked: May 20, 2026 (2026-05-20T12:34:14+00:00)


I’m working with an unstructured plain text file. In addition to a lot of clutter, the file includes blocks of text that are separated from the rest of the text by empty lines.

How can I use PHP to extract all blocks of text with more than 100 words?

1 Answer

  1. Editorial Team
    Added an answer on May 20, 2026 at 12:34 pm (2026-05-20T12:34:15+00:00)

    The right approach depends on how large the file is, or could become.

    1. The simplest approach applies when the files are small enough to handle entirely in memory. You can use a regular expression to split the contents into chunks of text, then loop through them and keep the chunks with more than 100 words.

    2. The safest approach, I think, is to open the file and read it one line at a time until you reach an empty line. If the block read so far contains more than 100 words, store it. Then continue with the next block.
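
    Both approaches depend on what counts as a word. PHP's str_word_count() only counts runs of letters (plus ' and -), so tokens made of digits are skipped. If that matters for your text, a whitespace-based count may be closer to what you want. The sketch below is only an illustration: countTokens is a hypothetical helper name, not a built-in function.

    ```php
    <?php
    // Hypothetical helper: counts whitespace-separated tokens instead of
    // relying on str_word_count(), which ignores tokens made only of digits.
    function countTokens(string $text): int
    {
        // PREG_SPLIT_NO_EMPTY drops the empty strings produced by
        // leading or trailing whitespace.
        $tokens = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
        return count($tokens);
    }

    $sample = "3 apples and 2 pears";
    echo str_word_count($sample), "\n"; // 3: "3" and "2" are not counted
    echo countTokens($sample), "\n";    // 5: every token is counted
    ```

    Either counter can be used in the word-count check; which one is right depends on your data.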

    Here’s an example of both options:

    // Option 1: read the whole file into memory.
    $contents = file_get_contents($filename);
    $blocks = array();
    // Split on one or more blank lines. \R matches \n, \r\n and \r, and
    // \h* lets whitespace-only lines count as blank, as in Option 2.
    foreach (preg_split('/\R(?:\h*\R)+/', $contents) as $block) {
        if (str_word_count($block) > 100) {
            $blocks[] = $block;
        }
    }
    
    // Option 2: longer, but only the current block and the blocks you
    // keep are held in memory.
    $blocks = array();
    
    $fp = fopen($filename, 'r');
    
    $block = '';
    // Compare against false so a final line such as "0" is not skipped.
    while (($line = fgets($fp)) !== false) {
        if (!ctype_space($line)) { // depends on your meaning of an empty line
            $block .= $line;
        }
        elseif ($block != '') {
            if (str_word_count($block) > 100) {
                $blocks[] = $block;
            }
            $block = '';
        }
    }
    // The last block may not be followed by an empty line.
    if (str_word_count($block) > 100) {
        $blocks[] = $block;
    }
    fclose($fp);
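
    If you want to process the blocks one at a time instead of collecting them in an array, Option 2 can be wrapped in a generator. This is a minimal sketch under that assumption: longBlocks and its $minWords parameter are hypothetical names introduced here, and the demo uses a low threshold and a temporary file only to keep the example short.

    ```php
    <?php
    /**
     * Hypothetical helper: yields each blank-line-separated block of $path
     * that contains more than $minWords words, reading one line at a time.
     */
    function longBlocks(string $path, int $minWords = 100): Generator
    {
        $fp = fopen($path, 'r');
        if ($fp === false) {
            throw new RuntimeException("Cannot open $path");
        }
        try {
            $block = '';
            while (($line = fgets($fp)) !== false) {
                if (!ctype_space($line)) {
                    $block .= $line;
                    continue;
                }
                // A blank line closes the current block, if there is one.
                if ($block !== '' && str_word_count($block) > $minWords) {
                    yield $block;
                }
                $block = '';
            }
            // Flush the final block if the file does not end with a blank line.
            if ($block !== '' && str_word_count($block) > $minWords) {
                yield $block;
            }
        } finally {
            fclose($fp);
        }
    }

    // Demo on a temporary file, with a threshold of 3 words for brevity.
    $path = tempnam(sys_get_temp_dir(), 'blk');
    file_put_contents($path, "short one\n\nthis block has five words\n\nx\n");
    foreach (longBlocks($path, 3) as $block) {
        echo trim($block), "\n"; // prints "this block has five words"
    }
    unlink($path);
    ```

    Because the generator yields one block at a time, memory use stays bounded by the largest single block rather than by the number of matches.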
    