The Archive Base

Editorial Team
Asked: May 17, 2026

For a blog-like project, I want to get the first few paragraphs, headers, lists, or whatever else falls within a range of characters from a Markdown-generated HTML fragment, to display as a summary.

So if I have

<h1>hello world</h1>
<p>Lets say these are 100 chars</p>
<ul>
    <li>some bla bla, 40 chars</li>
</ul>
<p>some other text</p>

Assume I want the summary to cover the text within the first 150 characters. It does not have to be exact: I could just take the first 150 characters including tags and go from there, but that would probably leave artifacts at the tail that are harder to handle. The result should give me the h1, the p and the ul, but not the final p (which would be truncated). If the first element alone has more than 150 characters, I would take the full first element.

How could I do this? Using XPath or a regex? I'm a bit out of ideas on this…

Edit

First I want to give a big THANK YOU to all of you who replied!

While I got really great answers in this thread, I actually found it much easier to hook in before the Markdown interpreter kicks in, take the first n text blocks separated by \r\n\r\n, and pass only those on for Markdown generation.

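  class String
    # Returns the leading text blocks (separated by \r\n\r\n) that fit
    # within +length+ characters, each followed by \r\n\r\n.
    def summarize_md(length)
      sum = ""
      split(/\r\n\r\n/).each do |block|
        # Always keep the first block, even if it alone exceeds the limit.
        break if !sum.empty? && sum.length + block.length > length
        sum += "#{block}\r\n\r\n"
      end
      sum
    end
  end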

While this could probably be reduced to a one-liner, it is still much simpler and more CPU-friendly than any of the proposed solutions.
Anyway, since my question could be read as if the HTML were the starting point (and not the Markdown text), I'll accept the first answer… I hope that's fair…
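
The block-splitting idea from the edit above can also be sketched as a plain method instead of a String extension. This is only an illustration under stated assumptions: the name summarize_blocks is hypothetical, blocks are assumed to be separated by one or more blank lines with either \n or \r\n line endings, and the first block is always kept, as the question asks.

```ruby
# Hypothetical helper (not part of the original post): keeps whole
# Markdown blocks while their combined length stays within the limit.
def summarize_blocks(markdown, limit)
  # Assumption: blocks are separated by blank lines, LF or CRLF.
  blocks = markdown.split(/(?:\r?\n){2,}/)
  kept = []
  total = 0
  blocks.each do |block|
    # Always keep the first block, even if it alone exceeds the limit.
    break if !kept.empty? && total + block.length > limit
    kept << block
    total += block.length
  end
  kept.join("\n\n")
end

# Sample input: a header, a 100-character paragraph, a list item, and a
# final paragraph that no longer fits into 150 characters.
md = "# hello world\n\n#{'x' * 100}\n\n* #{'y' * 30}\n\nsome other text"
puts summarize_blocks(md, 150)
```

The returned string can then be handed to whatever Markdown renderer the project already uses, so the summary never contains a half-closed tag.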


1 Answer

  1. Editorial Team
    Added an answer on May 17, 2026 at 6:44 pm

    Using XPath is the most robust and flexible approach. Here's a sample script:

    require 'nokogiri'
    
    html = <<End
    <h1>hello world</h1>
    <p>Lets say these are 100 chars.......................................................................</p>
    <ul>
        <li>some bla bla, 40 chars</li>
    </ul>
    <p>some other text</p>
    End
    
    LIMIT = 150
    summary = ""
    
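    # Parse the HTML, then walk every text node in document order and
    # stop before the first node that would bring the summary to LIMIT
    # characters or more.
    doc = Nokogiri::HTML.parse(html)
    doc.xpath('//text()').each do |node|
      text = node.text
      break if summary.length + text.length >= LIMIT
      summary << text
    end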
    
    puts summary
    puts summary.length
    

    The XPath //text() simply selects all the text nodes in the document, including the whitespace between elements. If you want to be more specific about which elements you are interested in, you can narrow the expression.

© 2021 The Archive Base. All Rights Reserved