The Archive Base

Editorial Team
Asked: May 13, 2026 (2026-05-13T08:00:08+00:00)


I’m looking for a way to convert text like this:


"  <!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\"
\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">\n  <html xml:lang=\"en\" lang=\"en\"
 xmlns=\"http://www.w3.org/1999/xhtml\">\n   \t<head>\n   \t\t<title>My Page 
Title</title>\n   \t\t<META HTTP-EQUIV=\"Content-Type\" CONTENT=\"text/html; charset=ISO-
8859-1\">\n      <style type=\"text/css\" media=\"screen\"> \n       \t\tblockquote\n
{\n \tfont-style: italic;\n }\n cite\n {\n
\ttext-align: right;\n \tfont-style: normal;\n }\n .author\n {\n \ttext-align: right;\n \tmargin-right: 80px;\n
}\n </style>\n \t</head>\n \t<body>\n \t\t<h1>My Page Title</h1>\n<h3>Production Manager</h3>\n<blockquote>\n<p>&#8220;I want my passion for business plan and my pride in my work to show in every step of our company: from the labels and papers, to our relationships with our customers, to the enjoyment of each bottle of My Company business plan. As we expand our production, my dream is to plant a company of my own to specialize in good business, my personal favorite varietal.&#8221;</p>\n</blockquote>\n<p class=\"author\"><cite>- John Smith</cite></p>\n<p>Born and raised on the north coast of California, John Smith always felt a deep connection to this......"

Into this:


My Page Title. Production Manager. I want my passion for business plan and my pride in my
work to show in every step of our company:  from the labels and papers, to our 
relationships with our customers, to the enjoyment of each bottle of My Company business 
plan.  As we expand our production, my dream is to plant a company of my own to specialize
in good business, my personal favorite varietal.

That’s just extracting all the text before the first period. But it must:

  • Strip HTML tags
  • Replace \n with ". " (and runs such as \n\n\n with a single ". ")
  • Replace \t with " "
  • Replace \s+ with " "
  • Unescape entities like &#8220;
  • Replace " with '

After starting to do something like that, I figured this is probably already solved somewhere else more thoroughly. Does anyone have a good one-liner way to create a plain text excerpt from an HTML string like this (in Ruby)?

I use Nokogiri for full-featured HTML parsing, but it seems like it’d be just as difficult to use that.

1 Answer

  1. Editorial Team
    Added an answer on May 13, 2026 at 8:00 am

    Hmm. That seems like rather a lot of functionality for a one-liner. If you just want to parse and display an HTML page as plain text, I’d recommend using w3m, a text-mode web browser that has to be installed separately.

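    string = "..." # your HTML string

    # Open a two-way pipe to w3m, telling it to treat its input as HTML.
    IO.popen("w3m -T text/html", "r+") do |pipe|
      pipe.write string  # send the HTML to w3m's stdin
      pipe.close_write   # close stdin so w3m knows the input is complete
      puts pipe.read     # print the rendered plain text
    end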
    

    Gives me:

    My Page Title
    
    Production Manager
    
        “I want my passion for business plan and my pride in my work to show in
        every step of our company: from the labels and papers, to our relationships
        with our customers, to the enjoyment of each bottle of My Company business
        plan. As we expand our production, my dream is to plant a company of my own
        to specialize in good business, my personal favorite varietal.”
    
    - John Smith
    
    Born and raised on the north coast of California, John Smith always felt a deep
    connection to this......
    

    For the rest of the substitutions, I’d recommend applying a regexp replace either before or after processing, depending on your exact needs.
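
As a rough sketch of the "after processing" route: the helper below is hypothetical (`tidy_excerpt` is not part of Ruby or w3m) and assumes the input is plain text shaped like the w3m output above, with paragraphs separated by blank lines. Because w3m wraps long lines, it joins paragraphs with ". " rather than replacing every single \n, which is a deliberate departure from the literal rule in the question.

```ruby
# Hypothetical helper (not part of w3m or Ruby): turns wrapped plain text,
# such as w3m's output, into a single-line excerpt.
def tidy_excerpt(text)
  paragraphs = text
    .split(/\n\s*\n/)                            # blank lines separate paragraphs
    .map { |para| para.gsub(/\s+/, " ").strip }  # unwrap lines, collapse tabs and spaces
    .reject(&:empty?)

  # End each paragraph with a period unless it already ends a sentence,
  # optionally followed by a closing quote.
  sentences = paragraphs.map do |para|
    para.match?(/[.!?]['"\u201D\u2019]?\z/) ? para : "#{para}."
  end

  # Replace straight and curly double quotes with a single quote.
  sentences.join(" ").gsub(/["\u201C\u201D]/, "'")
end

sample = "My Page Title\n\nProduction Manager\n\n    \u201CI want my passion\n    to show.\u201D\n\n- John Smith\n"
puts tidy_excerpt(sample)
```

With the w3m snippet above, this would amount to calling `puts tidy_excerpt(pipe.read)` instead of `puts pipe.read`. Whether a trailing attribution such as "- John Smith" belongs in the excerpt depends on your exact needs, so treat the rules here as a starting point.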
