The Archive Base Latest Questions

Editorial Team
Asked: May 19, 2026

I need to turn HTML into equivalent Markdown-structured text

I need to turn HTML into equivalent Markdown-structured text.

Note: I am looking for a quick and clear way of doing this with PHP and Python.

As I am programming in PHP, some people recommend Markdownify for the job, but unfortunately the code is no longer being updated and in fact does not work. At sourceforge.net/projects/markdownify there is this notice: “NOTE: unsupported – do you want to maintain this project? contact me! Markdownify is a HTML to Markdown converter written in PHP. See it as the successor to html2text.php since it has better design, better performance and less corner cases.”

From what I could discover, I have only two good choices:

  • Python: Aaron Swartz’s html2text.py

  • Ruby: Singpolyma’s html2markdown.rb, based on Nokogiri

So, from PHP, I need to pass the HTML code, call the Ruby/Python script, and receive the output back.

(By the way, someone asked a similar question here (“how to call ruby script from php?”), but it has no practical information for my case.)

Following the Tin Man's tip (below), I got to this:

PHP code:

$t='<p><b>Hello</b><i>world!</i></p>';
$scaped=preg_quote($t,"/");
$program='python html2md.py';

//exec($program.' '.$scaped,$n); print_r($n); exit; //Works!!!

$input=$t;

$descriptorspec=array(
   array('pipe','r'),//stdin is a pipe that the child will read from
   array('pipe','w'),//stdout is a pipe that the child will write to
   array('file','./error-output.txt','a')//stderr is a file to write to
);

$process=proc_open($program,$descriptorspec,$pipes);

if(is_resource($process)){
    fwrite($pipes[0],$input);
    fclose($pipes[0]);
    $r=stream_get_contents($pipes[1]);
    fclose($pipes[1]);
    $return_value=proc_close($process);
    echo "command returned $return_value\n";
    print_r($pipes);
    print_r($r);
}

Python code:

#! /usr/bin/env python
import html2text
import sys
print html2text.html2text(sys.argv[1])
#print "Hi!" #works!!!

With the above I am getting this:

command returned 1
Array
(
[0] => Resource id #17
[1] => Resource id #18
)

And the “error-output.txt” file says:

Traceback (most recent call last):
  File "html2md.py", line 5, in <module>
    print html2text.html2text(sys.argv[1])
IndexError: list index out of range

Any ideas???


Ruby code (still being analysed):

#!/usr/bin/env ruby
require_relative 'html2markdown'
puts HTML2Markdown.new("<h1>#{ ARGF.read }</h1>").to_s

For the record, I previously tried PHP’s simplest option, “exec()”, but I ran into problems with some special characters that are very common in HTML.

PHP code:

echo exec('./hi.rb');
echo exec('./hi.py');

Ruby code:

#!/usr/bin/ruby
puts "Hello World!"

Python code:

#!/usr/bin/python
import sys
print sys.argv[1]

Both work fine. But when the string is a bit more complicated:

$h='<p><b>Hello</b><i>world!</i></p>';
echo exec("python hi.py $h");

It did not work at all.

That’s because the HTML string needed to have its special characters escaped. I got it working using this:

$t='<p><b>Hello</b><i>world!</i></p>';
$scaped=preg_quote($t,"/");

Now it works like I said here.
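
A note on the escaping step, offered as a sketch rather than a definitive fix: preg_quote is designed for regular expressions and only happens to backslash-escape characters such as < and >, which a shell would otherwise read as redirection. PHP's escapeshellarg is the function intended for shell arguments. The Python 3 snippet below (Python 3 is an assumption; the setup above uses Python 2.7) illustrates the same idea with shlex.quote, using the same HTML string as above.

```python
import shlex

html = "<p><b>Hello</b><i>world!</i></p>"

# shlex.quote wraps the string so that a POSIX shell passes it through
# as a single argument instead of interpreting < and > as redirection.
safe_command = "python hi.py " + shlex.quote(html)

# shlex.split parses a command line roughly the way a POSIX shell would,
# which lets us check what the child process would receive as arguments.
argv = shlex.split(safe_command)
print(argv)
```

The last element of argv is the original HTML string, unchanged, which is what the child script would see as sys.argv[1].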

I am running:
Fedora 14
ruby 1.8.7
Python 2.7
perl 5.12.2
PHP 5.3.4
nginx 0.8.53

1 Answer

  1. Editorial Team
    Added an answer on May 19, 2026 at 2:19 am

    Have PHP open the Ruby or Python script via proc_open, piping the HTML into the script's STDIN. The Ruby/Python script reads and processes the data, returns it via STDOUT to the PHP script, then exits. This is a common way of doing things via popen-like functionality in Perl, Ruby or Python. It is nice because it gives you access to STDERR in case something blows chunks and doesn’t require temp files, but it’s a bit more complex.
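
The same parent-side pattern can be sketched in Python 3, where subprocess.run plays roughly the role of proc_open: it feeds data to the child's STDIN and collects STDOUT and STDERR. The child command below is a hypothetical stand-in that upper-cases its input, and the name pipe_through is made up for illustration; neither comes from the original answer.

```python
import subprocess
import sys

# Hypothetical child process: a one-line filter that upper-cases
# whatever arrives on its stdin and writes it to stdout.
CHILD = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]


def pipe_through(command, data):
    """Send data to the command's stdin; return (exit code, stdout, stderr)."""
    completed = subprocess.run(
        command, input=data, capture_output=True, text=True
    )
    return completed.returncode, completed.stdout, completed.stderr


code, out, err = pipe_through(CHILD, "<p>hello</p>")
print(code, out)
```

Nothing is passed on the command line except the program itself, so no shell quoting of the HTML is needed.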

    An alternative is to write the data from PHP to a temporary file, then use system, exec, or something similar to call the Ruby/Python script, which opens and processes the file and prints the result to its STDOUT.
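
As a rough Python 3 sketch of the temporary-file alternative: the parent writes the data to a file, passes only the file path as an argument, and reads the child's STDOUT. The child here is a hypothetical stand-in that upper-cases the file contents, and run_via_temp_file is an illustrative name, not something from the original answer.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical child: reads the file named in its first argument
# and writes the upper-cased contents to stdout.
CHILD_SOURCE = "import sys; sys.stdout.write(open(sys.argv[1]).read().upper())"


def run_via_temp_file(data):
    """Write data to a temp file, run the child on it, return its stdout."""
    with tempfile.NamedTemporaryFile("w", suffix=".html", delete=False) as handle:
        handle.write(data)
        path = handle.name
    try:
        completed = subprocess.run(
            [sys.executable, "-c", CHILD_SOURCE, path],
            capture_output=True,
            text=True,
        )
        return completed.stdout
    finally:
        # Clean up the temp file whether or not the child succeeded.
        os.remove(path)


converted = run_via_temp_file("<p>hi</p>")
print(converted)
```

Only a file path travels on the command line, which sidesteps the special-character problem at the cost of touching the disk.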

    EDIT:

    See @Jonke’s answer for “Best practices with STDIN in Ruby?” for examples of how simple it is to read STDIN and write to STDOUT with Ruby. “How do you read from stdin in python” has some good samples for that language.

    This is a simple example showing how to call a Ruby script, passing a string to it via PHP’s STDIN pipe, and reading the Ruby script’s STDOUT:

    Save this as “test.php”:

    <?php
    $descriptorspec = array(
       0 => array("pipe", "r"),  // stdin is a pipe that the child will read from
       1 => array("pipe", "w"),  // stdout is a pipe that the child will write to
       2 => array("file", "./error-output.txt", "a") // stderr is a file to write to
    );
    $process = proc_open('ruby ./test.rb', $descriptorspec, $pipes);
    
    if (is_resource($process)) {
        // $pipes now looks like this:
        // 0 => writeable handle connected to child stdin
        // 1 => readable handle connected to child stdout
        // Any error output will be appended to ./error-output.txt
    
        fwrite($pipes[0], 'hello world');
        fclose($pipes[0]);
    
        echo stream_get_contents($pipes[1]);
        fclose($pipes[1]);
    
        // It is important that you close any pipes before calling
        // proc_close in order to avoid a deadlock
        $return_value = proc_close($process);
    
        echo "command returned $return_value\n";
    }
    ?>
    

    Save this as “test.rb”:

    #!/usr/bin/env ruby
    
    puts "<b>#{ ARGF.read }</b>"
    

    Running the PHP script gives:

    Greg:Desktop greg$ php test.php 
    <b>hello world</b>
    command returned 0
    

    The PHP script is opening the Ruby interpreter which opens the Ruby script. PHP then sends “hello world” to it. Ruby wraps the received text in bold tags, and outputs it, which is captured by PHP, and then output. There are no temp files, nothing passed on the command-line, you could pass a LOT of data if need-be, and it would be pretty fast. Python or Perl could easily be used instead of Ruby.
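
For comparison, a Python version of the child script might look like the sketch below. It assumes Python 3, and the names run_filter and wrap_in_bold are made up for illustration. The key point is that the data is read from STDIN, not from sys.argv.

```python
import io


def run_filter(transform, instream, outstream):
    """Read all of instream, apply transform, and write the result to outstream."""
    outstream.write(transform(instream.read()))


def wrap_in_bold(text):
    """Stand-in transform that mirrors test.rb: wrap the input in <b> tags."""
    return "<b>%s</b>\n" % text


# In a real script (a hypothetical test.py) the call would be:
#     import sys
#     run_filter(wrap_in_bold, sys.stdin, sys.stdout)
# In-memory streams are used here so the example runs on its own.
fake_stdin = io.StringIO("hello world")
fake_stdout = io.StringIO()
run_filter(wrap_in_bold, fake_stdin, fake_stdout)
result = fake_stdout.getvalue()
print(result, end="")
```

This is also the likely reason a script that reads sys.argv[1] fails with IndexError when called through proc_open with no arguments: the data arrives on STDIN, so sys.argv holds only the script name. Assuming the html2text module is installed, passing html2text.html2text as the transform would be the analogous change for the Markdown conversion.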

    EDIT:

    If you have:

    HTML2Markdown.new('<h1>HTMLcode</h1>').to_s
    

    as sample code, then you could begin developing a Ruby solution with:

    #!/usr/bin/env ruby
    
    require_relative 'html2markdown'
    
    puts HTML2Markdown.new("<h1>#{ ARGF.read }</h1>").to_s
    

    assuming you’ve already downloaded the HTML2Markdown code and have it in the current directory and are running Ruby 1.9.2.

© 2021 The Archive Base. All Rights Reserved