I am working with HTML documents generated from Microsoft Word 2007/2010. Besides generating incredibly dirty HTML, Word also tends to use both block styles (in a <style> element) and inline styles. I am looking for a PHP library that would merge the block styles into the existing inline style attributes.
Edit
The goal is to construct an HTML block that preserves the original formatting and is editable in a WYSIWYG editor such as TinyMCE.
Example
If the original HTML is:
<html>
<head>
<style>
.normaltext {color:black;font-weight:normal;font-size:10pt}
.important {color:red;font-weight:bold;font-size:11pt}
</style>
</head>
<body>
<p class="normaltext" style="font-family:arial">
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
In ut erat id dui mollis faucibus. Mauris eu neque et eros tempus placerat.
<span class="important">Nam in purus nisi</span>, vitae dictum ligula.
Morbi mattis eros eget diam vulputate imperdiet.
<span class="important" style="color:green">Integer</span> a metus eros.
Sed iaculis porta imperdiet.
</p>
</body>
</html>
Should become:
<html>
<head>
</head>
<body>
<p style="font-family:arial;color:black;font-weight:normal;font-size:10pt">
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
In ut erat id dui mollis faucibus. Mauris eu neque et eros tempus placerat.
<span style="color:red;font-weight:bold;font-size:11pt">Nam in purus nisi</span>, vitae dictum ligula.
Morbi mattis eros eget diam vulputate imperdiet.
<span style="color:green;font-weight:bold;font-size:11pt">Integer</span> a metus eros.
Sed iaculis porta imperdiet.
</p>
</body>
</html>
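To illustrate the transformation without the linked blog code, here is a minimal PHP sketch of my own (it is not the code from the blog post). It uses PHP's DOM extension, and the function names inlineClassStyles and parseDeclarations are hypothetical. It assumes the stylesheet contains only simple ".class { ... }" rules; real Word output also has tag-qualified selectors, @font-face blocks and so on, which this sketch does not handle properly.

```php
<?php
// Minimal sketch (not the blog post's code): copy simple ".class { ... }"
// rules from <style> blocks into each element's inline style attribute.
// Assumptions: only single-class selectors are understood; specificity,
// media queries and tag-qualified selectors are ignored.

// Hypothetical helper: turn "a:b; c:d" into ['a' => 'b', 'c' => 'd'].
function parseDeclarations(string $css): array
{
    $props = [];
    foreach (explode(';', $css) as $decl) {
        $parts = explode(':', $decl, 2);
        if (count($parts) !== 2) {
            continue; // skip empty or malformed declarations
        }
        $name = strtolower(trim($parts[0]));
        if ($name !== '') {
            $props[$name] = trim($parts[1]);
        }
    }
    return $props;
}

// Hypothetical helper: return the HTML with class styles merged inline.
function inlineClassStyles(string $html): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // Word output is rarely valid HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);

    // Collect ".name { declarations }" rules, then drop the <style> elements.
    $rules = [];
    foreach (iterator_to_array($xpath->query('//style')) as $style) {
        preg_match_all('/\.([\w-]+)\s*\{([^}]*)\}/', $style->textContent, $matches, PREG_SET_ORDER);
        foreach ($matches as $m) {
            $rules[$m[1]] = array_merge($rules[$m[1]] ?? [], parseDeclarations($m[2]));
        }
        $style->parentNode->removeChild($style);
    }

    foreach (iterator_to_array($xpath->query('//*[@class]')) as $el) {
        // Later classes in the attribute override earlier ones (a simplification).
        $classProps = [];
        foreach (preg_split('/\s+/', trim($el->getAttribute('class'))) as $class) {
            $classProps = array_merge($classProps, $rules[$class] ?? []);
        }
        // Array union keeps existing inline declarations first and lets them
        // win; class declarations are only added for properties not yet set.
        $props = parseDeclarations($el->getAttribute('style')) + $classProps;

        $pairs = [];
        foreach ($props as $name => $value) {
            $pairs[] = $name . ':' . $value;
        }
        if ($pairs !== []) {
            $el->setAttribute('style', implode(';', $pairs));
        }
        $el->removeAttribute('class');
    }
    return $doc->saveHTML();
}
```

The key design choice is that the existing inline declarations are read first and class declarations are only added for properties that are not already set. That is why style="color:green" survives on the second span and why font-family:arial comes first on the paragraph, matching the desired output above.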
I finally managed to get it to work. The code is based on
http://blog.verkoyen.eu/blog/p/detail/convert-css-to-inline-styles-with-php
with one simple change:
Moving the line
up to the beginning of the loop, right after where $properties is declared.
To make this work for WordPress, however, one additional change is needed. DOMDocument replaces the &nbsp; entities in the document with blank characters, which breaks the WordPress update statement and leads to content being cut off. Please refer to my other question for the solution:
DOMDocument->saveHTML() converting &nbsp; to space
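The linked question has the actual fix. As a rough illustration of one commonly used workaround (my assumption, not necessarily what that answer does), the literal non-breaking space characters can be turned back into entities after DOMDocument serializes the document. The helper name restoreNbsp is hypothetical, and the sketch assumes the serialized HTML is UTF-8, where U+00A0 is the byte pair C2 A0.

```php
<?php
// Hypothetical helper, not the fix from the linked question: after
// DOMDocument has serialized the document, turn literal non-breaking
// spaces back into &nbsp; entities so the saved content contains the
// entity rather than the raw character.
// Assumes UTF-8 output, where U+00A0 is encoded as the bytes C2 A0.
function restoreNbsp(string $html): string
{
    return str_replace("\xC2\xA0", '&nbsp;', $html);
}
```

Usage would be along the lines of $html = restoreNbsp($doc->saveHTML()); before passing the content to WordPress. If the output is ISO-8859-1 instead, the character is the single byte A0 and this replacement will not match.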
This problem is detailed in https://wordpress.stackexchange.com/questions/48692/post-content-getting-cut-off-on-blank-space-on-wpdb-update. If you know why this is happening for WordPress, please post your answer there as I would very much like to find out why it is happening.