So I need to edit some text in a Word document. I created a Word document and saved it as XML. It is saved correctly (I can open the XML file in MS Word and it looks exactly like the docx original).
Then I use PHP DOM to edit some text in the file, just two lines (EDIT: below is the fixed, working version):
<?php
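$firstName = 'Richard';
$lastName = 'Knop';
$xml = file_get_contents('template.xml');
$doc = new DOMDocument();
$doc->loadXML($xml);
// Note: set after loadXML(), so this has no effect on parsing.
$doc->preserveWhiteSpace = false;
$wts = $doc->getElementsByTagNameNS('http://schemas.openxmlformats.org/wordprocessingml/2006/main', 't');
$c1 = 0; $c2 = 0;
foreach ($wts as $wt) {
    // Append the first name to the <w:t> that follows the "First Name" label.
    if (1 === $c1) {
        $wt->nodeValue .= ' ' . $firstName;
        $c1++;
    }
    // Append the last name to the <w:t> that follows the "Last Name" label.
    if (1 === $c2) {
        $wt->nodeValue .= ' ' . $lastName;
        $c2++;
    }
    if ('First Name' === substr($wt->nodeValue, 0, 10)) {
        $c1++;
    }
    if ('Last Name' === substr($wt->nodeValue, 0, 9)) {
        $c2++;
    }
}
// Serialize the modified DOM; otherwise the unmodified template would be written.
$xml = $doc->saveXML();
// Convert UNIX (\n) line endings to DOS (\r\n) ones.
$xml = str_replace("\n", "\r\n", $xml);
$fp = fopen('final-xml.xml', 'w');
fwrite($fp, $xml);
fclose($fp);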
This gets executed properly (no errors). These two lines:
<w:t>First Name:</w:t>
<w:t>Last Name:</w:t>
Get replaced with these:
<w:t>First Name: Richard</w:t>
<w:t>Last Name: Knop</w:t>
However, when I try to open the final-xml.xml file in MS Word, it doesn’t open (Word freezes). Any suggestions?
EDIT:
I tried using levenshtein():
$xml = file_get_contents('template.xml');
$xml2 = file_get_contents('final-xml.xml');
// Compare in 255-byte chunks.
$str = str_split($xml, 255);
$str2 = str_split($xml2, 255);
$i = 0;
foreach ($str as $s) {
    // Guard against final-xml.xml having fewer chunks than template.xml.
    $dist = levenshtein($s, $str2[$i] ?? '');
    if (0 !== $dist) {
        echo $dist, '<br />';
    }
    $i++;
}
This outputted nothing, which is weird: when I open the final-xml.xml file in Notepad, I can clearly see that those two lines have changed.
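A more direct check than chunked levenshtein() would be to find the first byte at which the two files differ. This is only a minimal sketch; firstDifference() is a hypothetical helper, not part of the code above:

```php
<?php
// Hypothetical helper: returns the byte offset of the first difference
// between two strings, or -1 if the strings are identical.
function firstDifference(string $a, string $b): int
{
    $len = min(strlen($a), strlen($b));
    for ($i = 0; $i < $len; $i++) {
        if ($a[$i] !== $b[$i]) {
            return $i;
        }
    }
    // One string is a prefix of the other, or they are equal.
    return strlen($a) === strlen($b) ? -1 : $len;
}
```

Calling it as firstDifference(file_get_contents('template.xml'), file_get_contents('final-xml.xml')) and dumping a few bytes around the returned offset with bin2hex() should also make invisible differences, such as line endings, visible.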
EDIT2:
Here is the template.xml file: http://uploading.com/files/61b2922b/template.xml/
This is a problem related to DOS vs UNIX line endings. Word 2007 does not tolerate a \n line ending; it requires \r\n, whereas Word 2010 is more tolerant and accepts both versions.
To fix the problem, make sure that you replace all UNIX line breaks with DOS ones before saving the output file:
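The original snippet is not reproduced here, so the following is a minimal sketch of the conversion. toDosLineEndings() is a hypothetical helper name; unlike a plain str_replace("\n", "\r\n", ...), it is written so that line endings that are already \r\n are not doubled:

```php
<?php
// Hypothetical helper: convert every line ending (\r\n, \r or \n) to \r\n.
function toDosLineEndings(string $text): string
{
    // \r\n is listed first so an existing CRLF pair is matched as one unit
    // and is not turned into \r\r\n.
    return preg_replace('/\r\n|\r|\n/', "\r\n", $text);
}
```

It would be applied to the serialized document, for example toDosLineEndings($doc->saveXML()), right before writing the file.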
Full sample:
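The original sample is not reproduced here; what follows is a minimal sketch of the whole flow under a few assumptions. The inline $template is a made-up stand-in for template.xml (it is not a complete Word document), the $values map is hypothetical, and the value is appended to the label's own w:t element rather than to the following one:

```php
<?php
const W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';

// Assumed stand-in for template.xml, using UNIX line endings.
$template = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
    . "<w:body xmlns:w=\"" . W_NS . "\">\n"
    . "<w:p><w:r><w:t>First Name:</w:t></w:r></w:p>\n"
    . "<w:p><w:r><w:t>Last Name:</w:t></w:r></w:p>\n"
    . "</w:body>\n";

// Hypothetical label => value map.
$values = ['First Name:' => 'Richard', 'Last Name:' => 'Knop'];

$doc = new DOMDocument();
$doc->loadXML($template);

// Append the matching value to every <w:t> whose text is a known label.
foreach ($doc->getElementsByTagNameNS(W_NS, 't') as $wt) {
    $label = trim($wt->nodeValue);
    if (isset($values[$label])) {
        $wt->nodeValue = $label . ' ' . $values[$label];
    }
}

// Serialize the modified DOM, then convert all line endings to \r\n.
$output = preg_replace('/\r\n|\r|\n/', "\r\n", $doc->saveXML());

// Written to the temp directory here; the original code writes final-xml.xml.
$path = sys_get_temp_dir() . '/final-xml.xml';
file_put_contents($path, $output);
```

With a real template, the w:t elements may be split across several runs, so the label matching would likely need adjusting.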