I read the MS Word document with $text = fread($filename, $filesize);
When I echo $text, it contains characters that the browser cannot display properly, so the output shows broken characters. I’m trying to strip them out with the following regex:
preg_replace('/[^\w]/','',$text); but it doesn’t work the way I want.
Can anybody help, please?
As already mentioned in the comments, you should use a tool that converts the .doc file into something more usable, such as plain text.
Otherwise, you could try the following regexp when outputting each line; it keeps only digit, word and whitespace characters in the string:
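A minimal sketch of such a pattern, in case it helps. The sample string in $text is made up for illustration and stands in for the raw bytes read from the file, and the pattern is just one that fits the description above, not necessarily the exact one intended. Note that preg_replace returns the cleaned string rather than changing $text in place, so the result has to be assigned:

```php
<?php
// Hypothetical sample input: readable text mixed with control bytes and punctuation.
$text = "Hello\x00\x01 World\x07 123!\n";

// [^\w\s\d] matches every character that is NOT a word character
// (letters, digits, underscore), whitespace, or a digit.
// \d is already covered by \w; it is listed only to mirror the description.
$clean = preg_replace('/[^\w\s\d]/', '', $text);

echo $clean;
```

Unlike '/[^\w]/', this keeps spaces and line breaks, so the words do not run together. It also strips punctuation, so add any characters you want to keep to the character class.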