The Archive Base

Editorial Team
Asked: May 28, 2026

When I copy/paste from Word I end up with a lot of unsafe characters

When I copy and paste from Word, I end up with a lot of unsafe characters. Instead of finding and replacing each character individually, I thought it would be useful to write a quick PHP script to do this.

When I hit submit with the sample HTML below, each of the characters I would like to replace has been replaced with a �. What am I doing wrong?

Am I right in thinking that if I use htmlentities() or htmlspecialchars(), this will also replace the HTML markup?

Sample HTML block

<p>Nam ’velit metus, vulputate – eget sodales ut, dignissim “vehicula nisi”. Lor’em ipsum dolor sit amet, consectetur adipiscing elit. Nunc pharetra luctus mi, sollicitudin ultrices lacus iaculis sed. Nam aliquam, tortor id sodales scelerisque, est mauri’s adipiscing nunc, a tincidunt tortor elit eget quam. Fusce sagittis arcu ut urna egestas luctus. Aliquam erat volutpat. Suspendisse ut turpis mi. Nulla facilisi. Ut congue porta urna nec semper. Aenean feugiat ante vitae – dui accumsan placerat. Suspendisse aliquet, libero non tempor–  dignissim, arcu nibh luctus magna, eu pellentesq’ue libero eros nec magna. Phasellus non ullamcorper nisi. Aenean sagittis elit ac lorem imperdiet ac consequat sem commodo. Aenean in elit at lectus blandit varius nec in erat. Mauris elementum, turpis eu eleifend pora, quam purus tempor justo, et feugiat tellus mi sed erat.</p>
    <ul>
        <li><strong>’Pellentesque’</strong> nec leo cursus ipsum rhoncus volutpat nec eget mi.</li>
        <li><strong>N–am</strong> quis lectus enim, ac euismod urna.</li>
        <li><strong>Donec</strong> varius massa augue, at feugiat tortor.</li>
        <li><strong>“Duis”</strong> non massa eget elit euismod pulvinar.</li>
        <li><strong>Duis</strong> bibendum sodales lorem, vel commodo metus volutpat a.</li>
        <li><strong>Nu–nc</strong> pulvinar lacus in nisl dignissim euismod.</li>
        <li><strong>“Nulla”</strong> tincidunt nulla adipiscing ante aliquet mattis</li>
    </ul>


<?php     
/**
 *
 * @param string $unformatted
 * @return string
 */
function format($unformatted) {

    $html = strtolower(trim($unformatted));

    //replace accented characters, foreign languages
    $search = array('à','á','â','ã','ä','ç','è','é','ê','ë','ì','í','î','ï','ñ','ò','ó','ô','õ','ö','ù','ú','û','ü','ý','ÿ','À','Á','Â','Ã','Ä','Ç','È','É','Ê','Ë','Ì','Í','Î','Ï','Ñ','Ò','Ó','Ô','Õ','Ö','Ù','Ú','Û','Ü','Ý'); 
    $replace = array('a','a','a','a','a','c','e','e','e','e','i','i','i','i','n','o','o','o','o','o','u','u','u','u','y','y','A','A','A','A','A','C','E','E','E','E','I','I','I','I','N','O','O','O','O','O','U','U','U','U','Y'); 
    $html = str_replace($search, $replace, $html);

    //replace common characters
    $search = array('/(\s\&\s)/i', '/(\s\£\s)/i', '/(\s\$\s)/i'); 
    $replace = array('&amp;', '&pound;', '&dollar;'); 
    $html= preg_replace($search, $replace, $html);

    //replace MS office crap
    $search = array("‘", "’", "”", "“", "–", "…");
    $replace = array("'", "'", '"', '"', "-", "..."); 
    $html= str_replace($search, $replace, $html);

    return $html;
}

if(isset($_POST['clean'])){
    $html = format($_POST['html']);
} 

?>

<!doctype html>
<html>
<head>
    <meta charset="utf-8" />

    <title>HTML Tidy</title>

    <style type="text/css">
        body {
            color: #262626;
            background: #f4f4f4;
            font: normal 12px/18px Verdana, sans-serif;
            height: 100%;
        }
        #container {
            width: 760px;
            margin: 40px auto 0 auto;
            padding: 10px 60px;
            border: solid 1px #cbcbcb;
            background: #fafafa;
            -moz-box-shadow: 0px 0px 10px #cbcbcb;
            -webkit-box-shadow: 0px 0px 10px #cbcbcb;
        }
    </style>
</head>

<body>
    <div id="container" class="content">
        <h1>HTML Tidy</h1>

        <form action="" method="post">
            <textarea name="html" id="html" rows="20" cols="90"><?php if(isset($html)){ echo $html; } ?></textarea>

            <input type="submit" name="clean" value="Clean" />
        </form>
    </div>
</body>
</html>

[Screenshot: properties of the file, showing its encoding]

[Screenshot: page headers]

1 Answer

  1. Editorial Team
    Added an answer on May 28, 2026 at 1:26 am

    htmlspecialchars does exactly what needs to be done about unsafe characters, which are < > & ' " and nothing else.
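To make that concrete, here is a minimal sketch of what htmlspecialchars() does to a mixed string. The sample input is made up for illustration, and the curly quotes are written as explicit UTF-8 byte escapes so the result does not depend on how the script file itself is saved. It also answers the question about markup: yes, the tags are escaped too.

```php
<?php
// Hypothetical sample input: markup characters plus Word-style curly
// quotes (U+201C and U+201D), spelled out as UTF-8 bytes.
$input = "<b>Tom \xE2\x80\x9Cquoted\xE2\x80\x9D & 'left'</b>";

// Escape the five HTML-significant characters and tell PHP explicitly
// that the string is UTF-8.
$escaped = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

// The tags, the ampersand and the straight quote are escaped;
// the curly quotes pass through untouched.
echo $escaped, "\n";
// prints: &lt;b&gt;Tom “quoted” &amp; &#039;left&#039;&lt;/b&gt;
```

The curly quotes are not unsafe in HTML, so there is nothing to do about them as long as the page is served as UTF-8.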

    Your problem seems to be that your PHP file is not saved in the same encoding your web page uses. In 2012 we can safely say you really should always use UTF-8 and nothing else. (Unless you are using UTF-16, of course).

    What happens then is a mess, involving PHP treating one multibyte character as multiple characters, replacing just a part of it and rendering it invalid. But even that isn’t unsafe. It’s just ugly and unreasoned.
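One plausible way this plays out, sketched under the assumption that the script was saved as ISO-8859-1 or Windows-1252 while the form submits UTF-8: in those single-byte encodings 'â' is the byte 0xE2, which is also the first byte of the UTF-8 sequence for ’ (0xE2 0x80 0x99). The helper name isValidUtf8 and the sample input are hypothetical.

```php
<?php
/**
 * Hypothetical helper: true if $s is well-formed UTF-8.
 * preg_match() with the /u modifier fails on invalid UTF-8 subjects.
 */
function isValidUtf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

// What the browser submits from a UTF-8 page: "Lor’em".
$input = "Lor\xE2\x80\x99em";

// What the literal 'â' contains when the script is saved as ISO-8859-1.
$latin1ACircumflex = "\xE2";

// str_replace() works on bytes, so it replaces only the first byte of
// the three-byte quote and leaves two orphaned continuation bytes.
$broken = str_replace($latin1ACircumflex, 'a', $input);

var_dump(isValidUtf8($input));  // bool(true)
var_dump(isValidUtf8($broken)); // bool(false)

// The later replacement of the curly quote can no longer match anything.
var_dump(str_replace("\xE2\x80\x99", "'", $broken) === $broken); // bool(true)
```

A browser decoding `$broken` as UTF-8 shows the orphaned bytes as �, which matches the symptom described in the question.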

    The answer by @webarto does indeed solve the problem you are trying to solve, but it’s the wrong problem in the first place.

    In the screenshot you posted, you should choose Other and select UTF-8, then find where the default encoding is set and set it to UTF-8, and use only UTF-8 from now on. Please.

