What is the best way to convert user input to UTF-8?
I have a simple form where a user submits HTML. The HTML can be in any language and in any character encoding.
My questions are:
- Is it possible to represent everything as UTF-8?
- What can I use to effectively convert any character encoding to UTF-8, so that I can parse it with PHP string functions, save it to my database, and subsequently echo it out using htmlentities?
I am trying to work out how to best implement this – advice and links appreciated.
I am using CodeIgniter and its input class to retrieve POST data.
A few points I should make:
- I need to convert HTML special characters to their respective entities
- It might be a good idea to accept input in any encoding and return it in that same encoding. However, my web app uses:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
This might have an adverse effect on things.
Specify accept-charset in your <form> tag to tell the browser to submit user-entered data encoded in UTF-8.
For a complete guide, see HOW TO Use UTF-8 Throughout Your Web Stack.
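As a rough illustration of the accept-charset suggestion, here is a minimal sketch of a form plus a server-side check. The helper name to_utf8(), the field name html, and the Windows-1252 fallback are assumptions made for the example, not something from the original answer. The check is there because clients other than a well-behaved browser can still post bytes that are not valid UTF-8.

```php
<?php
// Sketch only: to_utf8() and the Windows-1252 fallback are assumptions
// made for this example, not part of the original answer.
function to_utf8(string $input, string $fallback = 'Windows-1252'): string
{
    // Input that is already valid UTF-8 is returned unchanged.
    if (mb_check_encoding($input, 'UTF-8')) {
        return $input;
    }
    // Otherwise treat the bytes as the assumed legacy encoding and convert.
    return mb_convert_encoding($input, 'UTF-8', $fallback);
}

// With CodeIgniter this value would come from the input class instead of $_POST.
$raw  = isset($_POST['html']) ? (string) $_POST['html'] : '';
$utf8 = to_utf8($raw);
?>
<form method="post" action="" accept-charset="UTF-8">
  <textarea name="html"></textarea>
  <button type="submit">Save</button>
</form>
<!-- Escape on output, naming the encoding explicitly. -->
<p><?php echo htmlentities($utf8, ENT_QUOTES, 'UTF-8'); ?></p>
```

Passing 'UTF-8' explicitly to htmlentities avoids depending on the PHP version's default charset. The fallback is only a guess: once the original encoding is unknown, no conversion can recover it reliably, which is why asking the browser for UTF-8 up front is the simpler route.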