I recently learned that overlong UTF-8 encodings are a security risk when input is not properly validated. From the answer in the previously mentioned post:
For example the character < is usually represented as byte 0x3C, but
could also be represented using the overlong UTF-8 sequence 0xC0 0xBC
(or even more redundant 3- or 4-byte sequences).
And:
If you take this input and handle it in a Unicode-oblivious byte-based
tool, then any character processing step being used in that tool may
be evaded.
Meaning that if I use htmlspecialchars on a string that uses an overlong encoding, the output could still contain tags once a later, more lenient step decodes those bytes. I also assume that other special characters (like " or ;) could be posted the same way and then used for SQL injection.
Perhaps it is just me, but I believe this is a security risk that relatively few people take into account or even know about. I’ve been coding for years and am only now finding this out.
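To make the quoted byte values concrete, here is a minimal Python sketch of the idea. The names overlong_encode, naive_escape and lenient_decode are hypothetical helpers written for this illustration; naive_escape is only a stand-in for a byte-based filter and is not how htmlspecialchars is actually implemented.

```python
def overlong_encode(code_point: int, length: int) -> bytes:
    """Encode code_point in exactly `length` bytes (2 to 4), UTF-8 style,
    even when a shorter encoding exists. Hypothetical helper for testing."""
    if length not in (2, 3, 4):
        raise ValueError("length must be 2, 3 or 4")
    payload_bits = {2: 11, 3: 16, 4: 21}[length]
    if code_point < 0 or code_point >= 1 << payload_bits:
        raise ValueError("code point does not fit in that many bytes")
    lead_marker = {2: 0xC0, 3: 0xE0, 4: 0xF0}[length]
    out = []
    for _ in range(length - 1):
        out.append(0x80 | (code_point & 0x3F))  # continuation byte: 10xxxxxx
        code_point >>= 6
    out.append(lead_marker | code_point)        # lead byte carries the top bits
    return bytes(reversed(out))


def naive_escape(data: bytes) -> bytes:
    """Stand-in for a Unicode-oblivious, byte-based filter (assumption)."""
    return data.replace(b"<", b"&lt;").replace(b">", b"&gt;")


def lenient_decode(data: bytes) -> str:
    """Decoder that skips the overlong check, as a broken consumer might."""
    chars, i = [], 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            n, cp = 0, b
        elif b >> 5 == 0b110:
            n, cp = 1, b & 0x1F
        elif b >> 4 == 0b1110:
            n, cp = 2, b & 0x0F
        elif b >> 3 == 0b11110:
            n, cp = 3, b & 0x07
        else:
            raise ValueError("unexpected byte")
        for cont in data[i + 1:i + 1 + n]:
            cp = (cp << 6) | (cont & 0x3F)
        chars.append(chr(cp))
        i += 1 + n
    return "".join(chars)


payload = overlong_encode(0x3C, 2) + b"script" + overlong_encode(0x3E, 2)
filtered = naive_escape(payload)   # the filter finds no 0x3C or 0x3E bytes
decoded = lenient_decode(filtered)  # the tag reappears after lenient decoding
```

A strict decoder, such as Python's own bytes.decode("utf-8"), rejects these sequences outright, so the risk only exists when some component in the chain decodes leniently.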
Anyway, my question is: what tools can I use to send data with overlong encodings? People who are familiar with this risk: how do you perform tests on websites? I want to POST a bunch of overlong characters to my sites, but I have no idea how to do this.
In my situation I mostly use PHP and MySQL, but what I really want to know about are testing tools, so I guess the back-end situation does not matter much.
To test whether your site is vulnerable, use curl to fetch your page with a POST request whose body contains overlong-encoded UTF-8 data. You can prepare the payload in a file first, provided your editor lets you save the raw overlong bytes unchanged, so that both the text you post and the PHP file that calls curl contain the overlong sequences. The curl options are documented here:
http://php.net/manual/en/function.curl-setopt.php
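As a rough sketch of the same test without a special editor, the raw bytes can be percent-encoded into a form body and posted directly. The Python below uses only the standard library; the URL and the field name "comment" are placeholders I made up, and build_overlong_post is a hypothetical helper, so adjust them to your own form.

```python
import urllib.parse
import urllib.request


def build_overlong_post(url: str, field: str, raw_value: bytes) -> urllib.request.Request:
    """Build a form POST whose field value is sent as raw, percent-encoded bytes."""
    # quote_from_bytes keeps the bytes exactly as given: 0xC0 0xBC becomes %C0%BC.
    body = "{}={}".format(
        urllib.parse.quote(field),
        urllib.parse.quote_from_bytes(raw_value),
    ).encode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Overlong 2-byte forms of "<" (0xC0 0xBC) and ">" (0xC0 0xBE) around "script".
overlong_payload = b"\xc0\xbcscript\xc0\xbe"

# Placeholder target: replace with a page on a site you are allowed to test.
request = build_overlong_post("http://localhost:8000/comment.php", "comment", overlong_payload)

# To actually send it, uncomment the next line:
# response = urllib.request.urlopen(request)
```

The same body can be sent from the command line with curl --data 'comment=%C0%BCscript%C0%BE' followed by the URL, since --data posts the string as given. Afterwards, check what the page stores and echoes back: a well-behaved stack should reject or replace the invalid bytes rather than turn them into <script>.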