I am using a TEXT column with the utf8_unicode_ci collation in MySQL to store data scraped from the internet.
The texts that are gathered are from various sites in different languages.
I am confused by the maximum length of 65535 bytes for a TEXT column.
How can I check that the strings I am inserting into the column do not go over that limit?
At the moment I am using strlen($str) to check the length of the strings. Does that guarantee the data will not be truncated to fit into the column? As I understand it, utf8_unicode_ci text can take more than 1 byte per character.
EDIT: The OP can simply use strlen(), as it returns bytes, not characters. Credit goes to deceze in a comment to this post.
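To see the difference between bytes and characters, here is a minimal sketch. It assumes PHP 7 or later (for the "\u{...}" escapes and type declarations), that the mbstring extension is loaded for the comparison, and that mbstring.func_overload is not enabled (it was removed in PHP 8). The constant TEXT_MAX_BYTES and the helper fitsInTextColumn() are hypothetical names made up for this illustration, and the check only holds if the string is sent to MySQL in the same encoding the column uses.

```php
<?php
// Maximum size of a MySQL TEXT column, in bytes (hypothetical constant name).
const TEXT_MAX_BYTES = 65535;

// Hypothetical helper: true when $str is small enough for a TEXT column.
function fitsInTextColumn(string $str): bool
{
    // strlen() counts bytes, which is the unit the TEXT limit is expressed in.
    return strlen($str) <= TEXT_MAX_BYTES;
}

$word = "caf\u{00e9}"; // "café": the "é" takes two bytes in UTF-8

echo strlen($word), "\n";             // 5 (bytes)
echo mb_strlen($word, 'UTF-8'), "\n"; // 4 (characters)
```

A string of 32768 "é" characters is already 65536 bytes, so fitsInTextColumn() would reject it even though its character count is well below the limit.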
Old post below:
The notes of the PHP manual have a handy function for determining how many bytes are in a string. It seems to be the only alternative to using MySQL built-in functions such as LENGTH to do the job, which would be cumbersome here.
There are two other possible workarounds. Firstly, you can write the string to a file and check the file's size. Secondly, you can force the ASCII encoding on mb_strlen so that it treats each byte as a character, and the number of characters it returns is then the number of bytes. I haven't tested this, so check it first. Let us know what works for you!
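The two workarounds could be sketched roughly as follows. This is an illustration under assumptions, not a tested recipe from the original post: both function names are hypothetical, the mbstring variant passes '8bit' rather than ASCII because '8bit' is the encoding name mbstring provides for treating every byte as one character, and the file variant assumes the system temp directory is writable.

```php
<?php
// Hypothetical helper: byte length via mbstring, treating each byte as a character.
function byteLengthViaMbstring(string $str): int
{
    return mb_strlen($str, '8bit');
}

// Hypothetical helper: byte length by writing the string to a temp file.
function byteLengthViaTempFile(string $str): int
{
    $path = tempnam(sys_get_temp_dir(), 'len');
    if ($path === false) {
        throw new RuntimeException('Could not create a temporary file');
    }
    file_put_contents($path, $str);
    clearstatcache(true, $path); // make sure filesize() is not served from cache
    $size = filesize($path);
    unlink($path);
    if ($size === false) {
        throw new RuntimeException('Could not read the file size');
    }
    return $size;
}
```

Both should agree with strlen() for the same input. The file approach touches the disk on every call, so it is mostly useful as a cross-check.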