We need to store a long UTF-8 string in a MySQL database and enforce uniqueness on it. This is the current configuration:
@Column(unique = true, length = 8000, columnDefinition = "TEXT")
private String text;
but since MySQL requires an index prefix length for both BLOB and TEXT columns, this fails with the following error:
BLOB/TEXT column 'path' used in key specification without a key length
How can I properly configure my ORM mapping to support this use case?
Have you considered calculating a hash value for your text string? You could then store the hash value and check for uniqueness on the hash value alone. Whenever you get a collision on the hash value, compare the actual text strings. If they are different, tell the rows apart with a sequence value; if they are the same, you’ve found your duplicate. So your table is
Hash, Sequence (unique within the same Hash value), TextString
and your unique index is
Hash, Sequence
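As a rough sketch, that table could be mapped with JPA along the following lines. The entity name `TextEntry`, the table and column names, the surrogate `id`, and the 64-character hash column (sized for a hex-encoded 256-bit hash) are all assumptions made for illustration. Newer stacks use the `jakarta.persistence` package instead of `javax.persistence`.

```java
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.Table;
import javax.persistence.UniqueConstraint;

// Hypothetical mapping: the unique constraint covers only the short
// (hash, sequence) pair, so the TEXT column never takes part in an index.
@Entity
@Table(name = "text_entry",
       uniqueConstraints = @UniqueConstraint(columnNames = {"text_hash", "seq"}))
public class TextEntry {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    // Assumed to hold a hex-encoded 256-bit hash (64 characters).
    @Column(name = "text_hash", nullable = false, length = 64)
    private String textHash;

    // Distinguishes different texts that share the same hash value.
    @Column(name = "seq", nullable = false)
    private int sequence;

    @Column(name = "text", nullable = false, columnDefinition = "TEXT")
    private String text;

    protected TextEntry() {
        // no-arg constructor required by JPA
    }

    public TextEntry(String textHash, int sequence, String text) {
        this.textHash = textHash;
        this.sequence = sequence;
        this.text = text;
    }

    public String getTextHash() { return textHash; }
    public int getSequence() { return sequence; }
    public String getText() { return text; }
}
```

Because the constraint is on two short columns, MySQL no longer needs a key length for the TEXT column.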
To test for uniqueness, calculate the hash value and attempt to store the row with a sequence of zero. If the insert fails, compare your text string with the one stored at sequence zero. If they are the same, you have found a duplicate text. If they are different, attempt to store at sequence 1. Repeat until you either find a duplicate text string at the current sequence number or the insert succeeds at the next available sequence number.
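The loop described above can be sketched without a database. In this minimal illustration a `HashMap` stands in for the table, and `putIfAbsent` plays the role of an INSERT that fails on the unique (Hash, Sequence) index. The class name `UniqueTextStore` and the pluggable hash function are hypothetical, chosen only to make the example self-contained.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToIntFunction;

// In-memory stand-in for the table (Hash, Sequence, TextString)
// with a unique index on (Hash, Sequence).
final class UniqueTextStore {
    // Key is "hash:sequence", value is the stored text string.
    private final Map<String, String> rows = new HashMap<>();
    private final ToIntFunction<String> hashFn;

    UniqueTextStore(ToIntFunction<String> hashFn) {
        this.hashFn = hashFn;
    }

    /** Returns true if the text was stored, false if it was already present. */
    boolean insertIfAbsent(String text) {
        int hash = hashFn.applyAsInt(text);
        for (int sequence = 0; ; sequence++) {
            String key = hash + ":" + sequence;
            // Mimics an INSERT that is rejected on a unique-key violation.
            String existing = rows.putIfAbsent(key, text);
            if (existing == null) {
                return true;   // slot was free: the text is now stored
            }
            if (existing.equals(text)) {
                return false;  // same hash and same text: duplicate
            }
            // Same hash but different text: try the next sequence number.
        }
    }

    int size() {
        return rows.size();
    }
}
```

With a deliberately weak hash such as `String::length`, the texts "abc" and "xyz" collide, so the second one lands at sequence 1, while inserting either of them a second time is reported as a duplicate. Against a real database the same loop would catch the constraint-violation exception instead of checking a return value.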
The trick is choosing a hash algorithm that produces very few collisions and can handle a long text string. Even better would be a great ORM that could do this for you.
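One candidate, offered as a suggestion rather than the only option, is SHA-256 from the JDK's `java.security.MessageDigest`. It accepts input of any length, and collisions are rare enough in practice that the sequence column would almost always stay at zero. The helper class `TextHash` below is a hypothetical name used for this sketch.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: hashes a text string to a 64-character hex value.
final class TextHash {
    private TextHash() {}

    static String sha256Hex(String text) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            // Hash the UTF-8 bytes so the result does not depend on the platform charset.
            byte[] hash = digest.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b)); // two lowercase hex digits per byte
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this should not happen.
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}
```

The result is always 64 hex characters, so it fits in a fixed-length column that MySQL can index without a prefix length.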