I am using Snappy-java to compress JSON data and I want to store the result in a database, in a VARCHAR column.
The database is an Oracle database with ISO-8859-1 encoding.
I am facing an encoding problem when inserting the data. It would seem that some characters are not recognised by Oracle.
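One possible cause (an assumption, since the insertion code is not shown here) is that the compressed bytes get turned into a Java String on the way to the VARCHAR column. The sketch below uses a hypothetical helper, `roundTrip`, and made-up sample bytes to show that arbitrary binary data does not survive a decode/encode cycle in every charset. Even when the round trip is clean on the Java side, the database or driver may still convert characters between the client and database character sets.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class CharsetRoundTrip {
    /** Decodes the bytes into a String with the given charset, then encodes them back. */
    static byte[] roundTrip(byte[] data, Charset charset) {
        return new String(data, charset).getBytes(charset);
    }

    public static void main(String[] args) {
        // Stand-in for compressed output: arbitrary bytes, some above 0x7F.
        byte[] compressed = {(byte) 0x82, 0x53, (byte) 0xFF, 0x00, (byte) 0xC3, 0x28};

        boolean utf8Intact =
                Arrays.equals(compressed, roundTrip(compressed, StandardCharsets.UTF_8));
        boolean latin1Intact =
                Arrays.equals(compressed, roundTrip(compressed, StandardCharsets.ISO_8859_1));

        System.out.println("UTF-8 round trip intact: " + utf8Intact);
        System.out.println("ISO-8859-1 round trip intact: " + latin1Intact);
    }
}
```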
I’ve found a workaround by using Base64 encoding on the compressed data before inserting it. I can then retrieve it just fine 🙂
The problem with that is that Base64 encoding increases the length of the data that I am then storing, thereby reducing the savings gained with Snappy…
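To put a number on that overhead: standard padded Base64 emits 4 characters for every 3 input bytes, so the stored text is about a third longer than the compressed data. A minimal sketch, where `encodedLength` is a hypothetical helper and the 3000-byte array is just a placeholder for real compressed output:

```java
import java.util.Base64;

final class Base64Overhead {
    /** Length of standard, padded Base64 text for n input bytes: 4 * ceil(n / 3). */
    static int encodedLength(int n) {
        return 4 * ((n + 2) / 3);
    }

    public static void main(String[] args) {
        byte[] compressed = new byte[3000]; // placeholder for compressed data
        int actual = Base64.getEncoder().encode(compressed).length;
        System.out.println(compressed.length + " bytes -> " + actual
                + " Base64 characters (formula: " + encodedLength(compressed.length) + ")");
    }
}
```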
So my question is: How can I store that data without encoding it in Base64?
The reason I want to use a VARCHAR is that I want to be able to read the data straight from an Oracle index without ever accessing the table (performance is definitely an issue).
I have tried other compression algorithms as well, but they all seem to have the same problem.
I have also looked at yEnc but I cannot find any Java encoder. Moreover, I am not sure that I understand all the problems listed with yEnc, so I am a bit reluctant to use it.
Thanks a lot for any help!
Thank you all for your help!
I finally found a workaround.
Since I am storing bytes and not chars, I am going to use a BLOB to store the data.
The problem with the BLOB is that it cannot be indexed.
The alternative is using a RAW type column. It stores bytes and is indexable. Unfortunately it is too small (2000 bytes).
So, the answer in my case consists of storing the data in a BLOB and accessing it through an index on two RAW expressions, since the data is never bigger than 4000 bytes.
The index looks like this:
where substr_dt is a user-defined deterministic function (defined hereafter)
CREATE OR REPLACE FUNCTION substr_dt(str BLOB, buffer_size int, offset int) RETURN RAW
DETERMINISTIC IS
BEGIN
    RETURN dbms_lob.substr(str, buffer_size, offset);
END;
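For readers more at home in Java, here is a rough analogue of what substr_dt extracts. It is only a sketch: `BlobChunks.substr` is a hypothetical helper, and the argument pairs (2000, 1) and (2000, 2001) are my assumption about how the two index expressions are called, chosen to match the "two RAW columns of 2000 bytes" description. The offset is 1-based, as in dbms_lob.substr. Returning null when the offset lies past the end is also an assumption about how the database behaves for short blobs.

```java
import java.util.Arrays;

final class BlobChunks {
    /**
     * Mimics substr_dt(str, buffer_size, offset) on a byte array.
     * The offset is 1-based; at most `amount` bytes are returned.
     * Returns null when nothing can be extracted (assumed to mirror SQL NULL).
     */
    static byte[] substr(byte[] lob, int amount, int offset) {
        if (lob == null || amount < 1 || offset < 1 || offset > lob.length) {
            return null;
        }
        int from = offset - 1;                         // convert to 0-based index
        int to = Math.min(lob.length, from + amount);  // clamp to the end of the data
        return Arrays.copyOfRange(lob, from, to);
    }

    public static void main(String[] args) {
        byte[] blob = new byte[3000]; // placeholder for a compressed payload
        byte[] first = substr(blob, 2000, 1);
        byte[] second = substr(blob, 2000, 2001);
        System.out.println("first chunk: " + first.length + " bytes, second chunk: "
                + second.length + " bytes");
    }
}
```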
To access the data, I just need to query the product_id and the fields using aliases, e.g.
In this case, summary_1 represents the first 2000 bytes of the blob, and summary_2 the next 2000 bytes.
Concatenating the two byte arrays summary_1 and summary_2 gives back the content of the blob.
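The concatenation step can be sketched as follows. `SummaryJoiner.join` is a hypothetical helper name; the null handling assumes the second chunk may come back as null from JDBC when the blob fits entirely in the first 2000 bytes.

```java
final class SummaryJoiner {
    /** Joins the two chunks read from the index; a null chunk is treated as empty. */
    static byte[] join(byte[] summary1, byte[] summary2) {
        byte[] first = summary1 == null ? new byte[0] : summary1;
        byte[] second = summary2 == null ? new byte[0] : summary2;
        byte[] out = new byte[first.length + second.length];
        System.arraycopy(first, 0, out, 0, first.length);
        System.arraycopy(second, 0, out, first.length, second.length);
        return out;
    }

    public static void main(String[] args) {
        byte[] joined = join(new byte[] {1, 2}, new byte[] {3});
        System.out.println("joined length: " + joined.length);
    }
}
```

The joined array is the compressed payload, which can then be handed to Snappy-java (for example `Snappy.uncompress(byte[])`) to get the JSON back.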
That works with JDBC but I could not make it work with Hibernate (yet).
It is not the best solution ever as data needs reprocessing before being interpreted. However, it solves the data access problem without the encoding space overhead.