Short story:
I have an NSString and a byte offset into its UTF-8 encoding. I want to know the character at that byte offset. How can I do that?
Below is the long story, if you dare:
According to the documentation, the offsets() function returns the byte offset of a term within a column. I have indexed some text, and I use that offset to point to a particular section of the text when I display results.
The crucial problem is that with this byte offset I cannot locate the term correctly. Sometimes it points to the right place; sometimes it is 3 or 4 characters away from it.
My table is very simple:
CREATE VIRTUAL TABLE t1 USING fts4(file, body, page);
If I run a query such as:
SELECT page, body, offsets(t1) from t1 where body match 'and';
I receive:
...........
502|1 0 427 3
505|1 0 370 3 1 0 1307 3 1 0 1768 3
506|1 0 10 3 1 0 1861 3 1 0 2521 3
...........
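Per the FTS documentation, each match in the offsets() output is a group of four integers: column number, query term number, byte offset of the match within the column, and size of the match in bytes. So `1 0 427 3` reads as: column 1 (body, since file is 0 and page is 2), term 0 ('and'), starting at byte 427, 3 bytes long. Below is a minimal C sketch that splits such a string into matches; `fts_match` and `parse_offsets` are hypothetical names for illustration, not part of SQLite's API.

```c
#include <stddef.h>
#include <stdio.h>

/* One match reported by FTS offsets(): four integers per match. */
typedef struct {
    int column;   /* column number: 0 = file, 1 = body, 2 = page in t1 */
    int term;     /* index of the matching term in the query, from 0 */
    int byte_off; /* byte offset of the match within the column text */
    int byte_len; /* size of the match in bytes */
} fts_match;

/* Parse an offsets() string such as "1 0 370 3 1 0 1307 3" into at most
   max entries of out. Returns the number of complete matches parsed. */
size_t parse_offsets(const char *s, fts_match *out, size_t max)
{
    size_t n = 0;
    int consumed = 0; /* characters read by sscanf for the current group */

    while (n < max &&
           sscanf(s, "%d %d %d %d%n", &out[n].column, &out[n].term,
                  &out[n].byte_off, &out[n].byte_len, &consumed) == 4) {
        s += consumed; /* advance past the group just parsed */
        n++;
    }
    return n;
}
```

Note that the third number is a byte count, which matters for everything that follows.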
For example, if I go to character 427 of body, I don’t land on ‘and’ but 2 or 3 characters away from it. The same happens at 370, whereas at 10 I get the right position.
Where am I wrong?
See the SQLite FTS3 docs and you’ll notice that the offsets and lengths are in bytes, not characters.
You must apply the offset and length to the raw UTF-8 bytes before decoding them into a string of characters. The offset coming from SQLite counts every byte of a multibyte character, whereas you are using it to count characters.
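A minimal sketch of "apply the offset and length before decoding", in C. It assumes the text you hold is byte-for-byte the same UTF-8 that was stored in the body column; `extract_match` is a hypothetical helper. Since Objective-C is a superset of C, you could call it on the buffer returned by `-[NSString UTF8String]` and turn the copied bytes back into an NSString with `-initWithBytes:length:encoding:` and `NSUTF8StringEncoding`.

```c
#include <stddef.h>
#include <string.h>

/* Copy the matched term out of the UTF-8 column text using the byte
   offset and byte length reported by offsets(). All arithmetic is done
   on bytes, never on characters. Returns 0 on success, or -1 if the
   range does not fit inside the text or inside dst (including the NUL). */
int extract_match(const char *utf8, size_t byte_off, size_t byte_len,
                  char *dst, size_t dst_size)
{
    size_t total = strlen(utf8);

    if (byte_off > total || byte_len > total - byte_off || byte_len >= dst_size)
        return -1;
    memcpy(dst, utf8 + byte_off, byte_len);
    dst[byte_len] = '\0';
    return 0;
}
```

For example, in `"café and tea"` the `é` takes two bytes, so ‘and’ starts at byte 6 even though it is the sixth character (index 5); `extract_match(text, 6, 3, buf, sizeof buf)` copies exactly `and`.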
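Your indexed text probably has 3 or 4 two-byte characters before those terms, hence the off-by-3-or-4 problem: each of them adds one extra byte, pushing the byte offset one position past the character position. The match at offset 10 presumably lines up because no multibyte characters precede it.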
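To answer the short version of the question directly, you can convert the byte offset into an index before touching the NSString. NSString's `-length` and `-characterAtIndex:` count UTF-16 code units, so this sketch counts one unit per UTF-8 character and two for 4-byte sequences (characters outside the Basic Multilingual Plane, which need a surrogate pair). It assumes valid UTF-8 and that the offset falls on a character boundary, as FTS term offsets do; `utf8_offset_to_utf16_index` is a hypothetical name.

```c
#include <stddef.h>

/* Convert a byte offset into a UTF-8 buffer into the corresponding
   UTF-16 code-unit index, the unit NSString uses for -length and
   -characterAtIndex:. Assumes valid UTF-8 and a byte_off that falls
   on a character boundary. Stops early at the terminating NUL. */
size_t utf8_offset_to_utf16_index(const char *utf8, size_t byte_off)
{
    const unsigned char *p = (const unsigned char *)utf8;
    size_t units = 0;

    for (size_t i = 0; i < byte_off && p[i] != '\0'; i++) {
        if ((p[i] & 0xC0) == 0x80)
            continue; /* continuation byte: belongs to the previous character */
        /* Lead bytes 0xF0 and above start 4-byte sequences, which
           become a surrogate pair (two units) in UTF-16. */
        units += (p[i] >= 0xF0) ? 2 : 1;
    }
    return units;
}
```

For the match reported at byte 427, `utf8_offset_to_utf16_index([body UTF8String], 427)` would give the index to pass to `-characterAtIndex:`; converting `427 + 3` the same way gives the end of the term, so the difference is the length of its NSRange.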