I have 24,888 records that go into a SQLite table with FTS indexing. I load the table using .import from a set of .csv files with comma separators.
The platform is Linux – CentOS 6.2 with SQLite 3.6.20 as the supported version in this distribution.
This has all been working using FTS 3 to build a search system – with Apache and PHP.
To get FTS 4 functionality, including the statistics I need, I recently updated manually to SQLite 3.7.13, the newest release.
Unfortunately, when I now import from .csv, whether into an FTS 3 or an FTS 4 table, more than half the rows are missing: I get only 12,224 records instead of 24,888.
There are NO ERRORS reported during the import.
The last column in the data holds a large amount of text in some records, up to 200K to 300K in a few cases. The text still contains some characters such as curly double quotes, em dashes, and en dashes.
Is it possible that something looks like end-of-file to the .import code?
The word count utility ‘wc -l’ sees all 24,888 lines.
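Since `wc -l` counts newline characters rather than CSV records, the two numbers can differ if an importer treats double quotes specially. A minimal diagnostic sketch, assuming Python 3 and that plain ASCII double quotes are the characters of interest (the function name and the sample rows are hypothetical), flags lines that contain an odd number of double quotes, which is where a quote-aware parser might join one line to the next:

```python
def lines_with_odd_quotes(lines):
    """Return the 1-based numbers of lines holding an odd count of '"' characters."""
    suspects = []
    for number, line in enumerate(lines, start=1):
        # An unmatched quote may make a quote-aware parser keep reading
        # past the end of this line.
        if line.count('"') % 2 == 1:
            suspects.append(number)
    return suspects


# Hypothetical sample rows; with a real file, pass the open file object instead.
sample = [
    'id1,title one,plain text\n',
    'id2,title two,he said "hello\n',
    'id3,title three,and then" left\n',
    'id4,title four,a "quoted" word\n',
]
print(lines_with_odd_quotes(sample))
```

If this reports many line numbers for the real .csv files, unbalanced quotes are a plausible cause of the missing rows; if it reports none, the cause is probably elsewhere.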
One final point: the .csv files were created using Python 2.6.6 running on the Linux system, and any original line breaks were escaped (to ||), so I believe the files have uniform line endings. Maybe I’m missing some line-feed or carriage-return characters that SQLite 3.6 ignores but 3.7 sees?
I really need the additional FTS 4 capabilities (mostly to report statistics about search results using matchinfo) but it is no good if I cannot load all the data!
Thanks,
Laurence Brevard
San Jose, CA
[keywords2@keywords2c6x64 grant-import-tools]$ /usr/local/bin/sqlite3.7 --version
3.7.13 2012-06-11 02:05:22 f5b5a13f7394dc143aa136f1d4faba6839eaa6dc
[keywords2@keywords2c6x64 grant-import-tools]$ /usr/bin/sqlite3 --version
3.6.20
[keywords2@keywords2c6x64 grant-import-tools]$ uname -a
Linux keywords2c6x64 2.6.32-220.23.1.el6.x86_64 #1 SMP Mon Jun 18 18:58:52 BST 2012 x86_64 x86_64 x86_64 GNU/Linux
[keywords2@keywords2c6x64 grant-import-tools]$ cat /etc/redhat-release
CentOS release 6.2 (Final)
I fixed mine.
Apparently the double quote character is now treated differently in comma-separated files. I replaced each double quote in the original source with two single quotes. After that, all my lines imported without a problem.
My guess is that once a double quote appears in a field, everything up to the next double quote is treated as a single field.
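The effect described here can be illustrated with any quote-aware CSV parser. The sketch below uses Python 3's `csv` module purely as an analogy; it is not the SQLite `.import` code, and the sample data is made up. Note that Python's default dialect only treats a double quote as special at the start of a field, so the sample places it there. A single opening quote on one line and a closing quote on the next turn two physical lines into one record:

```python
import csv

# Three physical lines; the quoted field opened on line 1 is closed on line 2.
text = (
    '1,first,"said hello\n'
    '2,second,and left"\n'
    '3,third,plain\n'
)

physical_lines = text.splitlines(True)
rows = list(csv.reader(physical_lines))

print(len(physical_lines))  # number of lines, as wc -l would count them
print(len(rows))            # number of records the parser actually produced
for row in rows:
    print(row)
```

Here three lines yield only two records, with the second line swallowed into the last field of the first record. A similar merge happening repeatedly would be consistent with a large drop in row count and no reported errors.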
The behavior definitely changed somewhere after 3.7.7 and no later than 3.7.12.
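The workaround described above, replacing each double quote with two single quotes before import, could be applied when the .csv files are generated. This is a minimal sketch assuming Python 3 and ASCII double quotes only; the helper name is hypothetical, and the replacement does alter the stored text:

```python
def escape_double_quotes(field):
    """Replace every ASCII double quote with two single quotes."""
    return field.replace('"', "''")


# Hypothetical record; the fields are joined with plain commas, as in the
# files described in this thread.
record = ['id2', 'title two', 'he said "hello" and left']
line = ','.join(escape_double_quotes(field) for field in record)
print(line)
```

Curly quotes are left untouched by this helper, since only the plain double quote was reported to cause trouble.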