I’m trying to create an SQLite database by importing a CSV file of URLs. The file has about 6 million strings. Here are the commands I’ve used:
create table urltable (url text primary key);
.import csvfile urltable
After about 3 million URLs the import slows down a lot and my hard disk keeps spinning continuously. I’ve tried splitting the CSV file into four chunks, but I run into the same problem.
I read similar posts on Stack Overflow and tried using BEGIN...COMMIT blocks and PRAGMA synchronous=OFF, but neither helped. The only way I was able to create the database was by removing the primary key constraint from url. But then, when I run a select command to find a particular URL, it takes 2-3 seconds, which won’t work for my application.
With the primary key set on url, the select is instantaneous. Please advise me what I am doing wrong.
[Edit]
Summary of suggestions that helped :
- Reduce the number of transactions
- Increase page size & cache size
- Add the index later
- Remove redundancy from url
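For anyone who wants to try the first two suggestions from a script rather than the sqlite3 shell, here is a minimal sketch using Python's standard sqlite3 module. The function name bulk_load and the default page and cache sizes are placeholders I made up for illustration, not tuned recommendations; the table is created without the primary key so that no index is maintained during the load.

```python
import sqlite3


def bulk_load(db_path, urls, page_size=8192, cache_pages=100_000):
    """Load urls into an unindexed table using one transaction.

    bulk_load, page_size and cache_pages are illustrative names and
    values (assumptions), not settings taken from the original post.
    """
    conn = sqlite3.connect(db_path)
    # page_size only takes effect if it is set before the database
    # file is first written (or before a later VACUUM).
    conn.execute(f"PRAGMA page_size = {int(page_size)}")
    # A positive cache_size is measured in pages.
    conn.execute(f"PRAGMA cache_size = {int(cache_pages)}")
    # No PRIMARY KEY here, so no index has to be updated per insert.
    conn.execute("CREATE TABLE IF NOT EXISTS urltable (url TEXT)")
    with conn:  # commits once at the end instead of once per row
        conn.executemany(
            "INSERT INTO urltable (url) VALUES (?)",
            ((u,) for u in urls),
        )
    count = conn.execute("SELECT COUNT(*) FROM urltable").fetchone()[0]
    conn.close()
    return count
```

Passing a generator that reads the CSV file line by line as urls keeps memory use flat, since executemany consumes it lazily.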
Still, with a primary index, the database size is more than double the original csv file that I was trying to import. Any way to reduce that?
A PRIMARY KEY or UNIQUE constraint will automatically generate an index. An index will dramatically speed up SELECTs, at the expense of slowing down INSERTs.
Try importing your data into a non-indexed table, and then explicitly CREATE UNIQUE INDEX _index_name ON urltable(url). It may be faster to build the index all at once than one row at a time.
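As a rough sketch of that order of operations (load first, index afterwards), here is how it might look with Python's standard sqlite3 module. The function names and the index name url_index are made up for the example. Note that CREATE UNIQUE INDEX will fail if the imported data contains duplicate URLs, so those would need to be removed first.

```python
import sqlite3


def load_then_index(conn, urls):
    """Insert all rows into a plain table, then build the index once.

    load_then_index and url_index are hypothetical names chosen for
    this sketch.
    """
    # Plain table: no PRIMARY KEY or UNIQUE, so inserts touch no index.
    conn.execute("CREATE TABLE urltable (url TEXT)")
    with conn:  # a single transaction for the whole load
        conn.executemany(
            "INSERT INTO urltable (url) VALUES (?)",
            ((u,) for u in urls),
        )
    # Build the index in one pass over the loaded data.
    # Raises sqlite3.IntegrityError if urls contained duplicates.
    conn.execute("CREATE UNIQUE INDEX url_index ON urltable(url)")
    conn.commit()


def has_url(conn, url):
    """Look up one URL; the unique index makes this a search, not a scan."""
    row = conn.execute(
        "SELECT 1 FROM urltable WHERE url = ? LIMIT 1", (url,)
    ).fetchone()
    return row is not None
```

Usage would be along the lines of conn = sqlite3.connect("urls.db"), then load_then_index(conn, urls), then has_url(conn, some_url) for the lookups.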