I’m loading a big dataset, about 9Gb into a “:memory:” database using SQLite. Right now, I have a python script that loads the data from .csv files (doing some other stuff on a line by line basis). I then need to do some joins on two big tables. Will indexing the columns I’m joining on make a difference if the database is in-memory? If so, is it better to create the table with an index before loading data into it, or build an index just before the join? And finally, will it make a difference speed-wise if I specify types for the columns?
I’m loading a big dataset, about 9Gb into a :memory: database using SQLite. Right
Share
Indexing will help: searching through all records of a table
is still slower than looking up in an index with a binary search.
If you create the index before loading the data,
the index has to be updated for every inserted record.
In SQLite, column types do not matter except for
type affinity.
Having no column types declared will be as fast as having correct types;
you would get the added expense of type conversions only if you managed
to declare wrong types for the columns.