Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 6136641
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 23, 20262026-05-23T17:37:41+00:00 2026-05-23T17:37:41+00:00

I am importing millions of rows from a tab file and SQLite .import .mode

  • 0

I am importing millions of rows from a tab file and SQLite .import .mode tabs is very slow. I have three indexes so probably the slowness comes from the indexing. But first I would like to check that .import add the rows grouping lots/all of them into a single commit. I was unable to find documentation how .import works. Do someone knows?. If the index is the problem (I had that issue before with mysql) how can I disable it and reindex at the end of the .import?

[Update 1]

Following @sixfeetsix comment.

My schema is:

CREATE TABLE ensembl_vf_b36 (
        variation_name  varchar(20),
        chr     varchar(4),
        start   integer,
        end     integer,
        strand  varchar(5),
        allele_string    varchar(3),
        map_weight      varchar(2),
        flags           varchar(50),
        validation_status       varchar(100),
        consequence_type        varchar(50)
);
CREATE INDEX pos_vf_b36_idx on ensembl_vf_b36 (chr, start, end);

data:

rs35701516      NT_113875       352     352     1       G/A     2       NULL    NULL    INTERGENIC
rs12090193      NT_113875       566     566     1       G/A     2       NULL    NULL    INTERGENIC
rs35448845      NT_113875       758     758     1       A/C     2       NULL    NULL    INTERGENIC
rs17274850      NT_113875       1758    1758    1       G/A     2       genotyped       cluster,freq    INTERGENIC

There are 15_608_032 entries in this table

And these are the stats

 $  time sqlite3 -separator '   ' test_import.db '.import variations_build_36_ens-54.tab ensembl_vf_b36'

real    29m27.643s
user    4m14.176s
sys     0m15.204s

[Update 2]

@sixfeetsix has a good answer and if you are reading this, you would be also interested in

Faster bulk inserts in sqlite3?

Sqlite3: Disabling primary key index while inserting?

[update3] Solution from 30 min -> 4 min

Even with all the optimisations (see accepted answer) still takes almost 30 minutes but if the indexes are not used and added at the end then total time is 4 minutes:

-- importing without indexes:
       real    2m22.274s
       user    1m38.836s
       sys     0m4.850s

 -- adding indexes
     $  time sqlite3 ensembl-test-b36.db < add_indexes-b36.sql

     real    2m18.344s
     user    1m26.264s
     sys     0m6.422s
  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-23T17:37:41+00:00Added an answer on May 23, 2026 at 5:37 pm

    I believe the slowness indeed comes from building the index as more and more records are added. Depending on the RAM you have, you can tell sqlite to use enough memory so that all this index building activity is done in memory (ie without all the I/O that would happen otherwise with less memory).

    For 15M records, I’d say you should set your cache size at 500000.

    You can also tell sqlite to keep its transaction journal in memory.

    Finally, you can set synchronous to OFF so sqlite never waits for writes to be committed to disk.

    Using this I was able to divide the time it takes to import 15M records by 5 (14 minutes down to 2.5) with records made of random GUIDs split in 5 columns, using the three middle columns as an index:

    b40c1c2f    912c    46c7    b7a0    3a7d8da724c1
    9c1cdf2e    e2bc    4c60    b29d    e0a390abfd26
    b9691a9b    b0db    4f33    a066    43cb4f7cf873
    01a360aa    9e2e    4643    ba1f    2aae3fd013a6
    f1391f8b    f32c    45f0    b137    b99e6c299528
    

    So to try this I suggest you put all the instructions in some file, say import_test:

    pragma journal_mode=memory;
    pragma synchronous=0;
    pragma cache_size=500000;
    .mode tabs
    .import variations_build_36_ens-54.tab ensembl_vf_b36
    

    Then try it:

    time sqlite3 test_import.db < import_test
    

    EDIT

    This is a reply to the Pablo’s (the OP) comments following this answer (it’s to long to fit as a comment):
    My (educated) guesses are that:

    1. Because .import is not sql per se,
      it hasn’t much ado with
      transactions, I’m even inclined to
      think that it is written to go
      faster than even if you had all this
      done in one “normal” transaction;
      and,
    2. If you have enough memory to
      allocate, and you set up your
      environment as I suggest, the real
      (time) hog here is reading the flat
      file, then writing the final content
      of the database, because what
      happens in between happens extremely
      fast; i.e. fast enough that there
      ain’t much time to gain by optimizing it
      when you compare such potential
      gains with the (probably) non-compressible time spent on disk I/O.

    If I’m wrong though I’d be glad to hear why for my own benefit.

    EDIT 2

    I did a comparison test between having the index in place during .import, and having it added immediately after .import finishes. I used the same technique of generating a 15M record made of split random UUIDs:

    import csv, uuid
    w = csv.writer(open('bla.tab', 'wb'), dialect='excel-tab')
    for i in xrange(15000000):
        w.writerow(str(uuid.uuid4()).split('-'))
    

    Then I tested importing with the index created before and after (here the index is created after):

    pragma journal_mode=memory;
    pragma synchronous=0;
    pragma cache_size=500000;
    create table test (f1 text, f2 text, f3 text, f4 text, f5 text);
    CREATE INDEX test_idx on test (f2, f3, f4);
    .mode tabs
    .import bla.tab test
    

    So here is the time when adding the index before:

    [someone@somewhere ~]$ time sqlite3 test_speed.sqlite < import_test 
    memory
    
    real   2m58.839s
    user   2m21.411s
    sys    0m6.086s
    

    And when the index is added after:

    [someone@somewhere ~]$ time sqlite3 test_speed.sqlite < import_test 
    memory
    
    real   2m19.261s
    user   2m12.531s
    sys    0m4.403s
    

    You see how the “user” times difference (~9s) don’t account for the full times difference (~40s)? I To me this means that there is some extra I/O happening when the index is created before, and so I was wrong to think that all was being done in memory with no extra I/O.

    Conclusion: create the index after and you’ll have even better import times (just as Donal mentioned).

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

No related questions found

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.