I’ve got an 7.4Gb csv file. After converting it to a sqlite database with

Question

0

Asked: June 4, 20262026-06-04T21:59:37+00:00 2026-06-04T21:59:37+00:00

I’ve got an 7.4Gb csv file. After converting it to a sqlite database with

0

I’ve got an 7.4Gb csv file. After converting it to a sqlite database with a python script the output DB is 4.7Gb, around 60% of the original size.

The csv has around 150,000,000 rows. It has header:

tkey,ipaddr,healthtime,numconnections,policystatus,activityflag

And each row looks something like

261846,172.10.28.15,2012-02-03 16:15:00,22,1,1

The script uses healthtime to split the data into tables 192 tables

When I first saw these numbers, I assumed I had made an error someplace. How big of a reduction in file size should I expect from the added efficiency of only writing the healthtime 192 times instead of 150,000,000 times?

EDIT: As soon as I posted this I realized the answer. I’m removing about 40% of the string, hence the 40% reduction in size.

Edit 2 Let’s calculate the difference in size between the plain text:

"261846,172.10.28.15,2012-02-03 16:15:00,22,1,1"

And the data base entry:

db(261846,'172.10.28.15',22,1,1)

First of all, we drop from 46 to 26 characters in plain text representation.

The remaining characters are:

"261846,172.10.28.15,22,1,1"

or 26 bytes. If each integer needs to be stored in 32 bit (4 bytes), then we have:

12 bytes (ipaddr) + 4 bytes * 4 (integer fields) = 28 bytes.

So it looks like converting to integers makes the storage slightly less efficient, and all of my gains come form reducing the number of characters stored in each row.

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-06-04T21:59:38+00:00

Editorial Team

2026-06-04T21:59:38+00:00Added an answer on June 4, 2026 at 9:59 pm

SQLite is not running a compression algorithm, but it will store data in a binary file instead of a text file. Which means that the data can be stored more efficiently, for example using a 32-bit (4 byte) number to represent 10,000,000 instead of storing it as 8 bytes of text (or more if the file is unicode).

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I’ve got an 7.4Gb csv file. After converting it to a sqlite database with

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply