I’m trying to foresee how big my database will be. Let’s say I have

Question

0

Asked: May 10, 20262026-05-10T20:02:36+00:00 2026-05-10T20:02:36+00:00

I’m trying to foresee how big my database will be. Let’s say I have

0

I’m trying to foresee how big my database will be. Let’s say I have only one table:

CREATE TABLE user (
id INT UNSIGNED NOT NULL AUTO_INCREMENT,
email VARCHAR(50),
pass CHAR(40),
url VARCHAR(1000),
PRIMARY KEY(id));

Adding all up: 4 + 51 + 40 + 1001 = 1096 bytes in one record.
If I have 1 million records: 1,000,000 x 1096 bytes = 1,045 Mb!

So, it is one tiny table and I’m looking at 1 gig to store it. Am I correct in my estimation?

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

score 0 · Answer 1 · 2026-05-10T20:02:36+00:00

Apart from the varchar issue you also need to be aware that most databases store the records in allocated blocks of storage (sometimes termed extents – although the precise terminology depends on the rdbms) which contain a certain amount of free space. The intention of this is to allow updates whilst minimizing table and index fragmentation. Of course allocated free space increases database file size even though there’s no actual data in it.

These overheads can generally be specified and controlled when creating the table using rdbms-specific clauses, and even virtually eliminated if it’s a read-only snapshot. OTOH you might want to make this padding larger than usual if your table is going to see a lot of IUD activity.

A good rule of thumb is to calculate you expected table size as you do – although guesstimating on varchar sizes as discussed in other posts (or better running analysis on sample data) then add 20% – a common default free space allocation. Actually in practice it’s unusual for free space allocation to cause an issue, especially if you deploy a sensible maintenance routine (so most folks never think about it), but failure to anticipate and make suitable allocation on a table hit by unusually high IUD activity can give rise to tricky to trace performance problems.

To be honest in these days of 600Gb disks as commonplace it’s a long time since I’ve seriously sized a database at any level other than a quick guesstimate.

*EDITED to reply to comment – ‘What is IUD and what do you mean by maintenance? Removing old records? – sneg’

IUD = Insert Update Delete activity. To illustrate the maintenance issue lets consider what would happen if we created a database with no free space and loaded a table like the one you propose with records containing varchar data. All records would be placed in our database file end to end, with no space between them.

If the user then updated the varchar part of the record there’s three possibilities. If the field is the same length then there’s no structural change. If it’s shorter we overwrite the old field and there’s a few spare bytes at the end of the field – no biggie. If however it’s longer then we have a problem – the record will no longer fit. In that case one solution would be to copy the whole amended record to a new location and update the indexes (and in some management schemes drop a pointer in where the old record was). The issue now is that a sequential read of the data – not an uncommon operation – will now have to jump around the database file rather than read straight through – a classic fragmentation scenario – and performance will gradually get hosed.

By allocating free space to the table then when we’re updating we have a certain amount of elbow room that allows us to change the record length without having to move the record from the page. Of course over time if the table sees a lot of activity it will still fragment (as we only allocate enough free space to cover some percentage of record changes in place), which is where maintenance comes in.

Maintenance in this case is essentially a defragmentation process to move records around so they are repositioned and free space allocated so that they are distributed efficiently again. In some (most) RDBMs you can just designate a maintenance plan and schedule a job to do this at q quiet time (SQL Server for instance) but in others you might have to do it manually – for example in older versions of Oracle the recommended approach was to export the data, drop the table and recreate, then re-import again from backup – the export/reload process would clean up the data as per any fresh load.

Index structures have similar issues.

I am of course glossing over a lot here, but the essential issues of storing variable length random access data records in a file will remain, no matter how many abstraction layers you pile on top of it. The good thing is that this sort of problem is well understood and most of the time it’s not something you need to worry about – until you ask an apparently simple question like ‘how much space will this table require’ 🙂

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I’m trying to foresee how big my database will be. Let’s say I have

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply