I’m new to SQL and I don’t understand performance implications. It seems like SQL

Question

0

Asked: June 4, 20262026-06-04T00:55:34+00:00 2026-06-04T00:55:34+00:00

I’m new to SQL and I don’t understand performance implications. It seems like SQL

0

I’m new to SQL and I don’t understand performance implications. It seems like SQL databases store everything in a single place. Doesn’t this mean that tables grow extremely large, very quickly? Won’t this hurt performance?

Example Stackoverflow model, but with threaded comments:

CREATE TABLE t_users (
   name varchar(80) primary key,
   email varchar(80)

);

CREATE TABLE t_posts (
   id varchar(80) primary key,
   userid varchar(80) references t_users(name),
   title varchar(80),
   description text,
   topic varchar(80),
   path text

);

Is this a valid design? All posts of every user ever are stored in the same table… So if I want to query for all comments that have topic “programming”, it would need to look through every single post, even through the posts that have different topics because they are all stored in the same table….this also means that if I make more complicated queries, they will exponentially grow slower the bigger my table on disk is.
Wouldn’t it be better to split every single post into a new table?

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-06-04T00:55:36+00:00

The design is quasi-valid, but not completely:

Your t_users would be better of having an autoincrement unsigned
int ID column. _{(A primary key on a name is almost ALWAYS a bad idea. People change names. People have the same names. Even countries change names sometimes! A numeric is almost always the best choice!)}
That way, your t_posts can refer to that userID. Joins are now blazing fast.
You can even enforce the referential integrity: no post can be made by users that do not exist. Or when a user is deleted, the posts are deleted too.
Your t_posts has an ID primary key column (good!), but it’s varchar (bad!). INT is better. BIGINT if you need it.

You will find your posts later might have multiple topics (stackoverflow "tags"). Don’t put them CSV in a varchar field. Create a new table "topics" with ID, description, and a linking table "posts_to_topic" that links each post to one or more topics.

Indexes

What you need to read up on is indexes. If you want to query for all comments that have topic "programming", you’d usually have an index on the column "topic varchar(80)". This index is small (consider it a seperate table: it contains the indexed column(s) and the primary key), so your (R)DBMS can search it very quickly (tree-structure) and fetch all the primary keys it needs. Then, depending on what you select, the DBMS sends you the information:

if you only need columns already in the index, it can fetch everything from the index.
if you need other columns, it will only now actually access the t_posts table, and use the primary keys from the indexes to access the rows.

Simplification

I lied. In the last paragraph, I made it all much simpler that it really is. There is an optimiser that will look at the query and determine what indexes can be used. It will check the indexes – depending on the cardinality, table size, columns it might use it, or decide to scan the table anyway. If your table has variable row lengths, fetching the X-th row is much slower than when all rows have the same length (no VARCHAR). And all that depends on what (R)DBMS (or in MySQL, even on what storage engine) you use.

But read about indexes first, on the what, the why, and later the how. After that, you can study the exceptions deeper.

Multiple tables for the same data

This is a very frequently made beginner mistake, and they go both ways:

"We can put everything in one table, if we add a column "TYPE" and make everything varchar!"
"We can just have 10000 tables in our database, table_1, table_2, etc!"

Reading about indexes will tell you why this technically is a bad idea, but it’s also less elegant on a logical scale: one table is meant to represent one entity (Books. Users. Posts. Pages) – splitting those will result in some very ugly queries. And if you ask someone why they are doing this, the reason is often "for speed", while an extra index on their decision-column would have had the same effect.

Think about it: if you make a post title for each user, write the query that lists the 10 most used topics, and how many posts each of those has. You will have to name every table!

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I’m new to SQL and I don’t understand performance implications. It seems like SQL

Leave an answerCancel reply

1 Answer

Indexes

Simplification

Multiple tables for the same data

Leave an answer
Cancel reply