For my new website I can partition my table either by range or by list. Below is an example of each that I could choose from:
PARTITION BY LIST(`category`)(
PARTITION p0 VALUES IN(0),
PARTITION p1 VALUES IN(1),
PARTITION p2 VALUES IN(2),
PARTITION p3 VALUES IN(3),
PARTITION p4 VALUES IN(4),
PARTITION p5 VALUES IN(5),
PARTITION p6 VALUES IN(6)
);
Or I can do:
PARTITION BY RANGE (`category`)(
PARTITION p0 VALUES LESS THAN (1),
PARTITION p1 VALUES LESS THAN (2),
PARTITION p2 VALUES LESS THAN (3),
PARTITION p3 VALUES LESS THAN (4),
PARTITION p4 VALUES LESS THAN (5),
PARTITION p5 VALUES LESS THAN (6),
PARTITION p6 VALUES LESS THAN (7)
);
As I do not know anything about partitioning, which one would be the better choice?
I also found out that I can do something like this:
PARTITION BY HASH(`category`)
PARTITIONS 7;
Is one of these better than the others, or does it really not matter?
As a final question: I have 7 different categories. With the last option, what would happen if I used more than 7 partitions? Would that just be silly, or would it increase performance?
Thanks for taking the time to read my question!
You have 4 partitioning strategies available in MySQL: RANGE, LIST, KEY and HASH.
If you need to partition by specific values of the partitioning column, RANGE and LIST are designed for this purpose. If you don’t need to assign rows to partitions by specific values of the column, then KEY and HASH are more suitable for you.
In your case, all three approaches (LIST, RANGE and HASH) distribute the rows in exactly the same way, as you have 7 partitions and rows with `category` = 0 will be stored in p0, `category` = 1 in p1 and so on.
With HASH partitioning, MySQL uses the modulo (mod) operation to determine which partition stores a given row:
partition_number = hash(partitioning_column) mod number_of_partitions
In your HASH case, the partition number would be:
partition_number = category mod 7
(you are not using any hash function, so the identity is used by default).
Creating more (or fewer) partitions than the number of values the partitioning column can take is not necessarily useless, but if you are using the identity as the hash function, then creating more partitions is useless.
Imagine you have only 7 categories but you define 20 partitions:
As 0 mod 20 = 0, 1 mod 20 = 1, 2 mod 20 = 2, 3 mod 20 = 3, 4 mod 20 = 4, 5 mod 20 = 5 and 6 mod 20 = 6, the rest of the partitions (p7-p19) will never be used. It neither increases nor decreases performance; it’s just useless.
Now imagine you still have your 7 categories, but you only have 4 partitions:
0 mod 4 = 0, 1 mod 4 = 1, 2 mod 4 = 2, 3 mod 4 = 3, 4 mod 4 = 0, 5 mod 4 = 1, 6 mod 4 = 2.
What happened? Partition p0 will contain rows with `category` = 0 and `category` = 4, p1 will contain rows with `category` = 1 and `category` = 5, p2 will contain rows with `category` = 2 and `category` = 6, and p3 will contain only rows with `category` = 3.
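The modulo arithmetic above can be checked directly in MySQL. This is only a sketch that does not touch any real table: the inline list of category values 0 to 6 and the column aliases are made up for illustration.

```sql
-- Which partition each category value would land in with 7, 20 and 4 HASH partitions.
-- The derived table `c` is just a hand-written list of the 7 category values (an assumption).
SELECT category,
       category MOD 7  AS partition_if_7,
       category MOD 20 AS partition_if_20,
       category MOD 4  AS partition_if_4
FROM (
    SELECT 0 AS category UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL
    SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6
) AS c
ORDER BY category;
```

With 7 or 20 partitions every category gets a partition of its own; with 4, categories 4, 5 and 6 wrap around to p0, p1 and p2.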
Would this increase or decrease performance? It depends on the SELECT queries that you run on this table. If you run queries that filter on a single value of the partitioning column (for example, WHERE `category` = 3),
it’s perfect, as only one partition will be accessed. But a query that does not filter on `category` at all is the opposite:
all the partitions will need to be accessed to retrieve the data.
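MySQL gives you this information if you put EXPLAIN PARTITIONS in front of the query (in MySQL 5.7 and later, plain EXPLAIN already includes a partitions column).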
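As a rough illustration of checking which partitions a query touches, here is a sketch against a hypothetical table. The table name `items` and the columns `id` and `title` are assumptions, not taken from the question; only `category` is.

```sql
-- Hypothetical table (names are assumptions). The partitioning column must be
-- part of every unique key, hence the composite primary key.
CREATE TABLE items (
    id       INT NOT NULL,
    category INT NOT NULL,
    title    VARCHAR(100),
    PRIMARY KEY (id, category)
)
PARTITION BY HASH(category)
PARTITIONS 4;

-- Filters on the partitioning column: the partitions column of the plan
-- should list only p1, since 5 mod 4 = 1.
EXPLAIN SELECT * FROM items WHERE category = 5;

-- No filter on category: the partitions column should list p0,p1,p2,p3.
EXPLAIN SELECT * FROM items WHERE title = 'example';
```

On MySQL 5.6 and earlier the same check is written as EXPLAIN PARTITIONS SELECT ...; from 5.7 onwards plain EXPLAIN shows the partitions column.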
Regarding which alternative to use: LIST and RANGE partitioning are intended for assigning a list or a range of values to each partition, and you have only one value per partition, so I would go for HASH partitioning in this case.
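For completeness, a minimal sketch of what the HASH variant could look like on a full table, assuming a hypothetical `products` table with only an `id` and the `category` column (both the table name and `id` are made up here):

```sql
-- Hypothetical table; HASH partitions are named p0 .. p6 by default.
CREATE TABLE products (
    id       INT NOT NULL,
    category INT NOT NULL,
    PRIMARY KEY (id, category)  -- partitioning column must be in the primary key
)
PARTITION BY HASH(category)
PARTITIONS 7;

-- A few sample rows (invented data, for illustration only).
INSERT INTO products (id, category) VALUES
    (1, 0), (2, 3), (3, 3), (4, 6);

-- Read from a single partition by name (explicit partition selection,
-- available in MySQL 5.6 and later). Only category 3 rows live in p3.
SELECT category, COUNT(*) AS row_count
FROM products PARTITION (p3)
GROUP BY category;
```

If an eighth category is added later, `category` = 7 would wrap around to p0 (7 mod 7 = 0), so the number of partitions would need to be revisited at that point.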