Is there a benefit of using multiple columns on distribution when creating a table?

Question

Asked: June 11, 2026 (2026-06-11T18:46:25+00:00)

Is there a benefit to distributing a table by multiple columns when creating it? For instance:

CREATE TABLE data_facts (
    data_id int primary key,
    channel_id smallint,
    chart_id smallint,
    demo_id smallint,
    value numeric)
DISTRIBUTED BY (
    channel_id,
    chart_id,
    demo_id);

I ask because I may need to join data_facts with three different tables (channel, chart, and demo) on channel_id, chart_id, and demo_id respectively.

Specifically,

For efficiency, should I always specify a distribution key and include every id column that I join on?
If so, does the order of these id columns matter?
How does this work at the architecture level? (optional)

Thanks!

1 Answer

Editorial Team · Answer 1 · 2026-06-11T18:46:26+00:00

It depends on how finely you want to shard the data and how few records you want in each partition. If you add more than one column to the distribution key, you split the data into many more distinct key combinations, so it is fragmented further across partitions.

It also depends on whether you shard by modulo or by hash.
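To make the modulo versus hash distinction concrete, here is a minimal Python sketch. It is a simulation, not any database's real placement code: the segment count of 4, the function names, and the use of SHA-256 as the hash are all assumptions chosen for illustration. It shows what can happen when integer keys share a stride with the segment count.

```python
import hashlib
from collections import Counter

NUM_SEGMENTS = 4  # hypothetical cluster size, chosen for illustration


def modulo_segment(key: int, segments: int = NUM_SEGMENTS) -> int:
    """Place a row by taking the key modulo the segment count."""
    return key % segments


def hash_segment(key: int, segments: int = NUM_SEGMENTS) -> int:
    """Place a row by hashing the key first (SHA-256 is only a stand-in)."""
    digest = hashlib.sha256(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % segments


# 1000 keys that are all multiples of the segment count: 0, 4, 8, ...
keys = range(0, 4000, 4)

modulo_counts = Counter(modulo_segment(k) for k in keys)
hash_counts = Counter(hash_segment(k) for k in keys)

print("modulo:", dict(modulo_counts))  # every row lands on segment 0
print("hash:  ", dict(hash_counts))    # rows are spread over the segments
```

With plain modulo, a pattern in the key values carries straight through to the placement, while hashing first breaks that pattern up. Which of the two a given system uses, and with which hash function, is something to check in its documentation.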

However, in my opinion, distributing by multiple columns makes sense mainly when the table has a multi-column primary key and you want to shard by that key; in that case, distribute by all of the primary key columns. Otherwise, shard by a single column, which is enough in most cases.
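As a rough illustration of why a three-column key may not help the joins in the question, the Python sketch below simulates hash placement. Everything in it is an assumption made for the example: the 4 segments, the `segment_for` helper, SHA-256 as the hash, the synthetic ids 0 to 9, and a channel table distributed by channel_id. It measures how many data_facts rows end up on the same segment as the channel row they would join to.

```python
import hashlib
from itertools import product

NUM_SEGMENTS = 4  # hypothetical cluster size, chosen for illustration


def segment_for(*key_columns: int) -> int:
    """Map a distribution key (one or more columns) to a segment number."""
    digest = hashlib.sha256(repr(key_columns).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SEGMENTS


# Synthetic data_facts rows as (channel_id, chart_id, demo_id) tuples.
fact_rows = list(product(range(10), range(10), range(10)))


def colocated_fraction(fact_key) -> float:
    """Share of fact rows stored on the same segment as their channel row.

    The channel table is assumed to be distributed by channel_id alone.
    """
    hits = sum(
        1
        for row in fact_rows
        if segment_for(*fact_key(row)) == segment_for(row[0])
    )
    return hits / len(fact_rows)


# data_facts distributed by (channel_id, chart_id, demo_id)
composite = colocated_fraction(lambda row: row)
# data_facts distributed by (channel_id) only
single = colocated_fraction(lambda row: (row[0],))

print(f"composite key: {composite:.0%} of rows co-located with channel")
print(f"single key:    {single:.0%} of rows co-located with channel")
```

In this simulation the single-column key keeps every fact row next to its channel row, whereas the composite key co-locates them only by chance, so a join on channel_id alone would have to move rows between segments. Real systems differ in their hash functions and planners, so treat this as a sketch of the idea rather than a description of any particular product.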
