I am not sure whether this is the right place to ask this question.


Asked: May 26, 2026


I am not sure whether this is the right place to ask this question,
as it is more of a logic question, but there is no harm in asking.
Suppose I have a huge list of data (customers),
and each record has a data_id.
Now I want to split the data in a fixed ratio, say 10:90.
I could state a condition on the ID, for example:

the sum of digits is even: go to bin 1
the sum of digits is odd: go to bin 2
or the sum of the last three digits is x: go to bin 1
the sum of the last three digits is not x: go to bin 2

But this might result in uneven bins: sometimes it will find more data than needed (which is fine), but sometimes it might not find enough.

Is there a way (probabilistically speaking)
to guarantee that the sample size is always greater than x%?

Thanks


1 Answer

Editorial Team · Answer 1 · 2026-05-26T18:56:36+00:00


You want to partition your data by a feature that is uniformly distributed. Hash functions are designed to have this property, so if you compute a hash of your customer ID and then partition by the first n bits to get 2^n bins, each bin should have approximately the same number of items. (You can then select, say, roughly 90% of your bins to get roughly 90% of the data; the more bins you use, the closer you can get to the target ratio.) Hope this helps.
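As a rough illustration of the idea above, here is a minimal Python sketch. It assumes the IDs can be converted to strings, and it uses SHA-256 with 100 buckets instead of the first n bits so that a 10:90 split falls on a bucket boundary. The names `bucket` and `assign_bin` are made up for this example and do not come from the question.

```python
import hashlib


def bucket(data_id, num_buckets=100):
    """Map an ID to a stable bucket in [0, num_buckets) using SHA-256."""
    digest = hashlib.sha256(str(data_id).encode("utf-8")).digest()
    # Read the first 8 bytes as an unsigned integer, then reduce to a bucket.
    return int.from_bytes(digest[:8], "big") % num_buckets


def assign_bin(data_id, small_share=10):
    """Return 1 for roughly small_share percent of IDs, otherwise 2."""
    return 1 if bucket(data_id) < small_share else 2


if __name__ == "__main__":
    ids = range(100_000)  # stand-in for real customer IDs
    in_bin1 = sum(1 for i in ids if assign_bin(i) == 1)
    print(f"share of IDs in bin 1: {in_bin1 / len(ids):.4f}")
```

The same ID always lands in the same bin, so the split is reproducible. The ratio is only approximate, though: if the hash behaves like a uniform random draw, the size of bin 1 for N customers is roughly binomial, with mean 0.1·N and standard deviation about sqrt(0.09·N). If a hard minimum is required, one option is to sort the IDs by hash value and take the first x% of them, which fixes the sample size exactly.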
