To train a machine learning model, I need to retrieve a sample of users that contains equal numbers of current users and former users. The database contains the tables all_users and former_users.
For an unbalanced sample (100 records), the following query returns the records with the desired columns:
SELECT t1.user_property1, t2.user_property2, t3.valid_to
FROM additional_info t1
LEFT JOIN all_users t2 ON t1.user_ID = t2.user_ID
LEFT JOIN former_users t3 ON t1.user_ID = t3.user_ID
ORDER BY random()
LIMIT 100
To get a balanced sample, half of the records should be users stored in the table former_users, and the other half should be users from the table all_users who are not also in former_users (otherwise the sample would not be balanced).
Does anyone know the most convenient way to retrieve a balanced random sample from the tables all_users and former_users, along with the additional properties from the table additional_info?
Thank you!
Did the following:
but I am looking for a better solution.
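For illustration, one possible way to express the balanced sample described above is to draw each half in its own subquery and combine them with UNION ALL. The sketch below is not a definitive answer. It assumes that additional_info holds one row per user, that former users have a non-NULL valid_to, and that each group has at least 50 rows (otherwise that half comes back short). The helper names build_demo_db and balanced_sample, the column types, and the demo data are all hypothetical. SQLite from the Python standard library is used only to make the example runnable; the query avoids SQLite-specific syntax, but the :half placeholder style depends on the database driver.

```python
import sqlite3

# Each half is sampled independently, then the halves are concatenated.
BALANCED_SAMPLE_SQL = """
SELECT * FROM (
    -- Former users: the inner join keeps only users present in former_users.
    SELECT t1.user_property1, t2.user_property2, t3.valid_to
    FROM additional_info t1
    LEFT JOIN all_users t2 ON t1.user_ID = t2.user_ID
    JOIN former_users t3 ON t1.user_ID = t3.user_ID
    ORDER BY random()
    LIMIT :half
) AS former_sample
UNION ALL
SELECT * FROM (
    -- Current users: present in all_users but with no match in former_users.
    SELECT t1.user_property1, t2.user_property2, t3.valid_to
    FROM additional_info t1
    JOIN all_users t2 ON t1.user_ID = t2.user_ID
    LEFT JOIN former_users t3 ON t1.user_ID = t3.user_ID
    WHERE t3.user_ID IS NULL
    ORDER BY random()
    LIMIT :half
) AS current_sample
"""


def balanced_sample(conn, sample_size=100):
    """Return up to sample_size rows, half former users and half current users."""
    half = sample_size // 2
    return conn.execute(BALANCED_SAMPLE_SQL, {"half": half}).fetchall()


def build_demo_db():
    """Create a small in-memory database with made-up data (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE additional_info (user_ID INTEGER PRIMARY KEY, user_property1 TEXT);
        CREATE TABLE all_users (user_ID INTEGER PRIMARY KEY, user_property2 TEXT);
        CREATE TABLE former_users (user_ID INTEGER PRIMARY KEY, valid_to TEXT);
        """
    )
    users = range(1, 201)  # 200 users in total
    conn.executemany(
        "INSERT INTO additional_info VALUES (?, ?)", [(u, f"p1_{u}") for u in users]
    )
    conn.executemany(
        "INSERT INTO all_users VALUES (?, ?)", [(u, f"p2_{u}") for u in users]
    )
    # The first 60 users are also former users.
    conn.executemany(
        "INSERT INTO former_users VALUES (?, ?)",
        [(u, "2020-01-01") for u in range(1, 61)],
    )
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    rows = balanced_sample(conn, 100)
    former_count = sum(1 for row in rows if row[2] is not None)
    print(len(rows), former_count)
```

Because the two halves are concatenated, the result is grouped by class; shuffle the rows before training if the order matters. Note also that ORDER BY random() sorts every candidate row, which can be slow on large tables.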