This is a pretty complicated question, so be prepared! I want to generate some test data in Excel for my EAV table. The columns I have are:
user_id, attribute, value
Each user_id will repeat a random number of times between 1 and 4, and for each entry I want to pick a random attribute from a list, and then a random value that this attribute can take on. Lastly, I want the attributes for each ID to be unique, i.e. I do not want more than one entry with the same ID and attribute. Below is an example of what I mean:
user_id attribute value
100001 gender male
100001 religion jewish
100001 university imperial
100002 gender female
100002 course physics
Possible values:
attribute value
gender male
female
course maths
physics
chemistry
university imperial
cambridge
oxford
ucl
religion jewish
hindu
christian
muslim
Sorry that the table above messed up. I don’t know how to paste into here while retaining the structure! Hopefully you can see what I’m talking about otherwise I can get a screenshot.
How can I do this? In the past I have generated random data using a random number generator and a VLOOKUP, but this is a bit out of my league.
My approach is to create a table with all four attributes for each ID and then filter that table randomly to get between one and four filtered rows per ID. I assigned a random value to each attribute. The basic setup looks like this:
On the left is the randomized EAV table and on the right is the lookup table used for the randomized values. Here are the formulas. Enter them and copy down:
Column A – Establishes one random number for every block of four rows (one block per ID). This determines the attribute that must be selected:
Column B – Uses the value in A to determine whether the row is included:
Column C – Creates the IDs, starting with 100,001:
Column D – Repeats the four attributes:
Column E – Finds the first occurrence of the Column D attribute in the lookup table and selects a randomly offset value:
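The Column E idea (find the first occurrence of the attribute, then step down a random number of rows within its group) can be sketched outside Excel. This is a minimal Python sketch, not the worksheet formula itself; it assumes the lookup table is laid out as in the question, with the attribute name only on the first row of each group, and the names `LOOKUP` and `random_value` are made up for illustration.

```python
import random

# Lookup table laid out as in the question: the attribute name appears only
# on the first row of its group; the remaining rows of the group are blank.
LOOKUP = [
    ("gender", "male"), ("", "female"),
    ("course", "maths"), ("", "physics"), ("", "chemistry"),
    ("university", "imperial"), ("", "cambridge"), ("", "oxford"), ("", "ucl"),
    ("religion", "jewish"), ("", "hindu"), ("", "christian"), ("", "muslim"),
]


def random_value(attribute, rng=random):
    """Return a random value from the group that starts at `attribute`."""
    names = [name for name, _ in LOOKUP]
    first = names.index(attribute)  # row of the first occurrence

    # Count the rows in this group: the first row plus any blank-name rows
    # that follow it directly.
    size = 1
    while first + size < len(LOOKUP) and LOOKUP[first + size][0] == "":
        size += 1

    # Random offset of 0 .. size-1 from the first occurrence.
    return LOOKUP[first + rng.randrange(size)][1]


print(random_value("university"))
```

The offset has to be capped at the group size; otherwise the lookup can spill into the next attribute's values.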
When you filter on the TRUEs in Column B you’ll get your list of one to four Attributes per ID. Disconcertingly, the filtering forces a recalculation, so the filtered list will no longer say TRUE for every cell in column B.
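For anyone who wants to sanity-check the logic outside a worksheet, here is a rough Python sketch of the same build-then-filter idea. It is not a translation of the formulas: the 0.5 keep probability, the function name `generate_eav` and the dictionary layout are all assumptions made for illustration.

```python
import random

# Attribute -> possible values, taken from the question.
ATTRIBUTES = {
    "gender": ["male", "female"],
    "course": ["maths", "physics", "chemistry"],
    "university": ["imperial", "cambridge", "oxford", "ucl"],
    "religion": ["jewish", "hindu", "christian", "muslim"],
}


def generate_eav(n_users, first_id=100001, keep_probability=0.5, rng=random):
    """Build every (id, attribute) pair, then keep a random subset per ID."""
    names = list(ATTRIBUTES)
    rows = []
    for user_id in range(first_id, first_id + n_users):        # like column C
        # One attribute per ID is always kept, so no ID ends up empty
        # (the role column A plays in the worksheet).
        must_keep = rng.randrange(len(names))
        for index, attribute in enumerate(names):               # like column D
            # Like column B: keep the mandatory row, and each other row with
            # an assumed probability.
            keep = index == must_keep or rng.random() < keep_probability
            if keep:
                value = rng.choice(ATTRIBUTES[attribute])       # like column E
                rows.append((user_id, attribute, value))
    return rows


for row in generate_eav(3):
    print(*row)
```

Because each ID lists every attribute exactly once before filtering, the same ID and attribute can never appear twice. With this scheme every ID gets between one and four rows, but the four counts are not equally likely.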
If this were mine I’d automate it a little more, perhaps by putting the “magic number” 4 (the count of attributes) in its own cell.