I have a dataset that looks like this (note that a blank separates each product):
Client_ID Purchase
121212 "Orange_Juice Lettuce"
121212 "Banana Bread "
230102 "Banana Apple"
230102 "Chicken"
121212 "Chicken Bread"
301450 "Grapes Lettuce"
... ...
Now I want to know which products each person has purchased, using a dummy variable for each item:
Client_ID Apple Banana Bread Chicken Grapes Lettuce Orange_Juice
121212 0 1 1 1 0 1 1
230102 1 1 0 1 0 0 0
301450 0 0 0 0 1 1 0
... ... ... ... ... ... ... ...
I asked a similar question some weeks ago, but then I didn’t have several items in the same row, as is the case here, so I’m really lost. I tried to separate the items into multiple columns, but that was not ideal, since each purchase can have a different number of items (up to dozens, as far as I know).
Any ideas on how to proceed? Thanks in advance!
Here is a flexible solution using PROC FREQ and PROC TRANSPOSE. The SPARSE option on the TABLES statement makes PROC FREQ output every client/product combination, including the ones with a count of zero, which gets you your zeros. I assume you only want 1 or 0, hence the NODUPKEY sort; remove NODUPKEY (or remove the sort entirely) if you do want 2 for Bread for the first ID (121212 bought it twice).
First create a vertical dataset with one record per ID/product (splitting Purchase into individual products); then run PROC FREQ on that dataset so you have a dataset with 1/0 for each client/product combination; then transpose that, using product as the ID variable and count as the VAR variable.
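The steps above can be sketched roughly as follows. The dataset names (have, long, counts, want), the lengths $50 and $32, and the loop variable _i are my own choices for illustration, and the DATALINES step just recreates the sample rows from the question; adjust them to your real data.

```sas
/* Sample data from the question; DSD strips the quotes around Purchase */
data have;
  infile datalines dsd dlm=' ' truncover;
  input Client_ID Purchase :$50.;
  datalines;
121212 "Orange_Juice Lettuce"
121212 "Banana Bread "
230102 "Banana Apple"
230102 "Chicken"
121212 "Chicken Bread"
301450 "Grapes Lettuce"
;
run;

/* Step 1: one record per client/product (split Purchase on blanks) */
data long;
  set have;
  length Product $32;
  do _i = 1 to countw(Purchase, ' ');
    Product = scan(Purchase, _i, ' ');
    output;
  end;
  keep Client_ID Product;
run;

/* Keep each client/product pair once so the counts are 1 or 0.
   Drop NODUPKEY if you want real purchase counts instead. */
proc sort data=long nodupkey;
  by Client_ID Product;
run;

/* Step 2: SPARSE writes the zero-count combinations to the OUT= dataset */
proc freq data=long noprint;
  tables Client_ID*Product / sparse out=counts(keep=Client_ID Product Count);
run;

/* Step 3: one row per client, one column per product */
proc transpose data=counts out=want(drop=_name_ _label_);
  by Client_ID;
  id Product;
  var Count;
run;
```

Because the product values become variable names in the transpose, this assumes every product is a valid SAS name (as Orange_Juice is); products containing other characters would need cleaning up first.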
If you have any products that you want to guarantee show up as zero even if nobody has them, you should add a row to the initial table (or anything prior to the proc freq) with a dummy client ID and ALL possible products, then after the transpose delete the dummy client ID.
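A minimal sketch of that dummy-client trick, under some assumptions of mine: the dummy ID is -1 (pick any value that can never be a real client), the product list includes a made-up product Milk that nobody bought, and the dataset names are again just placeholders.

```sas
/* Two sample rows only, to keep the example short */
data have;
  infile datalines dsd dlm=' ' truncover;
  input Client_ID Purchase :$50.;
  datalines;
121212 "Orange_Juice Lettuce"
230102 "Chicken"
;
run;

/* Dummy client that "buys" every product you want a column for.
   -1 and Milk are assumptions made for this illustration. */
data dummy;
  Client_ID = -1;
  length Purchase $50;
  Purchase = 'Chicken Lettuce Milk Orange_Juice';
run;

/* Split real and dummy rows into one record per client/product */
data long;
  set have dummy;
  length Product $32;
  do _i = 1 to countw(Purchase, ' ');
    Product = scan(Purchase, _i, ' ');
    output;
  end;
  keep Client_ID Product;
run;

proc sort data=long nodupkey;
  by Client_ID Product;
run;

proc freq data=long noprint;
  tables Client_ID*Product / sparse out=counts(keep=Client_ID Product Count);
run;

proc transpose data=counts out=want_all(drop=_name_ _label_);
  by Client_ID;
  id Product;
  var Count;
run;

/* Remove the dummy client; its columns (including Milk) remain */
data want;
  set want_all;
  if Client_ID = -1 then delete;
run;
```

The Milk column should then be 0 for every real client, since SPARSE fills in the combinations that the dummy row made possible.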