The first three columns exist. I am trying to create a formula for the fourth (HH_ANALYSIS_FLAG).
ACCOUNT_NUMBER HOUSEHOLD_NUMBER ACCOUNT_ANALYSIS_FLAG HH_ANALYSIS_FLAG
1001 1 1 0
1002 2 0 0
1003 3 1 0
1004 3 0 0
1005 3 0 0
1006 2 0 0
1007 4 0 0
1008 1 1 0
I have 50,000 accounts. They are flagged as being under analysis with the ACCOUNT_ANALYSIS_FLAG column (0,1). All accounts belong to a household. Multiple accounts can belong to the same household. I need the HH_ANALYSIS_FLAG column to evaluate to true or false (0,1) if any account in the same household is under analysis. So with the above data and a working formula, my spreadsheet would look like so:
ACCOUNT_NUMBER HOUSEHOLD_NUMBER ACCOUNT_ANALYSIS_FLAG HH_ANALYSIS_FLAG
1001 1 1 1
1002 2 0 0
1003 3 1 1
1004 3 0 1
1005 3 0 1
1006 2 0 0
1007 4 0 0
1008 1 1 1
The following formula should do the trick. In fact, it will give you the total number of accounts being analysed per household.
For each row, this selects the set of rows that share the value in the HOUSEHOLD_NUMBER column (based on the row containing the formula) and sums the values in the corresponding ACCOUNT_ANALYSIS_FLAG column. This gives you the total number of accounts under analysis for the given household. Compare the result to 0 if you only need a boolean value.
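A formula matching that description might look like the sketch here. The cell layout is an assumption, not something stated in the question: headers in row 1, data starting in row 2, ACCOUNT_NUMBER in column A, HOUSEHOLD_NUMBER in column B, ACCOUNT_ANALYSIS_FLAG in column C, and the formula entered in D2 and filled down.

```excel
=SUMIF(B:B, B2, C:C)
```

If a strict 0/1 flag is wanted rather than a count, the same sum can be compared to 0:

```excel
=IF(SUMIF(B:B, B2, C:C) > 0, 1, 0)
```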
EDIT:
Apparently the performance of this isn’t up to snuff. However, assuming the rows for each household are all colocated (adjacent to one another), it should be possible to speed things up significantly by changing to something like the following.
This assumes that there are at most 4 accounts per household, and thus limits the range of the SUMIF to the current cell +/- 3 rows.
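As a rough sketch of that windowed version, assume a hypothetical layout with headers in row 1, HOUSEHOLD_NUMBER in column B and ACCOUNT_ANALYSIS_FLAG in column C, and assume the sheet has been sorted by HOUSEHOLD_NUMBER so that each household's rows are adjacent. The formula for a middle row such as D5 could then be:

```excel
=SUMIF(B2:B8, B5, C2:C8)
```

Because every reference is relative, filling this down shifts the seven-row window along with the current row, so each cell only scans its own row and the three rows on either side instead of the whole column.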
To avoid referencing invalid cells, the first and last rows will have to be treated as special cases. If you need a single formula for all of these cells, I think it should be possible using OFFSET in combination with MAX, MIN and ROW to generate the appropriate ranges with just a little arithmetic.
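One way the single-formula idea might be written is sketched here. It makes several assumptions that are not in the question: headers in row 1, data in rows 2 to 50001, HOUSEHOLD_NUMBER in column B, ACCOUNT_ANALYSIS_FLAG in column C, rows sorted so that each household is contiguous, and the formula entered in D2 and filled down. MAX clamps the top of the window to the first data row, and MIN clamps the bottom to the last.

```excel
=SUMIF(
  OFFSET($B$1, MAX(ROW()-3, 2)-1, 0, MIN(ROW()+3, 50001)-MAX(ROW()-3, 2)+1, 1),
  B2,
  OFFSET($C$1, MAX(ROW()-3, 2)-1, 0, MIN(ROW()+3, 50001)-MAX(ROW()-3, 2)+1, 1)
)
```

Note that OFFSET is a volatile function, so Excel recalculates these cells on every change. If that cancels out the speed gain, a simpler alternative is to special-case only the top rows with a mixed reference, for example `=SUMIF(B$2:B5, B2, C$2:C5)` in D2 filled down to D4, and to use the plain relative window from D5 onward.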