I have a table with about 1000 records and 2000 columns. What I want

Question

0

Asked: May 15, 20262026-05-15T03:11:19+00:00 2026-05-15T03:11:19+00:00

I have a table with about 1000 records and 2000 columns. What I want

0

I have a table with about 1000 records and 2000 columns. What I want to do is categorize each row such that all records with equal column values for all columns except ‘ID’ are given a category ID. My final answer would look like:

ID A  B  C ..... Category ID
1  1  0  3           1
2  2  1  3           2 
3  1  0  3           1
4  2  1  3           2
5  4  5  6           3
6  4  5  6           3

where all columns (besides ID) are equal for IDs 1,3 so they get the same category ID and so on.

I guess my thought was to just write a SQL query that does a group by on every single column besides ‘ID’ and assign a number to each group and then join back to my original table. My current input is a text file, and I have SAS, MS Access, and Excel to work with. (I could use proc sql from within SAS).

Before I go this route and construct the whole query, I was just wondering if there was a better way to do this? It will take some work just to write the query, and I’m not even sure if it is practical to join on 2000 columns (never tried), so I thought I’d ask for ideas before I got too far down the wrong path.

EDIT: I just realized my title doesn’t really make sense. What I was originally thinking was “Is there a way I can group by and categorize at the same time without actually consolidating into groups?”

EDIT2: After importing the table into Excel, I was easily able to determine that only about 200 of the 2000 columns actually varied, so problem with too many columns went away. To categorize, I only imported the columns that varied, and I did something like the following:

proc sql;
   create table categories as 
   select distinct *
   from inputTable;
quit;

data categories;
   set categories;
   categoryID = _N_;
run;

proc sql;
  create table tableCategorized as
  select a.ID, b.CategoryID
  from inputTable as a, categories as b
  where 
     (
     a.A=b.A and
     a.B=b.B and
     a.C=b.C and
     ...
     a.XYZ=b.XYZ);
  ;
quit;

It was a pain to generate all the “=” comparisons, but I just did it using string manipulation techniques in Excel, so it wasn’t too bad at all. Thanks for all of the suggestions.

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-05-15T03:11:19+00:00

Well, I can think of an easy way, but I don’t know whether you are going to hit SAS’s memory/performance limits… I have never tried proc sort with 2000 variables, but maybe someone else has and can comment.

proc sort data= mydata;
    by A B C D /* etc.... */ myLastColumn;
run;

data mydata;
    set mydata;
    by A B C D /* etc....*/ myLastColumn;
    retain categoryID 0;
    if first.myLastColumn then categoryID +1;
run;

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I have a table with about 1000 records and 2000 columns. What I want

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply