The Archive Base

Editorial Team
Asked: May 15, 2026

I have a table with about 1000 records and 2000 columns. What I want


I have a table with about 1000 records and 2000 columns. What I want to do is categorize the rows so that all records with equal values in every column except ‘ID’ are given the same category ID. My final result would look like:

ID A  B  C ..... Category ID
1  1  0  3           1
2  2  1  3           2 
3  1  0  3           1
4  2  1  3           2
5  4  5  6           3
6  4  5  6           3

where all columns (besides ID) are equal for IDs 1,3 so they get the same category ID and so on.

I guess my thought was to just write a SQL query that does a group by on every single column besides ‘ID’ and assign a number to each group and then join back to my original table. My current input is a text file, and I have SAS, MS Access, and Excel to work with. (I could use proc sql from within SAS).

Before I go this route and construct the whole query, I was just wondering if there was a better way to do this? It will take some work just to write the query, and I’m not even sure if it is practical to join on 2000 columns (never tried), so I thought I’d ask for ideas before I got too far down the wrong path.

EDIT: I just realized my title doesn’t really make sense. What I was originally thinking was “Is there a way I can group by and categorize at the same time without actually consolidating into groups?”

EDIT2: After importing the table into Excel, I was easily able to determine that only about 200 of the 2000 columns actually varied, so the problem of having too many columns went away. To categorize, I imported only the columns that varied, and I did something like the following:

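proc sql;
   /* One row per distinct combination of the non-ID columns.
      ID is dropped here; otherwise every row would count as distinct. */
   create table categories as 
   select distinct *
   from inputTable(drop=ID);
quit;

data categories;
   set categories;
   categoryID = _N_;   /* number each distinct combination */
run;

proc sql;
  /* Join back on every compared column to attach categoryID to each ID */
  create table tableCategorized as
  select a.ID, b.categoryID
  from inputTable as a, categories as b
  where 
     (
     a.A=b.A and
     a.B=b.B and
     a.C=b.C and
     ...
     a.XYZ=b.XYZ);
quit;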
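
The check for which columns actually vary, done in Excel in EDIT2, could probably also be done inside SAS. The following is only a sketch, assuming the data set is named inputTable and the key column is named ID; levelCounts and varyingCols are hypothetical names introduced for illustration. It uses the NLEVELS option of proc freq to count the distinct values of every column.

```sas
/* Count the distinct values of every column; NOPRINT on the TABLES
   statement suppresses the frequency tables but keeps the NLevels table. */
ods output nlevels=levelCounts;
proc freq data=inputTable nlevels;
   tables _all_ / noprint;
run;

/* Keep the names of the columns with more than one distinct value,
   leaving out the key column (assumed here to be called ID). */
proc sql noprint;
   select TableVar into :varyingCols separated by ' '
   from levelCounts
   where NLevels > 1 and upcase(TableVar) ne 'ID';
quit;

%put Varying columns: &varyingCols;
```

The resulting list could then be used in a keep= data set option so that only the varying columns are carried forward.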

It was a pain to generate all the “=” comparisons, but I just did it using string manipulation techniques in Excel, so it wasn’t too bad at all. Thanks for all of the suggestions.
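
As an alternative to building the “=” comparisons in Excel, the join condition could be generated from the column metadata in SAS. This is a minimal sketch, assuming inputTable lives in the WORK library, the key column is named ID, and all column names are ordinary SAS names (no embedded blanks); joinCond is a hypothetical macro variable name.

```sas
/* Build "a.A=b.A and a.B=b.B and ..." from dictionary.columns.
   LIBNAME and MEMNAME are stored in upper case in the dictionary tables. */
proc sql noprint;
   select cats('a.', name, '=b.', name)
      into :joinCond separated by ' and '
   from dictionary.columns
   where libname = 'WORK'
     and memname = 'INPUTTABLE'
     and upcase(name) ne 'ID';
quit;

/* Same join as above, with the generated condition substituted in. */
proc sql;
   create table tableCategorized as
   select a.ID, b.categoryID
   from inputTable as a, categories as b
   where &joinCond;
quit;
```

A macro variable can hold at most 65,534 characters, so this should fit comfortably for about 200 columns but might not for all 2000.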

1 Answer

    Editorial Team
    Added an answer on May 15, 2026 at 3:11 am

    Well, I can think of an easy way, but I don’t know whether you are going to hit SAS’s memory/performance limits… I have never tried proc sort with 2000 variables, but maybe someone else has and can comment.

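    /* Sort so that identical rows end up next to each other */
    proc sort data=mydata;
        by A B C D /* etc.... */ myLastColumn;
    run;
    
    data mydata;
        set mydata;
        by A B C D /* etc....*/ myLastColumn;
        retain categoryID 0;
        /* first.myLastColumn is 1 whenever any BY variable changes,
           so each new combination of values starts a new category */
        if first.myLastColumn then categoryID + 1;
    run;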
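
    To avoid typing the BY list by hand, it could be generated from the column metadata. This is a sketch only, assuming mydata is in the WORK library, the key column is named ID, and mydata does not already contain a categoryID column; byVars, lastVar, sorted and categorized are hypothetical names.

    ```sas
    /* Collect every column name except ID into one space-separated list */
    proc sql noprint;
       select name into :byVars separated by ' '
       from dictionary.columns
       where libname = 'WORK'
         and memname = 'MYDATA'
         and upcase(name) ne 'ID';
    quit;

    /* The last name in the list drives the first. test below */
    %let lastVar = %scan(&byVars, -1, %str( ));

    proc sort data=mydata out=sorted;
       by &byVars;
    run;

    data categorized;
       set sorted;
       by &byVars;
       /* The sum statement retains categoryID and starts it at 0 */
       if first.&lastVar then categoryID + 1;
    run;
    ```

    Writing to separate output data sets (sorted, categorized) leaves mydata in its original order.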
    