I have a data.frame named A with 5000 columns. How can I find the columns in this data.frame that are equal to each other?
As @John mentioned, there are problems with using `duplicated`. I would add that transposing the data.frame forces all the data into the same data type before it is even compared with `duplicated`. For example, consider a data.frame in which column `c` is very similar to columns `b`, `e`, and `f`, but not identical because of the different types (character versus numeric). The solution suggested by @Jubbles would disregard these differences.

Instead, it seems more appropriate to use the `identical` function on the columns of your data.frame. You can compare columns two-by-two using `outer`.

From here, you can use clustering to identify groups of identical columns (there may be better ways, so if you know one, feel free to comment or even edit my answer).
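The pairwise comparison and clustering steps can be sketched as follows. The example data and the helper name `same_col` are illustrative assumptions, not the answer's original code: columns `b`, `e`, and `f` hold the same numbers, `c` holds the same values as character, and `a` and `d` are identical.

```r
# Hypothetical example data (not from the original answer).
A <- data.frame(
  a = c(1, 2, 3),
  b = c(4, 5, 6),
  c = c("4", "5", "6"),
  d = c(1, 2, 3),
  e = c(4, 5, 6),
  f = c(4, 5, 6),
  stringsAsFactors = FALSE
)

# identical() is not vectorised, so wrap it with Vectorize()
# before handing it to outer().
same_col <- function(i, j) identical(A[[i]], A[[j]])
M <- outer(names(A), names(A), Vectorize(same_col))
dimnames(M) <- list(names(A), names(A))

# Treat "not identical" as distance 1 and "identical" as distance 0,
# then cut the tree below 1 so only identical columns share a group.
groups <- cutree(hclust(as.dist(1 - M)), h = 0.5)
split(names(A), groups)
```

Note that `outer` makes one `identical` call per pair of columns, so for 5000 columns this is 25 million comparisons and a 5000 x 5000 logical matrix.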
Edit 1: to disregard differences in floating point values, one can use
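One base R option for this, sketched here as an assumption rather than as the answer's original snippet, is `all.equal` wrapped in `isTRUE`. The data and the helper name `near_same` are made up for illustration.

```r
# Hypothetical data: y differs from x only by floating point rounding error,
# z holds similar values stored as character.
A <- data.frame(x = c(0.1, 0.2, 0.3))
A$y <- c(0.1, 0.2, 0.1 + 0.2)
A$z <- c("0.1", "0.2", "0.3")

# all.equal() returns TRUE or a character description of the differences,
# so isTRUE() is needed to turn the result into a strict logical.
near_same <- function(i, j) isTRUE(all.equal(A[[i]], A[[j]]))
M <- outer(names(A), names(A), Vectorize(near_same))
dimnames(M) <- list(names(A), names(A))
M
```

With this comparison, `x` and `y` count as equal even though `identical(A$x, A$y)` is `FALSE`, while the character column `z` is still kept apart from both.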
Edit 2: a more efficient method than clustering for grouping the names of identical columns is
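The original snippet for this edit is not reproduced here. As one possible sketch, a column can be compared only against the first column of each group found so far, which avoids both the full pairwise matrix and the clustering step. The function name `group_identical_columns` and the example data are hypothetical.

```r
# Group column names so that every group contains identical columns.
group_identical_columns <- function(df) {
  reps <- integer(0)          # index of the first column of each group
  group <- integer(ncol(df))  # group number assigned to each column
  for (j in seq_along(df)) {
    hit <- 0L
    for (k in seq_along(reps)) {
      if (identical(df[[j]], df[[reps[k]]])) {
        hit <- k
        break
      }
    }
    if (hit == 0L) {
      # No existing group matches: column j starts a new group.
      reps <- c(reps, j)
      hit <- length(reps)
    }
    group[j] <- hit
  }
  unname(split(names(df), group))
}

# Hypothetical example data (not from the original answer).
A <- data.frame(
  a = c(1, 2, 3), b = c(4, 5, 6), c = c("4", "5", "6"),
  d = c(1, 2, 3), e = c(4, 5, 6), f = c(4, 5, 6),
  stringsAsFactors = FALSE
)

all_groups <- group_identical_columns(A)
# Keep only the groups that actually contain duplicated columns.
dup_groups <- Filter(function(g) length(g) > 1, all_groups)
dup_groups
```

The number of `identical` calls is at most the number of columns times the number of distinct groups, so the saving is largest when many columns are duplicates of each other.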