Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 6639205
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 25, 20262026-05-25T23:31:36+00:00 2026-05-25T23:31:36+00:00

I am using PCA to find out which variables in my dataset are redundand

  • 0

I am using PCA to find out which variables in my dataset are redundand due to being highly correlated with other variables. I am using princomp matlab function on the data previously normalized using zscore:

[coeff, PC, eigenvalues] = princomp(zscore(x))

I know that eigenvalues tell me how much variation of the dataset covers every principal component, and that coeff tells me how much of i-th original variable is in the j-th principal component (where i – rows, j – columns).

So I assumed that to find out which variables out of the original dataset are the most important and which are the least I should multiply the coeff matrix by eigenvalues – coeff values represent how much of every variable each component has and eigenvalues tell how important this component is.
So this is my full code:

[coeff, PC, eigenvalues] = princomp(zscore(x));
e = eigenvalues./sum(eigenvalues);
abs(coeff)/e

But this does not really show anything – I tried it on a following set, where variable 1 is fully correlated with variable 2 (v2 = v1 + 2):

     v1    v2    v3
     1     3     4
     2     4    -1
     4     6     9
     3     5    -2

but the results of my calculations were following:

v1 0.5525
v2 0.5525
v3 0.5264

and this does not really show anything. I would expect the result for variable 2 show that it is far less important than v1 or v3.
Which of my assuptions is wrong?

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-25T23:31:37+00:00Added an answer on May 25, 2026 at 11:31 pm

    EDIT I have completely reworked the answer now that I understand which assumptions were wrong.

    Before explaining what doesn’t work in the OP, let me make sure we’ll have the same terminology. In principal component analysis, the goal is to obtain a coordinate transformation that separates the observations well, and that may make it easy to describe the data , i.e. the different multi-dimensional observations, in a lower-dimensional space. Observations are multidimensional when they’re made up from multiple measurements. If there are fewer linearly independent observations than there are measurements, we expect at least one of the eigenvalues to be zero, because e.g. two linearly independent observation vectors in a 3D space can be described by a 2D plane.

    If we have an array

    x = [    1     3     4
             2     4    -1
             4     6     9
             3     5    -2];
    

    that consists of four observations with three measurements each, princomp(x) will find the lower-dimensional space spanned by the four observations. Since there are two co-dependent measurements, one of the eigenvalues will be near zero, since the space of measurements is only 2D and not 3D, which is probably the result you wanted to find. Indeed, if you inspect the eigenvectors (coeff), you find that the first two components are extremely obviously collinear

    coeff = princomp(x)
    coeff =
          0.10124      0.69982      0.70711
          0.10124      0.69982     -0.70711
           0.9897     -0.14317   1.1102e-16
    

    Since the first two components are, in fact, pointing in opposite directions, the values of the first two components of the transformed observations are, on their own, meaningless: [1 1 25] is equivalent to [1000 1000 25].

    Now, if we want to find out whether any measurements are linearly dependent, and if we really want to use principal components for this, because in real life, measurements my not be perfectly collinear and we are interested in finding good vectors of descriptors for a machine-learning application, it makes a lot more sense to consider the three measurements as “observations”, and run princomp(x'). Since there are thus three “observations” only, but four “measurements”, the fourth eigenvector will be zero. However, since there are two linearly dependent observations, we’re left with only two non-zero eigenvalues:

    eigenvalues =
           24.263
           3.7368
                0
                0
    

    To find out which of the measurements are so highly correlated (not actually necessary if you use the eigenvector-transformed measurements as input for e.g. machine learning), the best way would be to look at the correlation between the measurements:

    corr(x)
      ans =
            1            1      0.35675
            1            1      0.35675
      0.35675      0.35675            1
    

    Unsurprisingly, each measurement is perfectly correlated with itself, and v1 is perfectly correlated with v2.

    EDIT2

    but the eigenvalues tell us which vectors in the new space are most important (cover the most of variation) and also coefficients tell us how much of each variable is in each component. so I assume we can use this data to find out which of the original variables hold the most of variance and thus are most important (and get rid of those that represent small amount)

    This works if your observations show very little variance in one measurement variable (e.g. where x = [1 2 3;1 4 22;1 25 -25;1 11 100];, and thus the first variable contributes nothing to the variance). However, with collinear measurements, both vectors hold equivalent information, and contribute equally to the variance. Thus, the eigenvectors (coefficients) are likely to be similar to one another.


    In order for @agnieszka’s comments to keep making sense, I have left the original points 1-4 of my answer below. Note that #3 was in response to the division of the eigenvectors by the eigenvalues, which to me didn’t make a lot of sense.

    1. the vectors should be in rows, not columns (each vector is an
      observation).
    2. coeff returns the basis vectors of the principal
      components, and its order has little to do with the original input
    3. To see the importance of the principal components, you use eigenvalues/sum(eigenvalues)
    4. If you have two collinear vectors, you can’t say that the first is important and the second isn’t. How do you know that it shouldn’t be the other way around? If you want to test for colinearity, you should check the rank of the array instead, or call unique on normalized (i.e. norm equal to 1) vectors.
    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

I'm trying to apply PCA on my data using princomp(x) , that has been
I'm implementing PCA using eigenvalue decomposition for sparse data. I know matlab has PCA
Possible Duplicate: Comparing svd and princomp in R How to perform PCA using 2
I am using princomp in R to perform PCA. My data matrix is huge
I am trying to do PCA analysis using princomp function in R. The following
Using android 2.3.3, I have a background Service which has a socket connection. There's
I need to show 1st 10 eigenfaces using PCA for a image feature vector
I intend to to some principal component analysis and I am using this PCA
Using MVC2 I have an AJAX form which is posting to a bound model.
using grep, vim's grep, or another unix shell command, I'd like to find the

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.