I am running PCA on some large spreadsheets, and I’m picking my PCs according to the loadings.
As far as I have read, since my variables have different units, standardization is a must before performing the PCA.
Does the function prcomp() inherently perform standardization?
I was reading the prcomp() help file and saw this under the arguments of prcomp():
scale. a logical value indicating whether the variables should be scaled to have
unit variance before the analysis takes place. The default is FALSE for
consistency with S, but in general scaling is advisable. Alternatively, a
vector of length equal the number of columns of x can be supplied. The
value is passed to scale.
Does “scaling variables to have unit variance” mean standardization?
I am currently using this command:
prcomp(formula = ~., data=file, center = TRUE, scale = TRUE, na.action = na.omit)
Is that enough, or should I do a separate standardization step first?
Thanks,
If by standardization you mean that the mean of each column is subtracted and each column is then divided by its standard deviation, then using
scale = TRUE and center = TRUE is what you want. No separate standardization step is needed.
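
One way to convince yourself is to compare the two routes directly. The following is only a minimal sketch: it uses the built-in USArrests data as a stand-in for your spreadsheet (an assumption, since your data is not shown), and the object names are made up for illustration.

```r
# Built-in example data whose columns are on different scales
# (stand-in for the poster's spreadsheet, which is not shown)
x <- USArrests

# Route 1: let prcomp() center and scale the columns itself
pca_builtin <- prcomp(x, center = TRUE, scale. = TRUE)

# Route 2: standardize manually with scale(), then run PCA without
# any further centering or scaling
x_std <- scale(x, center = TRUE, scale = TRUE)
pca_manual <- prcomp(x_std, center = FALSE, scale. = FALSE)

# Compare the component standard deviations and the loadings
# (abs() guards against a possible sign flip of a component)
sdev_match <- isTRUE(all.equal(pca_builtin$sdev, pca_manual$sdev))
loadings_match <- isTRUE(all.equal(abs(unclass(pca_builtin$rotation)),
                                   abs(unclass(pca_manual$rotation))))

print(sdev_match)
print(loadings_match)
```

If both checks come back TRUE, the built-in options are doing the same job as a manual scale() step. Note that the help file spells the argument scale. (with a trailing dot), which is the name used in the sketch.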