I have a dataframe with duplicate column names in R. When I select specific columns from this dataframe using subset, it renames the duplicates to make them distinct. When I create a dataframe using the function data.frame(), I can stop this from happening with the argument check.names = FALSE. Is there a way to do the same with subset (or any other method that selects named columns)?
For example, say I have the dataframe
data <- data.frame('sample' = 50, 'x_mean' = 1.5, 'Lower CI' = 1.0, 'Upper CI' = 2.0, 'sample' = 50, 'y_mean' = 0.6, 'Lower CI' = 0.3, 'Upper CI' = 0.9, check.names = FALSE)
selectVec <- c(TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE)
Using the code
subset(data, select = selectVec)
renames the duplicate confidence intervals to ‘Lower CI.1’ and ‘Upper CI.1’, whereas I want to keep these as ‘Lower CI’ and ‘Upper CI’. Does anyone know a way of doing this?
Thanks in advance.
It looks like you will get the same behavior with [. The only way I can think of is to reassign the names afterwards.
However, be aware that having duplicated column names is a very unnatural, complicated (obviously) and risky format for keeping your data. I would try to understand why the file or data.frame had duplicated columns in the first place and fix it there.
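Reassigning the names after selection could look something like the sketch below. It reuses data and selectVec from the question; the variable name result is only an illustrative choice. It assumes the selector is a logical (or numeric) vector, so that the same selector can index names(data). With a character selector the duplicated names would be ambiguous and this would not work as written.

```r
# Data frame from the question, with deliberately duplicated column names
data <- data.frame('sample' = 50, 'x_mean' = 1.5, 'Lower CI' = 1.0, 'Upper CI' = 2.0,
                   'sample' = 50, 'y_mean' = 0.6, 'Lower CI' = 0.3, 'Upper CI' = 0.9,
                   check.names = FALSE)
selectVec <- c(TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE)

# subset() makes the duplicated names unique ('Lower CI.1', 'Upper CI.1')
result <- subset(data, select = selectVec)  # 'result' is a hypothetical name

# Restore the original names by indexing names(data) with the same selector
names(result) <- names(data)[selectVec]

names(result)
```

Because the names are taken from the original dataframe using the same selector, they stay aligned with the columns that were actually kept.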