Possible Duplicate: Read csv with two headers into a data.frame I am new to

Question

Asked: June 10, 2026 (2026-06-10T07:20:04+00:00)

Possible Duplicate:
Read csv with two headers into a data.frame

I am new to R and working to use R to analyse some data. The data happens to be in Excel format and right now I’m struggling to find a way to convert it into a format that is R-friendly.

The issue is that the column headers have merged cells, so in effect the headers have two rows. I’d like to convert it into a normal set of 1-D vectors, add an extra column and a row. Let me explain with an example:

Currently the Excel layout looks like this:

  |  H  |  J  | 
Y |M |F |M |F | 
== == == == == 
Y1|V1|V2|V3|V4|

H and J are merged column headers, and each of them spans the columns M and F.

The = signs indicate that the rows above them are header rows.

Given that H and J are both values of some variable, say R, I would like to convert this into a columnar format with a single header row and two data rows, like this:

Y |R |M |F |
== == == ==
Y1|H |V1|V2|
Y1|J |V3|V4|

Does anyone have an idea how to do this?


1 Answer

Editorial Team · Answer 1 · 2026-06-10T07:20:05+00:00

First, some assumptions:

The merged headings are on the first line of the CSV
The merged headings start in the second column of the CSV
The variable names in the second line of the CSV repeat (except for the variable in the first column)

Second, your data.

temp = c(",\"H\",,\"J\",", 
         "\"Y\",\"M\",\"F\",\"M\",\"F\"", 
         "\"Y1\",\"V1\",\"V2\",\"V3\",\"V4\"")

Third, a slightly modified version of this answer.

# check.names is set to FALSE to allow variable names to be repeated
ONE = read.csv(textConnection(temp), skip=1, check.names=FALSE,
               stringsAsFactors=FALSE)
GROUPS = read.csv(textConnection(temp), header=FALSE, 
                  nrows=1, stringsAsFactors=FALSE)
GROUPS = GROUPS[!is.na(GROUPS)]

# This can be shortened, but I've written it this way to show how
#   it can be generalized. For instance, if 3 columns were repeated
#   instead of 2, the rep statement could be changed to reflect that
names(ONE)[-1] = paste0(names(ONE)[-1], ".",
                        rep(GROUPS, each=(length(names(ONE)[-1])/2)))
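
As a hedged sketch of the generalization mentioned in the comment above, the number of columns per group can be computed from the data instead of being hard-coded as a division by 2. The third group "K", the values V5 and V6, and the names temp3, ONE3, GROUPS3 and n_per_group are all hypothetical, added only for illustration, and the sketch assumes every group spans the same number of columns.

```r
# Hypothetical sample: the same layout as above, with a third merged header "K"
temp3 = c(",\"H\",,\"J\",,\"K\",",
          "\"Y\",\"M\",\"F\",\"M\",\"F\",\"M\",\"F\"",
          "\"Y1\",\"V1\",\"V2\",\"V3\",\"V4\",\"V5\",\"V6\"")

# Second header row becomes the column names; duplicates are kept as-is
ONE3 = read.csv(text=temp3, skip=1, check.names=FALSE,
                stringsAsFactors=FALSE)

# First header row holds the merged group labels; empty cells are read as NA
GROUPS3 = read.csv(text=temp3, header=FALSE,
                   nrows=1, stringsAsFactors=FALSE)
GROUPS3 = GROUPS3[!is.na(GROUPS3)]

# Assumes every group spans the same number of columns
n_per_group = (ncol(ONE3) - 1) / length(GROUPS3)

names(ONE3)[-1] = paste0(names(ONE3)[-1], ".",
                         rep(GROUPS3, each=n_per_group))
names(ONE3)
```

If the groups had different widths, this division would no longer give a whole number, and the labels would have to be matched to the columns some other way.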

Fourth, the actual reshaping of the data.

TWO = reshape(ONE, direction="long", ids=1, varying=2:ncol(ONE))
# And, here's the output.
TWO
#      Y time  M  F id
# 1.H Y1    H V1 V2  1
# 1.J Y1    J V3 V4  1
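
The output above still has a column called time, an extra id column, and compound row names, whereas the question asked for the columns Y, R, M and F. A minimal sketch of one way to tidy that up follows. It repeats the steps above so that it runs on its own; the name TIDY is hypothetical, and R is simply the column name used in the question.

```r
# Same sample data and steps as in the answer above
temp = c(",\"H\",,\"J\",",
         "\"Y\",\"M\",\"F\",\"M\",\"F\"",
         "\"Y1\",\"V1\",\"V2\",\"V3\",\"V4\"")
ONE = read.csv(text=temp, skip=1, check.names=FALSE,
               stringsAsFactors=FALSE)
GROUPS = read.csv(text=temp, header=FALSE,
                  nrows=1, stringsAsFactors=FALSE)
GROUPS = GROUPS[!is.na(GROUPS)]
names(ONE)[-1] = paste0(names(ONE)[-1], ".",
                        rep(GROUPS, each=(ncol(ONE) - 1) / length(GROUPS)))
TWO = reshape(ONE, direction="long", ids=1, varying=2:ncol(ONE))

# Rename the group column to R, as requested in the question
names(TWO)[names(TWO) == "time"] = "R"

# Keep only the requested columns, in the requested order
TIDY = TWO[, c("Y", "R", "M", "F")]

# Replace the compound row names (1.H, 1.J) with plain row numbers
rownames(TIDY) = NULL
TIDY
```

The same renaming could probably be done inside the reshape call through its timevar argument; the post-hoc rename is used here only because it leaves the original call untouched.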
