I have a simple problem, and I have honestly tried to find an answer.
I have a bunch of .csv files that have been imported into R data frames.
I would like to take a specific column (with a common name) from each data frame, merge it into a single data frame with the name of the data frame as the column name, and produce a boxplot using each column.
The columns are not all the same length and frequently contain NAs.
Example: Data frames (in which the first row is the header)
Data frame name Tom:
col1 col2 col3 col4
name1 33 44 55
name2 33 NA 55
name3 33 34 55
name4 33 24 55
Data frame name Bob:
col1 col2 col3 col4
name5 33 74 55
name6 33 NA 55
name7 33 32 55
Data frame name Stu:
col1 col2 col3 col4
name8 33 44 55
name9 33 11 55
name10 33 34 55
name11 33 24 55
name12 33 32 55
name13 33 24 5
name14 33 34 55
name15 33 24 5
Desired result:
Tom  Bob  Stu
44   74   44
NA   NA   11
34   32   34
24        24
          32
          24
          34
          24
So, I want to take "col3" (the column name is shared) from each data frame and produce a new data frame containing only the col3 data, with each column named after the data frame it came from. The next step is a side-by-side boxplot of Tom, Bob and Stu (but I can probably work that out). It's OK to have NAs in the empty spaces in the desired result above.
Here is a basic approach where I create a new combined data frame using rbind, after adding an identifier column to each of your 3 data frames. Note that you can also create a boxplot without first creating a single data frame.
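A minimal sketch of that idea in base R, assuming the three data frames look like the ones in the question. The column name `id` and the object names `combined`, `cols` and `wide` are my own choices for illustration, not anything from the original post. The last part pads the shorter columns with NA to get the wide layout the question asks for, in case that is still wanted.

```r
# Recreate the example data frames from the question
Tom <- data.frame(col1 = paste0("name", 1:4), col2 = 33,
                  col3 = c(44, NA, 34, 24), col4 = 55)
Bob <- data.frame(col1 = paste0("name", 5:7), col2 = 33,
                  col3 = c(74, NA, 32), col4 = 55)
Stu <- data.frame(col1 = paste0("name", 8:15), col2 = 33,
                  col3 = c(44, 11, 34, 24, 32, 24, 34, 24),
                  col4 = c(55, 55, 55, 55, 55, 5, 55, 5))

# Add an identifier column (hypothetical name "id") to each data frame
Tom$id <- "Tom"
Bob$id <- "Bob"
Stu$id <- "Stu"

# Stack the rows into one long data frame
combined <- rbind(Tom, Bob, Stu)

# Fix the group order so the boxes appear as Tom, Bob, Stu
combined$id <- factor(combined$id, levels = c("Tom", "Bob", "Stu"))

# Side-by-side boxplot of col3, one box per source data frame;
# rows with NA in col3 are dropped by the formula interface
boxplot(col3 ~ id, data = combined)

# Alternative: boxplot straight from a named list, no combined data frame
cols <- list(Tom = Tom$col3, Bob = Bob$col3, Stu = Stu$col3)
boxplot(cols)

# Optional: the wide layout from the question, padding short columns with NA
n <- max(lengths(cols))
wide <- as.data.frame(lapply(cols, function(x) c(x, rep(NA, n - length(x)))))
```

The long format (one row per observation plus an `id` column) tends to be easier to work with than the wide one, because columns of unequal length need no padding and the formula interface of `boxplot` handles the grouping.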