I have a data file in the format shown below.
I loaded it into R and tried to plot a histogram of the values in the dist column, but I got the error "x must be numeric". I therefore tried to change the column's format.
> head(data)
V1 V2
1 type gene_dist
2 A 64667
3 A 76486
4 A 97416
5 A 30876
6 A 88018
> summary(data)
    V1            V2
 A   : 67   100    :  1
 B   :122   100906 :  1
 type:  1   102349 :  1
            1033   :  1
            10544  :  1
            10745  :  1
            (Other):184
I tried to convert the column to numeric using sapply, but the values changed:
> data[,2]<-sapply(data[,2],as.numeric)
> head(data)
V1 V2
1 type 190
2 A 146
3 A 166
4 A 189
> summary(data)
    V1            V2
 A   : 67   Min.   :  1.00
 B   :122   1st Qu.: 48.25
 type:  1   Median : 95.50
            Mean   : 95.50
            3rd Qu.:142.75
            Max.   :190.00
Does anyone know why this is happening?
It looks like your second column is a factor. You need to use as.character before as.numeric. This is because factors are stored internally as integers with a table to give the factor level labels. Just using as.numeric will only give the internal integer codes. There is no need to use sapply since these functions are vectorized.
It is likely that the column is a factor because there are some non-numeric characters in some of the entries. In the output you posted, the first row holds the text "type" and "gene_dist", which suggests the header line was read in as data. Any such entries will be converted to NA with the appropriate warning, but you may want to investigate this in your raw data.
As a side note, data is a poor (though not invalid) choice for a variable name since R already has a function of the same name (data(), in the utils package).
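To illustrate the difference, here is a minimal sketch. The vector v2 is a hypothetical stand-in for your V2 column, built from the first few values you posted plus the stray header text; it is not your actual data.

```r
# Hypothetical stand-in for the V2 column: the header text "gene_dist" was
# read as a value, so the whole column ends up as a factor.
v2 <- factor(c("gene_dist", "64667", "76486", "97416", "30876", "88018"))

# as.numeric() on a factor returns the internal integer codes
# (positions in the sorted level table), not the original numbers.
codes <- as.numeric(v2)

# Converting to character first recovers the labels, which then parse as
# numbers. "gene_dist" cannot be parsed and becomes NA (the coercion
# warning is suppressed here only to keep the output clean).
values <- suppressWarnings(as.numeric(as.character(v2)))

# List the entries that failed to convert, so they can be checked
# against the raw file.
bad <- as.character(v2)[is.na(values)]

print(codes)
print(values)
print(bad)
```

Once the column holds real numbers, something like hist(values) should no longer complain that x must be numeric. If the non-numeric entry turns out to be the file's header line, re-reading the file with header = TRUE in read.table (or whichever reader you used) would probably avoid the conversion altogether.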