I have a datafile:
https://dl.dropbox.com/u/22681355/example.csv
Read file:
example<-read.csv("example.csv")
example<-example[,-1]
example[,1] contains a list of numbers increasing in numerical order.
example[,2] contains another set of numbers
First I would like to identify the numbers in example[,2] that are not listed in example[,1]:
diff<-setdiff(example[,2],example[,1])
Now that I know these values I would like to insert them into example[,1] leaving existing values in example[,1] and example[,2] intact.
A short example would be:
Example[,1] Example[,2]
1 1000
1 50
1 3
1 90
1 25
3 4
5 2
5 7
etc etc
After I run setdiff() I get the numbers not in the first column but in the second.
Now I would like to place them in example[,1] to produce the following output:
Example[,1] Example[,2]
1 1000
1 50
1 3
1 90
1 25
2 NA
3 4
4 NA
5 2
5 7
etc etc
So basically placing them in numerical order but leaving everything else intact.
Part 1 excellently solved by Joris Meys!
I have two further questions:
/////////////////////////////////////////////
////////////////////////////////////////////
1:
Can the same be done if there is an additional third column but I don’t want to do anything with it?
e.g.:
ORIGINAL:
Example[,1] Example[,2] Example[,3]
1 1000 37
1 50 18
1 3 54
1 90 72
1 25 23
3 4 15
5 2 20
5 7 19
etc etc
Desired OUTPUT:
Example[,1] Example[,2] Example[,3]
1 1000 37
1 50 18
1 3 54
1 90 72
1 25 23
2 NA NA
3 4 15
4 NA NA
5 2 20
5 7 19
etc etc
2:
Instead of putting NA in example[,2] for the inserted rows, I would like to fill in a real value. For example, if example[,1] does not contain the number 30, I would like to find the row where example[,2] is 30, look up the value of example[,1] in that row, and put that value in example[,2] of the inserted row in place of the NA.
for example:
Example[,1] Example[,2] Example[,3]
1 1000 37
1 50 18
1 3 54
1 90 72
1 25 23
2 NA NA
3 4 15
4 NA NA
5 2 20
5 7 19
etc etc
Instead of the NAs, I would like to have:
Example[,1] Example[,2] Example[,3]
1 1000 37
1 50 18
1 3 54
1 90 72
1 25 23
2 5 20
3 4 15
4 3 15
5 2 20
5 7 19
etc etc
The following approach also works if your matrix has more than two columns. It’s an extension of Joris Meys’ solution.
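The code for this step is not reproduced here. As a minimal sketch of the idea (not necessarily the code originally posted), assume the data is a data frame shaped like the three-column example in the question. The column names V1, V2, V3 and the variable names below are made up for illustration.

```r
# Small data frame mirroring the question's three-column example
# (column names V1..V3 are hypothetical)
example <- data.frame(
  V1 = c(1, 1, 1, 1, 1, 3, 5, 5),
  V2 = c(1000, 50, 3, 90, 25, 4, 2, 7),
  V3 = c(37, 18, 54, 72, 23, 15, 20, 19)
)

# Values that occur in column 2 but never in column 1
missing_vals <- setdiff(example[, 2], example[, 1])

# One filler row per missing value: the value goes into column 1,
# every other column is NA, however many columns there are
filler <- as.data.frame(
  matrix(NA, nrow = length(missing_vals), ncol = ncol(example))
)
names(filler) <- names(example)
filler[, 1] <- missing_vals

# Append the filler rows, then sort on column 1.
# order() leaves ties in their original order, so existing rows
# keep their relative positions.
combined <- rbind(example, filler)
result <- combined[order(combined[, 1]), ]
rownames(result) <- NULL

print(result)
```

On this tiny sample, 7, 25, 50, 90 and 1000 are also absent from column 1, so they get filler rows as well and the output has 15 rows rather than the 10 shown in the question.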
The result:
Once you have created this matrix, it’s easy to generate a new one without NAs:
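Again as a hedged sketch rather than the original code: the block below starts from the first ten rows of the NA-padded output shown in the question (rebuilt by hand, with hypothetical names with_na and filled) and fills each inserted row from the row where column 2 holds the inserted value. It assumes that, when a value appears more than once in column 2, the first occurrence is the one to use, and that column 3 should be copied from that same row, as in the question's example.

```r
# First ten rows of the NA-padded output from the question
# (column names V1..V3 are hypothetical)
with_na <- data.frame(
  V1 = c(1, 1, 1, 1, 1, 2, 3, 4, 5, 5),
  V2 = c(1000, 50, 3, 90, 25, NA, 4, NA, 2, 7),
  V3 = c(37, 18, 54, 72, 23, NA, 15, NA, 20, 19)
)

filled <- with_na

# Rows that were inserted, i.e. where column 2 is NA
na_rows <- which(is.na(with_na[, 2]))

# For each inserted value, the first row whose column 2 equals it
src <- match(with_na[na_rows, 1], with_na[, 2])

# Column 2 takes column 1 of the matching row;
# column 3 is copied from the matching row
filled[na_rows, 2] <- with_na[src, 1]
filled[na_rows, 3] <- with_na[src, 3]

print(filled)
```

Here the row for 2 becomes 2 5 20 and the row for 4 becomes 4 3 15, matching the desired output in the question.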
The result: