I have a list called “training_data”. It contains data read from several files using the following call:
training_data <- lapply(files, read.table, header=TRUE, sep=",")
I can access the first field of any dataset using the following command:
training_data[[1]][1] # The first field contains the class "pos OR neg"
I have to use these datasets (contained within training_data) for binary classification using Support Vector Machines (e1071). The problem is that certain datasets contain data for only one class, i.e. either all pos or all neg, which the svm function does not accept, so I want to exclude those datasets. I have tried the following code, but I am unable to access the class column.
training_data <- lapply(training_data,
  function(data)
  {
    if (["the class field is always positive"])
      ### exclude this dataset from training_data
  })
Update:
How exactly can I access the first column of the data passed to the function? And how can I exclude from training_data those datasets that contain only one class in the class column?
Thanks
This is what the Filter function was made for. Since you didn’t provide replication code, here is a quick example of how to use Filter. Suppose you have a large list of vectors, each 2 elements in length. If you want to retain only those vectors whose first element is greater than 0.5, you pass Filter a predicate function that returns TRUE for the elements to keep, followed by the list.
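A minimal sketch of that example, assuming the list is built with runif. The name mylist is taken from the text; the seed, the list size of 100, and the name kept are arbitrary choices made for illustration:

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# A list of 100 numeric vectors, each of length 2 (assumed construction)
mylist <- lapply(1:100, function(i) runif(2))

# Filter(f, x) keeps the elements of x for which f returns TRUE
kept <- Filter(function(x) x[1] > 0.5, mylist)

length(kept)  # number of vectors whose first element exceeds 0.5
```

Filter returns a shorter list rather than a list with NULL placeholders, which is why it suits this task better than lapply.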
Now, if each element of mylist is a data.frame and the first column in each data.frame is the response vector for the model, as appears to be the case with your data, you can use the data[,1] notation mentioned by Justin to filter out all data.frames that have only positive or only negative values in the first column.
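As a sketch of that filter, assuming the class labels are stored in the first column as "pos" and "neg". The three small data frames below are made-up stand-ins for the files in the question, and has_both_classes is a hypothetical helper name:

```r
# Hypothetical stand-in for the list read from files in the question
training_data <- list(
  data.frame(class = c("pos", "neg", "pos"), x = c(1.2, 0.4, 2.2)),
  data.frame(class = c("pos", "pos", "pos"), x = c(0.3, 1.1, 0.9)),
  data.frame(class = c("neg", "neg", "neg"), x = c(2.0, 1.7, 0.1))
)

# TRUE when the first column holds more than one distinct class label
has_both_classes <- function(data) length(unique(data[, 1])) > 1

# Keep only the data frames that contain both classes
training_data <- Filter(has_both_classes, training_data)

length(training_data)  # 1: only the first data frame has both classes
```

Note that data[, 1] returns the column as a vector, whereas data[1] returns a one-column data.frame, so the comma form is the convenient one for a check like this.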