I have the following code (based on the samples here), but it is not working:
[...]
def my_analyzer(s):
return s.split()
my_vectorizer = CountVectorizer(analyzer=my_analyzer)
X_train = my_vectorizer.fit_transform(traindata)
ch2 = SelectKBest(chi2,k=1)
X_train = ch2.fit_transform(X_train,Y_train)
[...]
The following error is given when calling fit_transform:
AttributeError: 'function' object has no attribute 'analyze'
According to the documentation, CountVectorizer should be created like this: vectorizer = CountVectorizer(tokenizer=my_tokenizer). However, if I do that, I get the following error: "got an unexpected keyword argument 'tokenizer'".
My actual scikit-learn version is 0.10.
You’re looking at the documentation for 0.11 (to be released soon), where the vectorizer has been overhauled. Check the documentation for 0.10, where there is no
tokenizerargument and theanalyzershould be an object implementing ananalyzemethod:http://scikit-learn.org/dev is the documentation for the upcoming release (which may change at any time), while http://scikit-learn/stable has the documentation for the current stable version.