The Archive Base

Asked by Editorial Team on June 10, 2026

I was following a tutorial which was available at Part 1 & Part 2


I was following a tutorial which was available at Part 1 & Part 2. Unfortunately, the author didn't have time for the final section, which involved using cosine similarity to actually measure the distance between two documents. I followed the examples in the article with the help of the following link from Stack Overflow. The code mentioned in that link is included below (just to make life easier):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from nltk.corpus import stopwords  # needs nltk.download('stopwords') once
import numpy as np
import numpy.linalg as LA

train_set = ["The sky is blue.", "The sun is bright."]  # Documents
test_set = ["The sun in the sky is bright."]  # Query
stopWords = stopwords.words('english')

vectorizer = CountVectorizer(stop_words=stopWords)
transformer = TfidfTransformer()

# Learn the vocabulary from the documents, then map the query onto it
trainVectorizerArray = vectorizer.fit_transform(train_set).toarray()
testVectorizerArray = vectorizer.transform(test_set).toarray()
print('Fit Vectorizer to train set', trainVectorizerArray)
print('Transform Vectorizer to test set', testVectorizerArray)

# Learn idf weights from the training documents only
transformer.fit(trainVectorizerArray)
print()
print(transformer.transform(trainVectorizerArray).toarray())

# Reuse the training idf weights for the query instead of refitting on it
print()
tfidf = transformer.transform(testVectorizerArray)
print(tfidf.todense())

As a result of the above code, I get the following matrices:

Fit Vectorizer to train set [[1 0 1 0]
 [0 1 0 1]]
Transform Vectorizer to test set [[0 1 1 1]]

[[ 0.70710678  0.          0.70710678  0.        ]
 [ 0.          0.70710678  0.          0.70710678]]

[[ 0.          0.57735027  0.57735027  0.57735027]]

I am not sure how to use this output to calculate cosine similarity. I know how to implement cosine similarity for two vectors of the same length, but here I am not sure how to identify the two vectors.


1 Answer

  1. Answer by Editorial Team on June 10, 2026 at 6:37 am

    With the help of @excray's comment, I managed to figure out the answer. What we need to do is write a simple for loop that iterates over the two arrays representing the train data and the test data.

    First, implement a simple lambda function to hold the formula for the cosine calculation:

    cosine_function = lambda a, b : round(np.inner(a, b)/(LA.norm(a)*LA.norm(b)), 3)
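As a worked check of that formula (an illustration, not part of the original answer): for 1-D arrays, `np.inner` is the dot product and `LA.norm` is the Euclidean norm. Taking the count vectors from the question, d1 = (1, 0, 1, 0) and d2 = (0, 1, 0, 1) for the documents and q = (0, 1, 1, 1) for the query:

```latex
\[
\cos(\mathbf{a}, \mathbf{b}) = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}
\]
\[
\begin{aligned}
\cos(\mathbf{d}_1, \mathbf{q}) &= \frac{(1,0,1,0) \cdot (0,1,1,1)}{\sqrt{2}\,\sqrt{3}} = \frac{1}{\sqrt{6}} \approx 0.408 \\
\cos(\mathbf{d}_2, \mathbf{q}) &= \frac{(0,1,0,1) \cdot (0,1,1,1)}{\sqrt{2}\,\sqrt{3}} = \frac{2}{\sqrt{6}} \approx 0.816
\end{aligned}
\]
```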
    

    Then write a simple for loop to iterate over the two sets of vectors. The logic is: for each vector in trainVectorizerArray, find its cosine similarity with each vector in testVectorizerArray.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.feature_extraction.text import TfidfTransformer
    from nltk.corpus import stopwords  # needs nltk.download('stopwords') once
    import numpy as np
    import numpy.linalg as LA

    train_set = ["The sky is blue.", "The sun is bright."]  # Documents
    test_set = ["The sun in the sky is bright."]  # Query
    stopWords = stopwords.words('english')

    vectorizer = CountVectorizer(stop_words=stopWords)
    transformer = TfidfTransformer()

    trainVectorizerArray = vectorizer.fit_transform(train_set).toarray()
    testVectorizerArray = vectorizer.transform(test_set).toarray()
    print('Fit Vectorizer to train set', trainVectorizerArray)
    print('Transform Vectorizer to test set', testVectorizerArray)

    # Cosine similarity of two vectors, rounded to 3 decimal places
    cx = lambda a, b: round(np.inner(a, b) / (LA.norm(a) * LA.norm(b)), 3)

    # Compare every document (train) vector with every query (test) vector
    for vector in trainVectorizerArray:
        print(vector)
        for testV in testVectorizerArray:
            print(testV)
            cosine = cx(vector, testV)
            print(cosine)

    # Learn idf weights from the training documents only
    transformer.fit(trainVectorizerArray)
    print()
    print(transformer.transform(trainVectorizerArray).toarray())

    # Reuse the training idf weights for the query instead of refitting on it
    print()
    tfidf = transformer.transform(testVectorizerArray)
    print(tfidf.todense())
    

    Here is the output:

    Fit Vectorizer to train set [[1 0 1 0]
     [0 1 0 1]]
    Transform Vectorizer to test set [[0 1 1 1]]
    [1 0 1 0]
    [0 1 1 1]
    0.408
    [0 1 0 1]
    [0 1 1 1]
    0.816
    
    [[ 0.70710678  0.          0.70710678  0.        ]
     [ 0.          0.70710678  0.          0.70710678]]
    
    [[ 0.          0.57735027  0.57735027  0.57735027]]
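A vectorized alternative, offered as a sketch rather than as the answer's method: assuming scikit-learn's `TfidfVectorizer` (which combines `CountVectorizer` and `TfidfTransformer`) and `sklearn.metrics.pairwise.cosine_similarity`, the similarity of the query to every document can be computed in one call, on the tf-idf vectors instead of the raw counts. scikit-learn's built-in `stop_words="english"` list is swapped in for NLTK's list here so the block needs no corpus download; that substitution is an assumption, not what the answer uses.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

train_set = ["The sky is blue.", "The sun is bright."]  # Documents
test_set = ["The sun in the sky is bright."]  # Query

# Learn the vocabulary and idf weights from the documents only, then
# reuse them for the query so both live in the same vector space.
vectorizer = TfidfVectorizer(stop_words="english")
train_tfidf = vectorizer.fit_transform(train_set)
test_tfidf = vectorizer.transform(test_set)

# One row per query, one column per document.
similarities = cosine_similarity(test_tfidf, train_tfidf)
print(np.round(similarities, 3))  # [[0.408 0.816]]

# TfidfVectorizer L2-normalizes each row by default (norm="l2"),
# so a plain dot product already equals cosine similarity.
dot_products = (test_tfidf @ train_tfidf.T).toarray()
print(np.allclose(similarities, dot_products))  # True
```

On this toy data the tf-idf scores match the raw-count scores from the loop because every term has the same idf; on a larger corpus the idf weighting can change the ranking.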
    


© 2021 The Archive Base. All Rights Reserved