The Archive Base Latest Questions

Editorial Team
Asked: June 18, 2026


I have been working on a priority email inbox written in Python, with the ultimate aim of using a machine learning algorithm to label (or classify) a selection of emails as either important or unimportant. I will begin with some background information and then move on to my question.

I have so far developed code that extracts data from each email and processes it to identify the most important ones. It uses the following email features:

  • Sender's address frequency
  • Thread activity
  • Date received (time between replies)
  • Common words in body/subject

The code I currently have assigns each email a rank (or weighting) between 0.1 and 1 based on its importance, and then labels it as either ‘important’ or ‘unimportant’ (here this is just 1 or 0). An email is given priority status if its rank is > 0.5. This data is stored in a CSV file (as below).

     From           Subject       Body        Date          Rank    Priority 
     test@test.com  HelloWorld    Body Words  10/10/2012    0.67    1
     rest@test.com  ByeWorld      Body Words  10/10/2012    0.21    0
     best@test.com  SayWorld      Body Words  10/10/2012    0.91    1
     just@test.com  HeyWorld      Body Words  10/10/2012    0.48    0
     etc        …………………………………………………………………………
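For illustration, the labelling rule described above can be sketched in a few lines. This is only a sketch: the function name `label_priority` and the in-memory rows are assumptions made for the example, and the ranks are the ones from the sample table.

```python
# Minimal sketch of the labelling rule: priority is 1 when rank > 0.5, else 0.
# label_priority and the rows below are illustrative, not the project's real code.

def label_priority(rank, threshold=0.5):
    """Return 1 (important) if rank exceeds the threshold, otherwise 0."""
    return 1 if rank > threshold else 0

# (sender, rank) pairs copied from the sample table
rows = [
    ("test@test.com", 0.67),
    ("rest@test.com", 0.21),
    ("best@test.com", 0.91),
    ("just@test.com", 0.48),
]

priorities = [label_priority(rank) for _, rank in rows]
print(priorities)
```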

I have two sets of email data (one for training, one for testing). The above applies to my training email data. I am now attempting to train a learning algorithm so that I can predict the importance of the testing data.

To do this I have been looking at both scikit-learn and NLTK. However, I am having trouble transferring what I have learnt from the tutorials into my project. I have no particular requirements regarding which learning algorithm is used. Is this as simple as applying the following? And if so, how?

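   from sklearn.svm import LinearSVC

   # email.data / email.target stand for the training features and labels
   X, y = email.data, email.target

   clf = LinearSVC()
   clf = clf.fit(X, y)

   # placeholder for the features of the testing emails
   X_new = testing_email_data

   clf.predict(X_new)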
1 Answer

  1. Editorial Team
    Added an answer on June 18, 2026 at 10:15 am

    The easiest (though probably not the fastest) solution(*) is to use scikit-learn’s DictVectorizer. First, read in each sample with Python’s csv module and build a dict of (feature, value) pairs for it, while keeping the priority separate:

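    # UNTESTED CODE, may contain a bug or two. The file name and split_words
    # below are placeholders; you need to decide how to implement split_words.
    import csv
    from sklearn.feature_extraction import DictVectorizer

    def split_words(text):
        # Placeholder tokenizer: lowercase and split on whitespace.
        return text.lower().split()

    dicts = []
    y = []

    with open("training.csv", newline="") as csvfile:  # hypothetical file name
        datareader = csv.reader(csvfile)
        next(datareader)  # skip the header row (From, Subject, ...)
        for row in datareader:
            y.append(int(row[-1]))  # Priority column: 1 or 0
            d = {"From": row[0]}    # string value, one-hot encoded by DictVectorizer
            for word in split_words(row[1]):
                d["Subject_" + word] = 1
            for word in split_words(row[2]):
                d["Body_" + word] = 1
            # etc.
            dicts.append(d)

    # vectorize!
    vectorizer = DictVectorizer()
    X_train = vectorizer.fit_transform(dicts)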
    

    You now have a sparse matrix X_train that, together with y, you can feed to a scikit-learn classifier.
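As a rough sketch of that last step, the snippet below fits a LinearSVC (the classifier mentioned in the question) on a vectorized set of dicts. The four dicts and their labels are made-up stand-ins for what the CSV loop would produce, not real data.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.svm import LinearSVC

# Toy stand-ins for the dicts and labels built from the CSV (made-up values).
dicts = [
    {"From": "test@test.com", "Subject_hello": 1, "Body_meeting": 1},
    {"From": "rest@test.com", "Subject_bye": 1, "Body_offer": 1},
    {"From": "best@test.com", "Subject_say": 1, "Body_meeting": 1},
    {"From": "just@test.com", "Subject_hey": 1, "Body_offer": 1},
]
y = [1, 0, 1, 0]

vectorizer = DictVectorizer()
X_train = vectorizer.fit_transform(dicts)  # sparse matrix, one row per email

# Any scikit-learn classifier that accepts sparse input could be used here.
clf = LinearSVC()
clf.fit(X_train, y)

train_predictions = clf.predict(X_train)
print(X_train.shape)
print(train_predictions)
```

The string-valued "From" feature becomes one indicator column per distinct sender, so the matrix has one column for every sender and every subject or body word seen in training.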

    Be aware:

    1. When you want to make predictions on unseen data, you must apply the same procedure and the exact same vectorizer object to it. That is, build a test_dicts list using the loop above, then do X_test = vectorizer.transform(test_dicts) (transform, not fit_transform).

    2. I’ve assumed you want to predict the priority directly. Predicting the “rank” instead would be a regression problem, rather than a classification one. Some scikit-learn classifiers have a predict_proba method, which produces the probability that an email is important, but you can’t train those on the ranks.
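Both points can be illustrated with a small sketch. It assumes made-up training and test dicts, and it uses LogisticRegression purely as one example of a classifier that offers predict_proba; the same fitted vectorizer transforms the test data, and the model is trained on the 0/1 priorities rather than the ranks.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# Made-up training data standing in for the dicts built from the training CSV.
train_dicts = [
    {"From": "test@test.com", "Subject_hello": 1, "Body_meeting": 1},
    {"From": "rest@test.com", "Subject_bye": 1, "Body_offer": 1},
    {"From": "best@test.com", "Subject_say": 1, "Body_meeting": 1},
    {"From": "just@test.com", "Subject_hey": 1, "Body_offer": 1},
]
y_train = [1, 0, 1, 0]  # priorities, not ranks

# Made-up test data; the second email has a sender and subject word never seen in training.
test_dicts = [
    {"From": "test@test.com", "Subject_hello": 1, "Body_meeting": 1},
    {"From": "new@test.com", "Subject_unseen": 1, "Body_offer": 1},
]

vectorizer = DictVectorizer()
X_train = vectorizer.fit_transform(train_dicts)
# Reuse the fitted vectorizer: transform only, so the columns match training.
# Features that were not seen during fitting are silently ignored.
X_test = vectorizer.transform(test_dicts)

clf = LogisticRegression()
clf.fit(X_train, y_train)

labels = clf.predict(X_test)
# Column 1 of predict_proba corresponds to class 1 ("important").
proba_important = clf.predict_proba(X_test)[:, 1]
print(labels)
print(proba_important)
```

Calling fit_transform on the test dicts instead would build a different feature-to-column mapping, and the classifier's learned weights would no longer line up with the columns.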

    (*) I am the author of scikit-learn’s DictVectorizer, so this is not unbiased advice. It is from the horse’s mouth, though 🙂
