Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 8357211
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: June 9, 20262026-06-09T10:24:34+00:00 2026-06-09T10:24:34+00:00

I would like to use a supervised machine learning algorithm to predict a binary

  • 0

I would like to use a supervised machine learning algorithm to predict a binary function (true or false) for a set of sentences based on the presence or absence of words in the sentences.

Ideally, I would like to avoid having to hardcode the set of words used to decide on the output so that the algorithm automatically learns which words are (together ?) most likely to trigger specific outputs.

http://shop.oreilly.com/product/9780596529321.do (Programming Collective Intelligence) has a nice section in chapter 4 titled “Learning From Clicks” which describes how to do this by using 1 layer of hiden nodes in a neural network with one new hidden node for each new combination of input words.

Similarly, it is possible to create a feature for each word in the training data set and train pretty much any classic machine learning algorithm using these features. Adding new training data will generate new features which will require me to re-train the algorithm from scratch.

Which brings me to my questions:

  • is it actually a problem if I have to retrain everything from scratch whenever the training data set is extended ?
  • what kind of algorithm would more experience machine learning users recommend to use for this kind of problem ?
  • what criteria should I use in picking an algorithm versus another ? (other than actually trying them all and see which perform better with precision/recall metrics)
  • if you have worked on similar problems, what about extending the features with 2-grams (1 if a specific 2-gram is present, 0 if not) ? 3-grams ?
  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-06-09T10:24:36+00:00Added an answer on June 9, 2026 at 10:24 am

    You could look into the general area of topic modelling if you want to find words which are generally found together.

    The most simple approach would be to use latent semantic analysis ( http://en.wikipedia.org/wiki/Latent_semantic_analysis ), which is just applying SVD to a term document matrix. You’d then need to do some additional post hoc analysis to fit this to your particular outcome.

    A more involved, and much more complex approach would be to use latent dirichlet allocation ( http://en.wikipedia.org/wiki/Latent_Dirichlet_allocation )

    In terms of just adding new features (words) that is fine as long as you are going to retrain. You can also use TF/IDF to give that particular word a value when representing the matrix (Instead of just a 1 or 0).

    I don’t know what programming language you are trying to do this in, but I know there are libraries out there in Java and Pythont hat do all of the above.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

I would like to use the logout function from Django but not sure how
I would like use .appendTo() on different elements determined by a function. For example,
I have an array of Integers in Java, I would like use only a
I would like to use R to extract the speaker out of scripts formatted
I would like to use Maven's password encryption such as it uses for nodes
I would like to use D3.js (or maybe Raphaël ) for backend-generated reports using
I would like to use the jenkins script console some more. Where do I
I would like to use a pre-commit hook in riak to validate the json
I would like to use GIO (which is part of GLIB) on Android by
I would like to use the MFMailComposeViewController mailComposeDelegate property with completion block syntax, but

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.