Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 3433084
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 18, 20262026-05-18T07:31:49+00:00 2026-05-18T07:31:49+00:00

Whats the best method to use the words itself as the features in any

  • 0

Whats the best method to use the words itself as the features in any machine learning algorithm ?

The problem I have to extract word related feature from a particular paragraph. Should I use the index in the dictionary as the numerical feature ? If so, how will I normalize these ?

In general, How are words itself used as features in NLP ?

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-18T07:31:50+00:00Added an answer on May 18, 2026 at 7:31 am

    There are several conventional techniques by which words are mapped to features (columns in a 2D data matrix in which the rows are the individual data vectors) for input to machine learning models.classification:

    • a Boolean field which encodes the presence or absence of that word in a given document;

    • a frequency histogram of a
      predetermined set of words, often the X most commonly occurring words from among all documents comprising the training data (more about this one in the
      last paragraph of this Answer);

    • the juxtaposition of two or more
      words (e.g., ‘alternative’ and
      ‘lifestyle’ in consecutive order have
      a meaning not related either
      component word); this juxtaposition can either be captured in the data model itself, eg, a boolean feature that represents the presence or absence of two particular words directly adjacent to one another in a document, or this relationship can be exploited in the ML technique, as a naive Bayesian classifier would do in this instanceemphasized text;

    • words as raw data to extract latent features, eg, LSA or Latent Semantic Analysis (also sometimes called LSI for Latent Semantic Indexing). LSA is a matrix decomposition-based technique which derives latent variables from the text not apparent from the words of the text itself.

    A common reference data set in machine learning is comprised of frequencies of 50 or so of the most common words, aka “stop words” (e.g., a, an, of, and, the, there, if) for published works of Shakespeare, London, Austen, and Milton. A basic multi-layer perceptron with a single hidden layer can separate this data set with 100% accuracy. This data set and variations on it are widely available in ML Data Repositories and academic papers presenting classification results are likewise common.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

What is the best method of doing the following: I have Page A with
I'm learning to use the mkreversegeocoder classes and have got it working using the
What are the best current libraries/methods to authenticate users without the use of a
I'm wondering what the best way is to use the XDocument.load and save methods
What's the best method for looping through a table, grabbing all the data in
What's the best method to draw a filled (!) semi circle in a UIView?
I'm trying to work out what the best method to passing a large number
If I've got a block of text, in English, what's the best method of
What is the best method of using native menus in C#? Edit: I want
What is the best method [pattern] to close a newly opened Windows form in

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.