The Archive Base


Editorial Team
Asked: June 15, 2026


Possible Duplicate:
Text Classification into Categories

I am currently working on a solution to get the type of food served in a database with 10k restaurants based on their description. I’m using lists of keywords to decide which kind of food is being served.

I read a little bit about machine learning but I have no practical experience with it at all. Can anyone explain to me if/why it would be a better solution to a simple problem like this? I find accuracy more important than performance!

simplified example:

["China", "Chinese", "Rice", "Noodles", "Soybeans"]
["Belgium", "Belgian", "Fries", "Waffles", "Waterzooi"]

a possible description could be:

“Hong’s Garden Restaurant offers savory, reasonably priced Chinese to our customers. If you find that you have a sudden craving for rice, noodles or soybeans at 8 o’clock on a Saturday evening, don’t worry! We’re open seven days a week and offer carryout service. You can get fries here as well!”

1 Answer

  1. Editorial Team
    Added an answer on June 15, 2026 at 5:32 pm

    You are indeed describing a classification problem, which can be solved with machine learning.

    In this problem, your features are the words in the description. You should use the Bag of Words model, which basically says that what matters to the classification process is which words occur and how many times each one occurs.
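
    As a rough illustration of the Bag of Words idea, here is a minimal sketch using only the Python standard library. The function name `bag_of_words` and the simple regex tokenizer are my own assumptions, not part of any particular library.

```python
import re
from collections import Counter


def bag_of_words(description):
    """Return a Counter mapping each lowercase word to its number of occurrences."""
    # Assumed tokenizer: lowercase the text, then keep runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", description.lower())
    return Counter(tokens)


features = bag_of_words("Rice, noodles and more rice. You can get fries here as well!")
# Word order is discarded; only the counts remain. Missing words count as 0.
print(features["rice"], features["fries"], features["waffles"])  # prints: 2 1 0
```

    Note that word order is thrown away on purpose: the classifier only ever sees the counts.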

    To solve your problem, follow these steps:

    1. Create a feature extractor that, given a description of a restaurant, returns the "features" of this restaurant under the Bag of Words model explained above. (In the literature, each restaurant is called an example.)
    2. Manually label a set of examples. Each one is labeled with the desired class (Chinese, Belgian, Junk food, …).
    3. Feed your labeled examples into a learning algorithm, which will generate a classifier. From personal experience, SVM usually gives the best results, but there are other choices such as Naive Bayes, Neural Networks and Decision Trees (usually C4.5 is used); each has its own advantages.
    4. When a new (unlabeled) example (restaurant) comes in, extract its features and feed them to your classifier. It will tell you which class it thinks the example belongs to (and usually also the probability that it is correct).
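
    The four steps can be sketched end to end as follows. To keep the example free of external libraries, it uses a small hand-written multinomial Naive Bayes classifier rather than an SVM. The class name, the tokenizer and the four training descriptions are all made up for illustration; a real training set would need far more labeled examples.

```python
import math
import re
from collections import Counter, defaultdict


def extract_features(description):
    """Step 1: bag-of-words features (word -> count) for one description."""
    return Counter(re.findall(r"[a-z']+", description.lower()))


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, descriptions, labels):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter(labels)      # label -> number of examples
        self.vocabulary = set()
        for description, label in zip(descriptions, labels):
            features = extract_features(description)
            self.word_counts[label].update(features)
            self.vocabulary.update(features)
        return self

    def predict_proba(self, description):
        """Return a dict mapping each label to its estimated probability."""
        features = extract_features(description)
        total_examples = sum(self.label_counts.values())
        log_scores = {}
        for label, count in self.label_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(count / total_examples)  # log prior
            for word, occurrences in features.items():
                if word not in self.vocabulary:
                    continue  # ignore words never seen during training
                smoothed = (self.word_counts[label][word] + 1) / (
                    total_words + len(self.vocabulary)
                )
                score += occurrences * math.log(smoothed)
            log_scores[label] = score
        # Normalize the log scores into probabilities (shifted by the max for stability).
        highest = max(log_scores.values())
        exp_scores = {label: math.exp(s - highest) for label, s in log_scores.items()}
        normalizer = sum(exp_scores.values())
        return {label: value / normalizer for label, value in exp_scores.items()}

    def predict(self, description):
        probabilities = self.predict_proba(description)
        return max(probabilities, key=probabilities.get)


# Step 2: a tiny hand-labeled training set (invented for this sketch).
train_descriptions = [
    "Chinese noodles and fried rice with soybeans",
    "Authentic Chinese rice dishes and noodles",
    "Belgian fries and waffles served all day",
    "Traditional Belgian waterzooi, fries and waffles",
]
train_labels = ["Chinese", "Chinese", "Belgian", "Belgian"]

# Step 3: feed the labeled examples to the learning algorithm.
classifier = NaiveBayesClassifier().fit(train_descriptions, train_labels)

# Step 4: classify a new, unlabeled description.
new_description = "Savory Chinese rice and noodles. You can get fries here as well!"
print(classifier.predict(new_description))        # prints: Chinese
print(classifier.predict_proba(new_description))  # label -> probability
```

    Unlike a plain keyword lookup, the single mention of "fries" only lowers the Chinese probability a little instead of flipping the decision.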

    Evaluation:

    Your algorithm can be evaluated with cross-validation, or by separating a test set out of your labeled examples that is used only for evaluating how accurate the algorithm is.
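
    Here is a minimal sketch of k-fold cross-validation in plain Python. The helper names, the word-overlap learner (a stand-in for whatever classifier you actually train) and the six labeled examples are assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def k_fold_indices(n_examples, k):
    """Split range(n_examples) into k interleaved folds of near-equal size."""
    if not 2 <= k <= n_examples:
        raise ValueError("k must be between 2 and the number of examples")
    return [list(range(fold, n_examples, k)) for fold in range(k)]


def cross_validate(train, examples, labels, k=5):
    """Return the accuracy averaged over k folds.

    `train(examples, labels)` must return a function mapping an example to a label.
    """
    accuracies = []
    for test_idx in k_fold_indices(len(examples), k):
        held_out = set(test_idx)
        # Train only on the examples outside the current fold.
        train_x = [x for i, x in enumerate(examples) if i not in held_out]
        train_y = [y for i, y in enumerate(labels) if i not in held_out]
        predict = train(train_x, train_y)
        # Measure accuracy only on the held-out fold.
        correct = sum(predict(examples[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)


def train_word_overlap(examples, labels):
    """Stand-in learner: predict the label whose training words overlap most."""
    words_by_label = defaultdict(Counter)
    for text, label in zip(examples, labels):
        words_by_label[label].update(text.lower().split())

    def predict(text):
        tokens = text.lower().split()
        return max(
            words_by_label,
            key=lambda label: sum(words_by_label[label][t] for t in tokens),
        )

    return predict


# Invented labeled examples, alternating between the two classes.
examples = [
    "chinese noodles and rice",
    "belgian fries and waffles",
    "rice with soybeans",
    "waffles and waterzooi",
    "chinese rice and soybeans",
    "belgian waterzooi with fries",
]
labels = ["Chinese", "Belgian", "Chinese", "Belgian", "Chinese", "Belgian"]

accuracy = cross_validate(train_word_overlap, examples, labels, k=3)
print(f"mean accuracy over 3 folds: {accuracy:.2f}")
```

    The important property is that every example is used for testing exactly once and is never part of the training data for the fold that tests it.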


    Optimizations:

    From personal experience, here are some optimizations I found helpful for the feature extraction:

    1. Stemming and eliminating stop words usually helps a lot.
    2. Using bigrams tends to improve accuracy (though it increases the feature space significantly).
    3. Some classifiers are sensitive to a large feature space (SVM is not one of them). There are ways to overcome this, such as reducing the dimensionality of your features. PCA is one thing that can help you with that. Genetic algorithms are also (empirically) pretty good for feature subset selection.
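
    The first two optimizations can be sketched as below. The stop-word list here is a deliberately tiny, made-up one, and the helper names are assumptions. Stemming is left out because a proper stemmer (for example a Porter stemmer) would come from a library rather than a few lines of code.

```python
import re

# Hypothetical, deliberately tiny stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "and", "as", "at", "for", "here", "of", "or", "the", "to", "you"}


def tokenize(description):
    """Lowercase the text and split it into runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", description.lower())


def remove_stop_words(tokens):
    """Drop very common words that carry little information about the class."""
    return [t for t in tokens if t not in STOP_WORDS]


def bigrams(tokens):
    """Pair each token with its successor: ['a', 'b', 'c'] -> ['a b', 'b c']."""
    return [f"{first} {second}" for first, second in zip(tokens, tokens[1:])]


tokens = remove_stop_words(tokenize("You can get fries and waffles here"))
features = tokens + bigrams(tokens)
print(features)
# prints: ['can', 'get', 'fries', 'waffles', 'can get', 'get fries', 'fries waffles']
```

    Building the bigrams after removing stop words makes "fries waffles" a feature even though "and" stood between the two words. Whether that ordering helps is something to check on your own data.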

    Libraries:

    Unfortunately, I am not fluent enough in Python, but here are some libraries that might be helpful:

    • Lucene might help you a lot with the text analysis. For example, stemming can be done with EnglishAnalyzer. There is a Python version of Lucene called PyLucene, which I believe might help you out.
    • Weka is an open source library that implements a lot of useful things for machine learning, including many classifiers and feature selectors.
    • Libsvm is a library that implements the SVM algorithm.

© 2021 The Archive Base. All Rights Reserved