Editorial Team
Asked: June 4, 2026

For a web application I’m building I need to analyze a website, retrieve and rank its most important keywords and display those.

Getting all words, their density and displaying those is relatively simple, but this gives very skewed results (e.g. stopwords ranking very high).

Basically, my question is: How can I create a keyword analysis tool in PHP which results in a list correctly ordered by word importance?

Answer by Editorial Team, added on June 4, 2026 at 7:52 am

    Recently, I’ve been working on this myself, and I’ll try to explain what I did as best as possible.

    Steps

    1. Filter text
    2. Split into words
    3. Remove 2 character words and stopwords
    4. Determine word frequency + density
    5. Determine word prominence
    6. Determine word containers
      1. Title
      2. Meta description
      3. URL
      4. Headings
      5. Meta keywords
    7. Calculate keyword value

    1. Filter text

    The first thing you need to do is make sure the encoding is correct, so convert the text to UTF-8:

    $text = iconv ($encoding, "UTF-8", $file); // $encoding is the current encoding, $file holds the page contents; iconv returns the converted string
    

    After that, you need to strip all HTML tags, punctuation, symbols and numbers.
    PHP's strip_tags() handles the tags, and preg_replace() with Unicode character classes such as \p{P}, \p{S} and \p{N} can handle the rest.
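    A minimal sketch of this filtering step. The function name filterText is hypothetical, and the sketch assumes the input is already valid UTF-8 and that the mbstring extension is available. Note that it also splits words at apostrophes ("don't" becomes "don t").

```php
<?php
// Hypothetical helper: reduce an HTML string to lowercase letters and single spaces.
function filterText(string $html): string
{
    // strip_tags() keeps the text inside <script> and <style>, so drop those blocks first.
    $text = preg_replace('~<(script|style)\b[^>]*>.*?</\1>~is', ' ', $html);
    // Put a space before each tag so words in adjacent elements do not run together.
    $text = strip_tags(str_replace('<', ' <', $text));
    // Turn entities such as &amp; into real characters.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Replace everything that is not a letter or whitespace (punctuation, symbols, digits).
    $text = preg_replace('/[^\p{L}\s]+/u', ' ', $text);
    // Collapse runs of whitespace, then trim and normalise case.
    $text = preg_replace('/\s+/u', ' ', $text);
    return mb_strtolower(trim($text), 'UTF-8');
}

echo filterText('<h1>Keyword Tools</h1><p>Rank 10 words, fast!</p>');
```

    Lowercasing here is a design choice of the sketch, so that "Keyword" and "keyword" are later counted as the same word.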

    2. Split into words

    $words = mb_split( ' +', $text );
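    If the filtered text can still contain tabs, newlines, or leading and trailing spaces, a split on any whitespace that drops empty strings may be more robust. This is an alternative sketch, not what the tool described here uses; the sample text is made up.

```php
<?php
// Hypothetical input: text as it might look after step 1, with stray whitespace.
$text = "  keyword tools\trank\nwords ";

// \s+ matches spaces, tabs and newlines; PREG_SPLIT_NO_EMPTY drops the empty
// strings that leading or trailing whitespace would otherwise produce.
$words = preg_split('/\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY);

print_r($words);
```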
    

    3. Remove 2 character words and stopwords

    Words consisting of only 1 or 2 characters are rarely of any significance, so we remove all of them.

    To remove stopwords, we first need to detect the language.
    There are a couple of ways we can do this:
    – Checking the Content-Language HTTP header
    – Checking the lang="" or xml:lang="" attribute
    – Checking the Language and Content-Language metadata tags
    If none of those is set, you can use an external language detection API like the AlchemyAPI.
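    As a rough sketch, the markup-based checks could look like this with DOMDocument. The function name detectDeclaredLanguage is hypothetical, it only looks at the lang attribute on the root element and at the metadata tags (not xml:lang or the HTTP header), and it assumes a non-empty HTML string.

```php
<?php
// Hypothetical helper: return the declared language code (e.g. "en"), or null if none is found.
function detectDeclaredLanguage(string $html): ?string
{
    $doc = new DOMDocument();
    // Real-world HTML is often invalid, so keep libxml warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // 1. lang attribute on the root <html> element.
    $root = $doc->documentElement;
    $lang = $root !== null ? trim($root->getAttribute('lang')) : '';

    // 2. Fall back to <meta http-equiv="Content-Language"> or <meta name="language">.
    if ($lang === '') {
        foreach ($doc->getElementsByTagName('meta') as $meta) {
            $key = strtolower($meta->getAttribute('http-equiv') ?: $meta->getAttribute('name'));
            if ($key === 'content-language' || $key === 'language') {
                $lang = trim($meta->getAttribute('content'));
                break;
            }
        }
    }

    // Reduce values like "en-US" to the primary subtag "en".
    return $lang === '' ? null : strtolower(strtok($lang, '-_'));
}

var_dump(detectDeclaredLanguage('<html lang="en-GB"><head><title>t</title></head><body></body></html>'));
```

    A null result is the signal to fall back to another detection method.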

    You will need a list of stopwords per language, which can be easily found on the web.
    I’ve been using this one: http://www.ranks.nl/resources/stopwords.html
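    The removal itself can be sketched as a single array_filter call. The stopword list below is a made-up, heavily shortened English sample; in practice you would load a full list for the detected language.

```php
<?php
// Hypothetical, heavily shortened English stopword list.
$stopwords = ['the', 'and', 'for', 'are', 'that', 'with', 'this'];
// Sample output of step 2.
$words = ['the', 'tool', 'is', 'for', 'keyword', 'analysis', 'and', 'keyword', 'ranking'];

// Flip the list so each lookup is a hash check instead of a linear in_array() scan.
$stopwordSet = array_flip($stopwords);

// Keep words longer than 2 characters that are not stopwords.
$keywords = array_filter($words, function (string $word) use ($stopwordSet): bool {
    return mb_strlen($word, 'UTF-8') > 2 && !isset($stopwordSet[$word]);
});

print_r($keywords);
```

    array_filter preserves the original keys, so each remaining keyword still carries its position in $words.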

    4. Determine word frequency + density

    To count the number of occurrences per word, use this:

    $uniqueWords = array_unique ($keywords); // $keywords is the $words array after being filtered as mentioned in step 3
    $uniqueWordCounts = array_count_values ($keywords); // maps each keyword to its number of occurrences
    

    Now loop through the $uniqueWords array and calculate the density of each word like this:

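    $frequency = $uniqueWordCounts[$word]; // $word is the word we're currently at in the loop
    $density = $frequency / count ($words) * 100; // percentage of all words in the text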
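    Put together, the frequency and density calculation might look like the following sketch. The two sample arrays are made up; density is taken relative to all words in the text, as in the formula above.

```php
<?php
$words    = ['the', 'tool', 'ranks', 'keyword', 'and', 'keyword']; // all words (step 2)
$keywords = ['tool', 'ranks', 'keyword', 'keyword'];               // after step 3 (sample data)

// Number of occurrences per keyword.
$counts = array_count_values($keywords);
$total  = count($words);

$densities = [];
foreach ($counts as $word => $frequency) {
    // Share of the whole text taken up by this word, as a percentage.
    $densities[$word] = $frequency / $total * 100;
}

// Highest density first; arsort() keeps the word => density association.
arsort($densities);
print_r($densities);
```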
    

    5. Determine word prominence

    The word prominence is defined by the position of the words within the text.
    For example, the second word in the first sentence is probably more important than the 6th word in the 83rd sentence.

    To calculate it, add this code within the same loop from the previous step:

    $keys = array_keys ($words, $word); // $word is the word we're currently at in the loop
    $positionSum = array_sum ($keys) + count ($keys);
    $prominence = (count ($words) - (($positionSum - 1) / count ($keys))) * (100 / count ($words));
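    To see what the formula does, here is the same calculation wrapped in a function (the name prominence is hypothetical) and applied to a made-up five-word text. It assumes $words is a list with consecutive keys starting at 0.

```php
<?php
// Prominence of $word in $words: 100 for a word that only appears first,
// lower the later its average position is.
function prominence(array $words, string $word): float
{
    // Zero-based positions of every occurrence (strict comparison).
    $keys = array_keys($words, $word, true);
    if ($keys === []) {
        return 0.0; // word does not occur at all
    }
    $total = count($words);
    // Adding count($keys) turns the zero-based positions into one-based ones.
    $positionSum = array_sum($keys) + count($keys);
    return ($total - (($positionSum - 1) / count($keys))) * (100 / $total);
}

$words = ['keyword', 'tools', 'rank', 'each', 'keyword'];
foreach (array_unique($words) as $word) {
    printf("%s: %.1f\n", $word, prominence($words, $word));
}
```

    With this sample, "tools" (second word only) scores 80, while "keyword" (first and last word) averages out to 50.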
    

    6. Determine word containers

    A very important part is to determine where a word resides – in the title, description and more.

    First, you need to grab the title, all metadata tags and all headings using something like DOMDocument or phpQuery (don't try to use regex!)
    Then you need to check, within the same loop, whether these contain the words.
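    A sketch of this step with DOMDocument follows. The function names extractContainers and containersFor are hypothetical, and the code needs PHP 7.4 or later for arrow functions. It assumes a non-empty HTML string; for non-ASCII pages, make sure the document declares its charset, since loadHTML may otherwise misread UTF-8.

```php
<?php
// Hypothetical helper: map container name => lowercased text found in that container.
function extractContainers(string $html, string $url = ''): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings on invalid HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $containers = ['title' => '', 'description' => '', 'keywords' => '', 'headings' => '', 'url' => $url];

    foreach ($doc->getElementsByTagName('title') as $node) {
        $containers['title'] .= ' ' . $node->textContent;
    }
    // <meta name="description"> and <meta name="keywords">.
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        $name = strtolower($meta->getAttribute('name'));
        if ($name === 'description' || $name === 'keywords') {
            $containers[$name] .= ' ' . $meta->getAttribute('content');
        }
    }
    foreach (['h1', 'h2', 'h3', 'h4', 'h5', 'h6'] as $tag) {
        foreach ($doc->getElementsByTagName($tag) as $node) {
            $containers['headings'] .= ' ' . $node->textContent;
        }
    }
    return array_map(fn($text) => mb_strtolower($text, 'UTF-8'), $containers);
}

// Names of the containers in which $word occurs as a whole word.
function containersFor(string $word, array $containers): array
{
    $pattern = '/(?<!\p{L})' . preg_quote($word, '/') . '(?!\p{L})/u';
    return array_keys(array_filter($containers, fn($text) => preg_match($pattern, $text) === 1));
}

$html = '<html><head><title>Keyword Tools</title>'
      . '<meta name="description" content="Rank every keyword."></head>'
      . '<body><h1>Why ranking matters</h1><p>Body text</p></body></html>';
$containers = extractContainers($html, 'http://example.com/keyword-tools');
print_r(containersFor('keyword', $containers));
```

    Matching on whole words is a choice made for this sketch, so that "rank" does not count as present in a heading that only contains "ranking".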

    7. Calculate keyword value

    The last step is to calculate each keyword's value.
    To do this, you need to weigh each factor – density, prominence and containers.
    For example:

    $value = (double) ((1 + $density) * ($prominence / 10)) * (1 + (0.5 * count ($containers)));
    

    This calculation is far from perfect, but it should give you decent results.
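    To illustrate how the weighting orders the results, here is the same formula applied to made-up metrics for three words. The function name keywordValue and all the numbers are hypothetical.

```php
<?php
// Same weighting as the formula above, taking the number of containers directly.
function keywordValue(float $density, float $prominence, int $containerCount): float
{
    return (1 + $density) * ($prominence / 10) * (1 + 0.5 * $containerCount);
}

// Made-up results of steps 4 to 6 for three words.
$metrics = [
    'rank'    => ['density' => 2.0, 'prominence' => 60.0, 'containers' => 0],
    'keyword' => ['density' => 4.0, 'prominence' => 50.0, 'containers' => 2],
    'tools'   => ['density' => 2.0, 'prominence' => 80.0, 'containers' => 1],
];

$values = [];
foreach ($metrics as $word => $m) {
    $values[$word] = keywordValue($m['density'], $m['prominence'], $m['containers']);
}

// Most valuable keyword first.
arsort($values);
print_r($values);
```

    Here "keyword" ends up on top even though "tools" is more prominent, because it is denser and appears in two containers.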

    Conclusion

    I haven’t mentioned every single detail of what I used in my tool, but I hope it offers a good view into keyword analysis.

    N.B. Yes, this was inspired by today’s blog post about answering your own questions!
