The Archive Base

Editorial Team
Asked: May 25, 2026

While working on a TextToCodeRatio function for my SeoTools Excel Plugin, I’d like some input on my approach:

I’m using HtmlAgilityPack to get all text nodes, discard those whose parent node is a script or style tag, and perform some additional text manipulation:

    // Requires: using System; using System.Web; using HtmlAgilityPack;
    public static int CalculateTextSize(HtmlDocument doc)
    {
        int size = 0;
        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection textNodes =
            doc.DocumentNode.SelectNodes("//text()[normalize-space(.) != '']");
        if (textNodes == null)
        {
            return 0;
        }

        foreach (HtmlNode node in textNodes)
        {
            // Skip text inside <script> and <style>; it is code, not content.
            // Ordinal comparison keeps tag matching independent of the current culture.
            HtmlNode parentNode = node.ParentNode;
            if (parentNode != null
                && (parentNode.Name.Equals("script", StringComparison.OrdinalIgnoreCase)
                    || parentNode.Name.Equals("style", StringComparison.OrdinalIgnoreCase)))
            {
                continue;
            }

            string text = node.InnerText.Trim();
            //Just in case agility pack gets it wrong...
            text = StringUtils.StripTags(text);
            //Replaces "&amp;" => "&" etc.
            text = HttpUtility.HtmlDecode(text);
            //All whitespace is reduced to a single space, i.e.
            //"Foo\r\nBar\t   Hello" => "Foo Bar Hello"
            text = StringUtils.NormalizeWhitespace(text);
            size += text.Trim().Length;
        }

        return size;
    }

What do you think? It’s quite a restrictive approach: for example, on aftonbladet.se my method returns 23722 while the SeoChat tool returns 28671. Am I doing it wrong?
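The `StringUtils` helpers used in the method are not shown in the question. As a rough sketch only, the whitespace step could look like the following; `WhitespaceSketch` is a hypothetical stand-in, and the assumption that the helper also trims the ends comes from this sketch, not from the original code.

```csharp
using System.Text.RegularExpressions;

public static class WhitespaceSketch
{
    // Hypothetical stand-in for StringUtils.NormalizeWhitespace:
    // collapses every run of whitespace (spaces, tabs, line breaks)
    // into one space and trims the ends.
    public static string NormalizeWhitespace(string text)
    {
        return Regex.Replace(text, @"\s+", " ").Trim();
    }
}
```

With this sketch, `WhitespaceSketch.NormalizeWhitespace("Foo\r\nBar\t   Hello")` returns `"Foo Bar Hello"`, matching the example in the code comment.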

UPDATE: As Oskar Kjellin pointed out, I’m counting chars while SeoChat is counting bytes. Which is better, counting chars or bytes? I think this metric shouldn’t be affected by the encoding the page is written in.
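To make the chars-versus-bytes difference concrete, here is a small sketch. The class name `SizeDemo` and the sample word are illustrative assumptions; `string.Length` and `Encoding.GetByteCount` are the standard .NET members.

```csharp
using System.Text;

public static class SizeDemo
{
    // Number of UTF-16 code units, which is what string.Length reports.
    public static int CharCount(string text)
    {
        return text.Length;
    }

    // Number of bytes the same text occupies in a given encoding.
    public static int ByteCount(string text, Encoding encoding)
    {
        return encoding.GetByteCount(text);
    }
}
```

For the Swedish word `"R\u00E4ksm\u00F6rg\u00E5s"` (Räksmörgås), `CharCount` gives 10, while `ByteCount` gives 13 with `Encoding.UTF8` and 20 with `Encoding.Unicode` (UTF-16). The byte count therefore does depend on the page encoding, and pages with many non-ASCII characters will show a larger gap between the two numbers.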

1 Answer

  1. Editorial Team
    Added an answer on May 25, 2026 at 12:25 am

    The reason for the difference is that SeoChat is counting bytes while you are counting characters.

    I would say counting bytes is the better choice, because the point of this metric is to see what percentage of the loaded page is text. So you have to get the total size of the loaded page in bytes and measure the text against that. A character count cannot be compared with a size in bytes.
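A minimal sketch of that calculation, under stated assumptions: the visible text has already been extracted, the raw downloaded bytes are available, and the page encoding is known. `RatioSketch` and its parameter names are hypothetical, not part of the question's code.

```csharp
using System.Text;

public static class RatioSketch
{
    // textContent: visible text already extracted from the page (assumed input).
    // rawPage: the page exactly as downloaded, so its length is the loaded size in bytes.
    // pageEncoding: the encoding the page was served in (assumed to be known).
    public static double TextToCodeRatio(string textContent, byte[] rawPage, Encoding pageEncoding)
    {
        if (rawPage.Length == 0)
        {
            return 0.0; // avoid dividing by zero for an empty response
        }

        // Measure the text in the same unit (bytes) and encoding as the page.
        int textBytes = pageEncoding.GetByteCount(textContent);
        return (double)textBytes / rawPage.Length;
    }
}
```

For example, the UTF-8 page `<p>Hej världen</p>` is 19 bytes and its text `Hej världen` is 12 bytes, so the ratio is 12 / 19, roughly 0.63.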

    I'm not sure how the search engines do this, but your approach is quite easy to fool: someone can put everything inside one big div of text and use CSS to hide the div. It depends on how thorough you want to be.
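One partial countermeasure, sketched here as an assumption rather than as what any search engine does, is to skip text whose ancestors are hidden by an inline style. The helper below (hypothetical name `HiddenStyleSketch`) only inspects the value of a `style` attribute. It does not see rules from stylesheets or scripts, and it does not resolve conflicting declarations.

```csharp
public static class HiddenStyleSketch
{
    // Returns true when an inline style attribute value such as
    // "color: red; display: none" contains a declaration that hides the element.
    public static bool IsHiddenByInlineStyle(string styleAttribute)
    {
        if (string.IsNullOrEmpty(styleAttribute))
        {
            return false;
        }

        foreach (string declaration in styleAttribute.Split(';'))
        {
            int colon = declaration.IndexOf(':');
            if (colon < 0)
            {
                continue; // not a "property: value" pair
            }

            string property = declaration.Substring(0, colon).Trim().ToLowerInvariant();
            string value = declaration.Substring(colon + 1).Trim().ToLowerInvariant();
            // Drop "!important" so "none !important" still matches.
            value = value.Replace("!important", "").Trim();

            if ((property == "display" && value == "none")
                || (property == "visibility" && value == "hidden"))
            {
                return true;
            }
        }

        return false;
    }
}
```

In a text-size calculation, this check would be applied to the `style` attribute of each ancestor of a text node, and the node would be skipped if any ancestor is hidden.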
