The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 7021949
In Process

The Archive Base Latest Questions

Editorial Team
Asked: May 27, 2026

My code just creates inline-diff (on a per-word basis) of a string using HTML

My code creates a per-word inline diff of a string using HTML tags, so that CSS can hide or show what was removed or added.

In my tests, I use () for additions and {} for removals.
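For readers unfamiliar with the approach, here is a minimal sketch of a token-level inline diff that uses the same () and {} markers. It is a plain longest-common-subsequence (LCS) implementation written for illustration, not the Diff::LCS API used in the question; the method name inline_diff is hypothetical, and it assumes the input has already been split into tokens.

```ruby
# Hypothetical helper: a plain LCS diff over token arrays, written for
# illustration (it is NOT the Diff::LCS API). Removed tokens are wrapped
# in {} and added tokens in (), following the test convention.
def inline_diff(old_tokens, new_tokens)
  n = old_tokens.length
  m = new_tokens.length

  # lcs[i][j] = length of the longest common subsequence of
  # old_tokens[i..] and new_tokens[j..]
  lcs = Array.new(n + 1) { Array.new(m + 1, 0) }
  (n - 1).downto(0) do |i|
    (m - 1).downto(0) do |j|
      lcs[i][j] =
        if old_tokens[i] == new_tokens[j]
          lcs[i + 1][j + 1] + 1
        else
          [lcs[i + 1][j], lcs[i][j + 1]].max
        end
    end
  end

  # Walk the table, emitting unchanged, added and removed tokens in order.
  parts = []
  i = 0
  j = 0
  while i < n || j < m
    if i < n && j < m && old_tokens[i] == new_tokens[j]
      parts << old_tokens[i]          # unchanged
      i += 1
      j += 1
    elsif j < m && (i == n || lcs[i][j + 1] >= lcs[i + 1][j])
      parts << "(#{new_tokens[j]})"   # added
      j += 1
    else
      parts << "{#{old_tokens[i]}}"   # removed
      i += 1
    end
  end
  parts.join
end

# Split on single spaces but keep them as tokens, so the output stays readable.
old_words = "the quick fox".split(/( )/)
new_words = "the slow fox".split(/( )/)
puts inline_diff(old_words, new_words)   # prints the (slow){quick} fox
```

When an addition and a removal compete at the same position, this sketch emits the addition first, which happens to match the ordering visible in the output below.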

Here is my text:

Inputs:

"e&nbsp;<b><u>Zerg</u></b>&nbsp;a"
"e Zerg a"

Output:

"e(?)(\240){&nbsp;<b>}{<u>}Zerg(?)(\240){</u>}{</b>}{&nbsp;}a"

I don't change the encoding anywhere, so I'm confused about how a question mark and \240 got in there.

What kind of encoding is this?

I’m using Ruby 1.8.7.
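The octal escapes in the output can be decoded directly. This minimal sketch assumes Ruby 2.0 or later (for String#b), not the 1.8.7 used in the question. It shows that \302\240 is the two-byte UTF-8 encoding of U+00A0 NO-BREAK SPACE, the character the &nbsp; entity stands for.

```ruby
# The escapes \302 and \240 are octal: bytes 0xC2 and 0xA0.
raw = "\302\240".b                     # binary (byte-oriented) copy
bytes = raw.bytes
puts bytes.map { |b| format("0x%02X", b) }.join(" ")   # prints 0xC2 0xA0

# Decoded as UTF-8, those two bytes form one character: U+00A0 (NO-BREAK SPACE).
nbsp = raw.dup.force_encoding(Encoding::UTF_8)
puts nbsp.valid_encoding?              # prints true
puts nbsp.length                       # prints 1
puts format("U+%04X", nbsp.ord)        # prints U+00A0
puts nbsp == "\u00A0"                  # prints true
```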


I found the source of the problem. It happens when I convert the new string to an array for Diff::LCS to use:

The code for that:

  def self.convert_html_string_to_html_array(str)
=begin
  Char codes (such as &nbsp;) and tags must each be treated as a single element.
  Also handles the decision to diff per char or per word.

  Also needs to account for JavaScript and CSS that might be in the middle of a selection.
=end
    result = Array.new
    compare_words = str.has_at_least_one_word?
    i = 0
    while i < str.length do
      cur_char = str[i..i]
      case cur_char
      when "&"
        # Two situations: a stray char code, or a char code preceding a tag
        next_index = str.index(";", i)
        case str[next_index + 1..next_index + 1] # Check to see if tag
        when "<"
          next_index = str.index(">", i)
        end
        result << str[i..next_index]
        i = next_index
      when "<"
        next_index = str.index(">", i)
        result << str[i..next_index]
        i = next_index
      when " "
        result << cur_char
      else
        if compare_words
          # Check the above rules again here, because tags can touch regular text
          next_index = i + 1
          next_index = str.index(" ", next_index)
          next_index = str.length if next_index.nil?
          next_index -= 1

          if i < str.length and str[i..next_index].include?("<") # Beginning of a tag
            next_index = str.index(">", i)
          end

          result << str[i..next_index] # We don't want to include the space
          i = next_index
        else
          result << cur_char
        end
      end
      i += 1
    end

    return result # Tokens: char codes, tags, spaces, and words or single chars
  end

To clarify, this:

'e Zerg a'

gets turned into this:

[
    [0] "e",
    [1] "\302",
    [2] "\240",
    [3] "Z",
    [4] "e",
    [5] "r",
    [6] "g",
    [7] "\302",
    [8] "\240",
    [9] "a"
]
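The ten-element array above is what indexing a string byte by byte produces. This sketch assumes Ruby 2.0 or later and uses String#b as a stand-in for how 1.8.7 indexes strings (by byte); it also assumes the two spaces in 'e Zerg a' are non-breaking spaces, as the \302\240 pairs indicate. It contrasts that byte view with character indexing on the same UTF-8 string.

```ruby
text = "e\u00A0Zerg\u00A0a"   # assumption: the two spaces are U+00A0

# Byte view: like Ruby 1.8.7, every index addresses one byte.
raw = text.b
bytewise = (0...raw.length).map { |i| raw[i..i] }
p bytewise.length   # prints 10
p bytewise[1]       # prints "\xC2"
p bytewise[2]       # prints "\xA0"

# Character view: in Ruby 1.9+, a UTF-8 string is indexed by character.
charwise = (0...text.length).map { |i| text[i..i] }
p charwise.length            # prints 8
p charwise[1] == "\u00A0"    # prints true
```

The byte view splits each non-breaking space into two elements, exactly as in the array above, while the character view keeps it whole.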

1 Answer

  1. Editorial Team
    Added an answer on May 27, 2026 at 11:31 pm

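    Update to 1.9.2 or above (I recommend using RVM). In 1.8.7, strings are plain byte sequences: str.length and str[i..i] count bytes, not characters. The non-breaking spaces in your string are UTF-8 encoded as the two bytes \302\240 (octal for 0xC2 0xA0, i.e. U+00A0), so your per-character loop splits each one into two array elements. The lone \302 byte is not valid UTF-8 on its own, which is most likely why it shows up as ? in your diff output. Since 1.9, strings carry an encoding and indexing works on whole characters.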
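Once strings are character-aware, the per-character branch of the tokenizer can be expressed much more compactly. This is a hypothetical simplification, not a drop-in replacement for convert_html_string_to_html_array: the names html_char_tokens and TOKEN are illustrative, it assumes Ruby 1.9+ and a UTF-8 string, and unlike the original it does not merge a char code with a tag that immediately follows it. Char codes like &nbsp; and whole tags become single tokens; every other character, including a non-breaking space, becomes its own token.

```ruby
# Hypothetical, simplified tokenizer (per-character mode only).
# Alternatives, tried in order at each position:
#   &[^;\s]*;  a char code such as &nbsp;
#   <[^>]*>    a complete tag such as <b>
#   .          any other single character (m flag: newlines too)
# Assumes Ruby 1.9+ and a UTF-8 string, so "." matches one whole character.
TOKEN = /&[^;\s]*;|<[^>]*>|./m

def html_char_tokens(str)
  str.scan(TOKEN)
end

p html_char_tokens("e\u00A0Zerg\u00A0a").length   # prints 8
p html_char_tokens("e&nbsp;<b>Z</b>")            # prints ["e", "&nbsp;", "<b>", "Z", "</b>"]
p html_char_tokens("a & b <")                    # prints ["a", " ", "&", " ", "b", " ", "<"]
```

A stray & or < with no closing ; or > falls back to a single-character token here, whereas the original loop fails when str.index returns nil.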

Related Questions

I have written code that automatically creates CSS sprites based on the IMG tags
I've just created a little app that programmatically compiles code using the C# Compiler,
My code just scrapes a web page, then converts it to Unicode. html =
I have made a new windows service which works fine using barebone code (just
Help, please. The code - just a styled pre and a styled div (using
See code just bellow Our generic interface public interface Repository<INSTANCE_CLASS, INSTANCE_ID_CLASS> { void add(INSTANCE_CLASS
I'm writing some code (just for fun so far) in Python that will store
I'm someone who writes code just for fun and haven't really delved into it
I want to define some member variable and some code just in Debug Mode,
I'm trying to run a 3d array but the code just crashes in windows

© 2021 The Archive Base. All Rights Reserved