The Archive Base


Editorial Team
Asked: June 17, 2026


I have a script (in VBS or Ruby) that saves a Word document as ‘Filtered HTML’, but the Encoding parameter is ignored: the HTML file is always encoded in Windows-1252. I’m using Word 2007 SP3 on Windows 7 SP1.

Ruby Example:

require 'win32ole'
word = WIN32OLE.new('Word.Application')
word.visible = false
word_document = word.documents.open('C:\whatever.doc')
word_document.saveas({'FileName' => 'C:\whatever.html', 'FileFormat' => 10, 'Encoding' => 65001})
word_document.close()
word.quit

VBS Example:

Option Explicit
Dim MyWord
Dim MyDoc
Set MyWord = CreateObject("Word.Application")
MyWord.Visible = False
Set MyDoc = MyWord.Documents.Open("C:\whatever.doc")
MyDoc.SaveAs "C:\whatever2.html", 10, , , , , , , , , , 65001
MyDoc.Close
MyWord.Quit
Set MyDoc = Nothing
Set MyWord = Nothing

Documentation:

Document.SaveAs: http://msdn.microsoft.com/en-us/library/bb221597.aspx

msoEncoding values: http://msdn.microsoft.com/en-us/library/office/aa432511(v=office.12).aspx

Any suggestions on how to make Word save the HTML file in UTF-8?
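One quick way to confirm which encoding Word actually wrote is to look at the bytes of the exported file. The sketch below is only an illustration: the helper names `utf8_file?` and `declared_charset` are made up for this example. A file containing only ASCII passes the UTF-8 check under either encoding, so the result is meaningful only when the document has non-ASCII characters.

```ruby
# Hypothetical helper (not part of the original scripts): returns true
# when the raw bytes of the file form a valid UTF-8 sequence.
def utf8_file?(path)
  File.binread(path).force_encoding(Encoding::UTF_8).valid_encoding?
end

# Hypothetical helper: returns the charset named in the file's
# <meta ... charset=...> declaration, or nil if there is none.
def declared_charset(path)
  File.binread(path)[/charset=["']?([\w-]+)/i, 1]
end
```

Calling `utf8_file?('C:\whatever.html')` on a Word export that contains accented characters or curly quotes should return false if the file really is Windows-1252.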


1 Answer

  1. Editorial Team
    Added an answer on June 17, 2026 at 2:55 pm

    My solution was to open the HTML file using the same character set that Word used to save it (Windows-1252) and transcode it to UTF-8 while reading.
    I also added a whitelist filter (Sanitize) to clean up the HTML. Further cleaning is done with Nokogiri, which Sanitize also relies on.

    require 'nokogiri'
    require 'sanitize'

    # ... add some code converting a Word file to HTML.

    # Post-export cleanup.
    # Read the file as Windows-1252 (what Word wrote) and transcode it to UTF-8.
    html = '<!DOCTYPE html>' + File.read(html_file_name, mode: 'r:windows-1252:utf-8')
    html_document = Nokogiri::HTML::Document.parse(html)
    # clean_node! is the Sanitize 2.x name; newer releases (3.x and later) renamed it to node!.
    Sanitize.new(Sanitize::Config::RESTRICTED).clean_node!(html_document)
    html_document.at_css('html')['lang'] = 'en-US'
    # Remove Word's Generator meta tag if it is still present.
    html_document.at_css('meta[name="Generator"]')&.remove

    # ... add more cleaning up of Word's HTML noise.

    sanitized_html = html_document.to_html(encoding: 'UTF-8', indent: 0)
    # Write the output to a (new) file.
    sanitized_html_file_name = word_file_name.sub(/(.*)\..*$/, '\1.html')
    File.open(sanitized_html_file_name, 'w:UTF-8') do |f|
      f.write(sanitized_html)
    end
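The step that does the actual conversion is the `r:windows-1252:utf-8` mode string: the first encoding names what is on disk, the second what Ruby should hand back. Here is a minimal, self-contained sketch of just that step. The temporary file stands in for Word's export, and its contents are invented for the example.

```ruby
require 'tempfile'

# Simulate Word's output: a small file whose bytes are Windows-1252.
# 0xE9 is "é" and 0x93/0x94 are curly double quotes in that code page.
sample = Tempfile.new(['word_export', '.html'])
sample.binmode
sample.write("<p>\x93caf\xE9\x94</p>".b)
sample.close

# External encoding: windows-1252 (bytes on disk).
# Internal encoding: utf-8 (what the returned string uses).
html = File.read(sample.path, mode: 'r:windows-1252:utf-8')

puts html.encoding        # UTF-8
puts html.valid_encoding? # true
puts html                 # <p>“café”</p>
```

Reading the same file without the mode string would give a string tagged with the default external encoding, and the non-ASCII bytes would then be invalid or misinterpreted.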
    

    HTML Sanitizer: https://github.com/rgrove/sanitize/

    HTML parser and modifier: http://nokogiri.org/

    In Word 2010 there is a new method, SaveAs2: http://msdn.microsoft.com/en-us/library/ff836084(v=office.14).aspx

    I haven’t tested SaveAs2, since I don’t have Word 2010.
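An alternative that may be worth trying, untested here: Word's object model also exposes a document-level `WebOptions.Encoding` property, and setting it before saving is a commonly suggested way to control the encoding of HTML exports. The sketch below assumes that property is honored for Filtered HTML in your Word version. The helper name `save_as_utf8_html` and the two constants are hypothetical names introduced for this example, and the function takes the document object as a parameter so the logic can be read without Word installed.

```ruby
MSO_ENCODING_UTF8 = 65001      # msoEncodingUTF8
WD_FORMAT_FILTERED_HTML = 10   # wdFormatFilteredHTML

# Hypothetical helper: `document` is expected to be a Word document
# obtained through WIN32OLE, e.g. word.documents.open(...).
def save_as_utf8_html(document, html_path)
  # Assumption: Word consults WebOptions.Encoding when writing HTML.
  document.weboptions.encoding = MSO_ENCODING_UTF8
  document.saveas(html_path, WD_FORMAT_FILTERED_HTML)
end

# Usage on Windows with Word installed (not run here):
#   require 'win32ole'
#   word = WIN32OLE.new('Word.Application')
#   word.visible = false
#   doc = word.documents.open('C:\whatever.doc')
#   save_as_utf8_html(doc, 'C:\whatever.html')
#   doc.close
#   word.quit
```

If the exported file still turns out to be Windows-1252, the transcoding approach above remains a fallback that does not depend on Word's behavior.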
