I have seen in a few cases that converting a file from Microsoft Word format (either .doc or .docx) to HTML substantially decreases the file size (in addition to the images folder), sometimes by as much as half.
Is this always the case? Why does this happen?
.doc files are stored in a proprietary binary format that records positioning, font sizes, rendering details, and other data. This takes up space in the file and is not carried over to the HTML output, because HTML leaves most of the layout work to the browser.
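If you want to check whether the saving holds for your own files rather than take it on faith, a small script can compare the sizes. This is only a sketch: the function names `total_size` and `html_to_doc_ratio` and the idea of passing the images folder separately are my own assumptions, not anything Word provides.

```python
import os

def total_size(path):
    """Return the size in bytes of a file, or of all files under a directory."""
    if os.path.isfile(path):
        return os.path.getsize(path)
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total

def html_to_doc_ratio(doc_path, html_path, images_dir=None):
    """Size of the HTML export relative to the Word file.

    If images_dir is given (hypothetical parameter), its contents are
    counted as part of the HTML export.
    """
    html_total = total_size(html_path)
    if images_dir is not None:
        html_total += total_size(images_dir)
    return html_total / total_size(doc_path)
```

A result below 1.0 means the HTML export is smaller than the Word file; 0.5 would match the "halving" described in the question.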
The following can increase the size of your .doc file and make it suck more than it does by default:
“Fast Saves” being enabled.
Preview Picture
Versions (File | Versions): If “Automatically save version on close” is turned on.
Revisions (Tools | Track Changes)
Embedded TrueType fonts (Tools | Options | Save)
Embedded graphics
Embedded objects: These are even worse than ordinary graphics saved with the document. If you see an { EMBED } code, the graphic is an OLE object. Unless you need to be able to edit the object in place, unlink it using Ctrl+Shift+F9.
File format/compression – .RTF vs .DOC, etc.
Document corruption: See http://www.mvps.org/word/FAQs/AppErrors/CorruptDoc.htm.
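To see which of the items above is actually eating the space, you can look inside the file. This only works for .docx, which is a ZIP archive of XML parts; the older binary .doc format cannot be opened this way. The sketch below uses Python's standard `zipfile` module, and the function name `docx_size_by_folder` is just an illustrative choice. In a typical .docx, images tend to sit under word/media, embedded OLE objects under word/embeddings, and embedded fonts under word/fonts, but treat those folder names as the usual layout rather than a guarantee.

```python
import zipfile
from collections import defaultdict

def docx_size_by_folder(path):
    """Sum the compressed size (bytes) of the parts in a .docx, grouped by folder.

    Parts nested two or more folders deep are grouped under their first two
    path components (e.g. "word/media"); other parts are listed by full name.
    """
    sizes = defaultdict(int)
    with zipfile.ZipFile(path) as zf:
        for info in zf.infolist():
            parts = info.filename.split("/")
            key = "/".join(parts[:2]) if len(parts) > 2 else info.filename
            sizes[key] += info.compress_size
    return dict(sizes)

def print_largest(path, top=5):
    """Print the groups that take up the most space, largest first."""
    sizes = docx_size_by_folder(path)
    for key, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)[:top]:
        print(f"{size:>10}  {key}")
```

If word/media or word/embeddings dominates the output, the embedded graphics or objects are the likely culprit; if word/document.xml itself is large, tracked revisions or the text itself are more likely.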