I wanted to convert a Word document to text, so I used a script.

Asked: May 27, 2026

I wanted to convert a Word document to text, so I used this script:

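import win32com.client  # provided by the pywin32 package

app = win32com.client.Dispatch('Word.Application')
doc = app.Documents.Open(r'C:\Users\SBYSMR10\Desktop\New folder (2)\GENERAL DATA.doc')
content = doc.Content.Text  # whole document as one string, control characters included
doc.Close(False)  # close the document without saving changes
app.Quit()
print(content)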

I get the following result (the screenshot of the output was not preserved).

Now I want to convert this text into a list that contains all of its items. I used:

content = " ".join(content.replace(u"\xa0", " ").strip().split())

EDIT

When I do that, I get the following (the screenshot was not preserved).

It's not a list. What is the problem? And what is that big dot character?

1 Answer

Editorial Team · Answer 1 · 2026-05-27T19:43:57+00:00

Word documents aren’t plain text; they are documents. They contain control information (such as formatting, paragraph marks and table structure) as well as text. If you ignore the control information, the text alone loses its structure and is pretty useless.
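In the string returned by `doc.Content.Text`, that control information shows up as special characters. Word typically ends each paragraph with `\r` and each table cell (and table row) with `\r\x07`; the Windows console usually draws chr(7) as a bullet, so it is very likely the "big dot" from the question. Separately, `" ".join(...split())` glues the pieces back into one string, which is why the result is not a list. A minimal sketch, assuming the items you want are separated by these markers (the name `split_word_text` and the sample string are made up for illustration):

```python
import re

# Control characters that commonly appear in Word's Range.Text / Content.Text:
#   "\r"   paragraph mark
#   "\x07" end-of-cell / end-of-row marker in tables (often drawn as a bullet)
#   "\x0b" manual line break (Shift+Enter)
#   "\x0c" page or section break
WORD_SEPARATORS = re.compile(r"[\r\x07\x0b\x0c]+")


def split_word_text(raw):
    """Split text taken from doc.Content.Text into a list of non-empty items."""
    items = []
    for chunk in WORD_SEPARATORS.split(raw):
        # Replace non-breaking spaces, then collapse runs of whitespace.
        cleaned = " ".join(chunk.replace("\xa0", " ").split())
        if cleaned:
            items.append(cleaned)
    return items


# Made-up sample resembling a two-row table as Word returns it.
sample = "Name\r\x07John Smith\r\x07\r\x07Age\r\x0742\r\x07\r\x07\r"
print(split_word_text(sample))  # ['Name', 'John Smith', 'Age', '42']
```

With the question's script, `items = split_word_text(content)` gives a list instead of one long string, but it still flattens the document: you cannot tell which item was a heading, a cell or a list entry.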

So you have to learn how to navigate the document’s structure (paragraphs, tables, lists), find the parts you’re interested in, and then read the text content of those parts.
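Through the COM interface used in the question, that structure is exposed as collections such as `doc.Paragraphs` and `doc.Tables`. A minimal sketch, assuming `doc` is the object returned by `app.Documents.Open(...)` and that each paragraph or table cell is one item you care about (the helpers `clean`, `paragraph_texts` and `table_rows` are hypothetical names):

```python
# Works on the `doc` object from win32com (app.Documents.Open(...)); no import
# is needed here because the functions only read attributes of that object.


def clean(text):
    """Turn Word's control characters into spaces and collapse whitespace."""
    for ch in ("\r", "\x07", "\x0b", "\xa0"):
        text = text.replace(ch, " ")
    return " ".join(text.split())


def paragraph_texts(doc):
    """Return the cleaned text of every non-empty paragraph.

    Note: Word's Paragraphs collection also contains the paragraphs inside
    table cells, so cell text shows up here too.
    """
    texts = []
    for para in doc.Paragraphs:  # pywin32 can iterate COM collections directly
        text = clean(para.Range.Text)
        if text:
            texts.append(text)
    return texts


def table_rows(doc):
    """Return every table as a list of rows, each row a list of cell texts."""
    tables = []
    for table in doc.Tables:
        rows = {}
        # table.Range.Cells walks the cells in document order and, unlike
        # indexing table.Cell(row, column) over a full grid, copes with merged cells.
        for cell in table.Range.Cells:
            rows.setdefault(cell.RowIndex, []).append(clean(cell.Range.Text))
        tables.append([rows[index] for index in sorted(rows)])
    return tables
```

Call `paragraph_texts(doc)` and `table_rows(doc)` before `doc.Close(...)` and `app.Quit()`, since the COM objects are no longer usable afterwards.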

Note: Word’s format is very complex. If you can, also consider these two approaches:

Save the Word document as HTML from within Word. It’ll lose some formatting, but lists and tables remain recognizable in the markup. HTML is much simpler to parse and understand than Word’s own format.
Save the document as Office Open XML (OOXML, the .docx format, which has been Word’s default since Office 2007). A .docx file is a ZIP archive with XML files inside. The XML is easier to parse and understand than the binary .doc format, but harder than the HTML version.
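If you take the .docx route, the paragraphs can be read with the Python standard library alone. A minimal sketch, assuming the file has already been saved as .docx from Word and that a flat list of paragraph texts (including those inside table cells) is what you want; `paragraphs_from_xml` and `docx_paragraphs` are hypothetical names:

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def paragraphs_from_xml(xml_bytes):
    """Return the text of each non-empty <w:p> paragraph in document.xml.

    Table cells contain ordinary <w:p> elements, so their text is included.
    Text boxes can nest paragraphs inside a paragraph; this sketch does not
    handle that case and may then repeat their text.
    """
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for p in root.iter(W + "p"):
        # A paragraph's text is split across runs (<w:r>), each holding <w:t>.
        text = "".join(t.text or "" for t in p.iter(W + "t"))
        if text.strip():
            paragraphs.append(text)
    return paragraphs


def docx_paragraphs(path):
    """Read word/document.xml from a .docx (a ZIP archive) and list its paragraphs."""
    with zipfile.ZipFile(path) as archive:
        return paragraphs_from_xml(archive.read("word/document.xml"))
```

`docx_paragraphs()` takes the path of the saved .docx file. The sketch joins all `w:t` elements of a paragraph because Word splits a paragraph into several runs wherever the formatting changes.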
