I would like to parse AutoCAD's MText entity and extract the raw text. I can see a pattern in the way the text is formatted, but if this has already been solved I would rather not reinvent the wheel. I have searched online but have not found sufficient information.
I am searching for any links or references on this subject.
Edit:
To clarify further, we are using the ODA (Open Design Alliance) libraries to access the DWG files. I am not familiar with this library myself. Another developer uses it to extract information from the files, including MText entities, and then provides me with a file containing the MText text. It is this formatted MText text that I have access to and am working with in C#.
Questions:
- I asked the other developer whether the ODA library provides a means of extracting the raw, unformatted text. His response was that it does, but that doing so would also result in the entity being written back to the DWG file. I am interested in the raw text without affecting the original DWG file. Does ODA provide a way of extracting the raw text without altering the file?
- I am interested in any documentation on the formatting rules of MText, so that I can consider writing a parser myself if necessary.
- Is there anything out there to convert MText to RTF? I realize that RTF would not completely satisfy all formatting rules, but this could provide a satisfactory means of displaying the formatted text in a WinForms app. Given RTF I could also obtain the raw text.
This forum thread includes a VB program that strips the control characters from MText. The code shows what needs to be done for each control character, so it should be straightforward to write something similar in C#.
Additionally, the format codes are described in the AutoCAD documentation.
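To give an idea of what such a stripper involves, here is a minimal sketch in Python (chosen only for brevity; the same single-pass scan should port to C# without much trouble). It is not the VB program from the thread and not an ODA or AutoCAD API. The function name `strip_mtext` and the code tables are my own assumptions, based on the commonly published MText codes: `\P` for a paragraph break, `\f...;`, `\H...;`, `\C...;` and similar codes carrying an argument terminated by a semicolon, `\L`/`\l` style toggles, `\S...;` for stacked text, `{}` for grouping, and `%%d`, `%%c`, `%%p` for special characters. Check the tables against the AutoCAD documentation before relying on them.

```python
# Rough sketch of an MText stripper. The code tables below are assumptions
# taken from commonly published MText format-code lists, not from ODA or
# the VB program mentioned above.

ARG_CODES = set("fFHCcTQWAp")   # codes followed by an argument ending in ';'
TOGGLE_CODES = set("LlOoKk")    # underline / overline / strikethrough on/off
BREAK_CODES = set("PNX")        # treated here as line breaks (assumption)
SPECIAL = {"d": "\u00b0", "c": "\u00d8", "p": "\u00b1"}  # %%d, %%c, %%p
HEX = set("0123456789abcdefABCDEF")


def strip_mtext(s: str) -> str:
    """Return the plain text of an MText content string (best effort)."""
    out = []
    i, n = 0, len(s)
    while i < n:
        ch = s[i]
        if ch == "\\" and i + 1 < n:
            code = s[i + 1]
            if code in "\\{}":                 # escaped literal \ { }
                out.append(code)
                i += 2
            elif code in BREAK_CODES:          # paragraph / column break
                out.append("\n")
                i += 2
            elif code == "~":                  # non-breaking space
                out.append(" ")
                i += 2
            elif code in TOGGLE_CODES:         # formatting switch, no text
                i += 2
            elif code == "S":                  # stacked text: keep the content
                end = s.find(";", i + 2)
                if end == -1:                  # malformed: drop just the code
                    i += 2
                else:
                    stacked = s[i + 2:end]
                    out.append(stacked.replace("#", "/").replace("^", " "))
                    i = end + 1
            elif (code == "U" and s[i + 2:i + 3] == "+"
                  and len(s[i + 3:i + 7]) == 4 and set(s[i + 3:i + 7]) <= HEX):
                out.append(chr(int(s[i + 3:i + 7], 16)))   # \U+XXXX
                i += 7
            elif code in ARG_CODES:            # skip code and its argument
                end = s.find(";", i + 2)
                i = end + 1 if end != -1 else i + 2
            else:                              # unknown code: keep verbatim
                out.append(ch)
                i += 1
        elif ch in "{}":                       # grouping braces carry no text
            i += 1
        elif ch == "%" and s[i:i + 2] == "%%" and s[i + 2:i + 3].lower() in SPECIAL:
            out.append(SPECIAL[s[i + 2].lower()])
            i += 3
        else:
            out.append(ch)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    sample = r"{\fArial|b0|i0|c0|p34;\LTitle\l}\PLine 2"
    print(strip_mtext(sample))
```

Unknown codes are deliberately left in the output rather than silently dropped, so anything the tables miss shows up when you inspect the result. Since this works purely on the exported string, it does not touch the DWG file at all.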