HYPERLINK “target”label
How can I extract hyperlinks from an HWPF document? I can get paragraphs from the .doc file and extract the correct styling where necessary (bold, italic, etc.), but how would I identify and extract hyperlinks from a paragraph?
The .doc format doesn’t store hyperlinks in the simplest of ways, as you’ve noticed…
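A hyperlink is stored as a field inside the paragraph text, wrapped in special marker characters: 0x13 starts the field, 0x14 separates the field code from the displayed label, and 0x15 ends it. The field code has the form HYPERLINK "target". Once you have detected the field among the paragraph’s CharacterRuns, just split the field code on the quotes to get the target.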
There’s a good example of doing this in Apache Tika: see the handleSpecialCharacterRuns method of WordExtractor.
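To illustrate the quote-splitting idea, here is a minimal sketch in plain Java. It is not Tika's code and it does not call HWPF at all: it assumes you already have the paragraph text as a String that still contains the field marker characters (0x13, 0x14, 0x15). The class name HyperlinkFieldSketch, the Link holder and the extractLinks method are hypothetical names made up for this example, and nested fields are not handled.

```java
import java.util.ArrayList;
import java.util.List;

public class HyperlinkFieldSketch {
    // Field marker characters found in the raw .doc text stream
    static final char FIELD_BEGIN = '\u0013';
    static final char FIELD_SEPARATOR = '\u0014';
    static final char FIELD_END = '\u0015';

    /** Hypothetical holder for one extracted hyperlink. */
    static final class Link {
        final String target;
        final String label;

        Link(String target, String label) {
            this.target = target;
            this.label = label;
        }
    }

    /**
     * Scans raw paragraph text for fields of the form
     * 0x13 HYPERLINK "target" 0x14 label 0x15.
     * Assumption: fields are not nested; a nested field would confuse this scan.
     */
    static List<Link> extractLinks(String text) {
        List<Link> links = new ArrayList<>();
        int begin = text.indexOf(FIELD_BEGIN);
        while (begin >= 0) {
            int end = text.indexOf(FIELD_END, begin + 1);
            if (end < 0) {
                break; // unterminated field, nothing more to parse
            }
            int sep = text.indexOf(FIELD_SEPARATOR, begin + 1);
            if (sep >= 0 && sep < end) {
                // Field code sits before the separator, visible label after it
                String code = text.substring(begin + 1, sep).trim();
                String label = text.substring(sep + 1, end);
                int firstQuote = code.indexOf('"');
                int lastQuote = code.lastIndexOf('"');
                if (code.startsWith("HYPERLINK") && firstQuote >= 0 && lastQuote > firstQuote) {
                    // Split on the quotes to recover the link target
                    links.add(new Link(code.substring(firstQuote + 1, lastQuote), label));
                }
            }
            begin = text.indexOf(FIELD_BEGIN, end + 1);
        }
        return links;
    }

    public static void main(String[] args) {
        // Hand-built sample text standing in for a paragraph read from a .doc file
        String paragraph = "See \u0013 HYPERLINK \"http://example.com/\" \u0014the example site\u0015 for details.";
        for (Link link : extractLinks(paragraph)) {
            System.out.println(link.label + " -> " + link.target);
        }
    }
}
```

With real documents you would more likely walk the paragraph's CharacterRuns the way Tika does, collecting the runs before the 0x14 marker as the field code and the runs after it as the label. The string-based version above only shows the parsing step.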