I’m currently writing a program in Java to extract metadata from multiple document types.
At the moment I’m trying to extract metadata from .vsd files using Apache Tika.
I previously tried using Apache POI directly, but it is very hard to find any documentation on this rarely used part of the library, so I decided to go with Tika.
OK, here is the code that crashes (it fails on the officeParser.parse(...) call):
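    import java.io.FileInputStream;
    import java.io.FileNotFoundException;
    import org.apache.tika.metadata.Metadata;
    import org.apache.tika.parser.ParseContext;
    import org.apache.tika.parser.microsoft.OfficeParser;
    import org.apache.tika.sax.WriteOutContentHandler;

    // Inside a method declaring: throws IOException, SAXException, TikaException
    ParseContext context = new ParseContext();
    Metadata metadata = new Metadata();
    // Collect at most 10 MB of extracted text
    WriteOutContentHandler handler = new WriteOutContentHandler(10 * 1024 * 1024);
    // try-with-resources closes the stream even if parsing throws
    try (FileInputStream fis = new FileInputStream(fileName)) {
        OfficeParser officeParser = new OfficeParser();
        officeParser.parse(fis, handler, metadata, context);
        // Display all metadata
        for (String name : metadata.names()) {
            System.out.println(name + ": " + metadata.get(name));
        }
    } catch (FileNotFoundException e) {
        System.out.println("No such file: " + fileName);
    }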
And here is the stack trace:
    Exception in thread "main" java.lang.RuntimeException: TODO
        at org.apache.poi.hdgf.pointers.PointerFactory.createPointer(PointerFactory.java:45)
        at org.apache.poi.hdgf.HDGFDiagram.<init>(HDGFDiagram.java:99)
        at org.apache.poi.hdgf.extractor.VisioTextExtractor.<init>(VisioTextExtractor.java:55)
        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:200)
        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:161)
        at VsdFile.displayMetadata(VsdFile.java:43)
        at main.main(main.java:26)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)
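The exception in the stack trace is a java.lang.RuntimeException, which the catch (FileNotFoundException ...) clause in the snippet does not intercept, so it propagates out of main and ends the program. When a program processes many documents, one option is to catch exceptions per file and record the failures instead. The following is a minimal stdlib-only sketch of that idea; BatchExtractor, FileParser and parseAll are hypothetical names, not part of Tika or POI, and the Tika call in the comment is only an assumed way of plugging the snippet above into it.

```java
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class BatchExtractor {
    /** Hypothetical callback that parses a single file; it may throw anything. */
    interface FileParser {
        void parse(Path file) throws Exception;
    }

    private BatchExtractor() {}

    /**
     * Runs the parser on every file and returns the failures keyed by file,
     * in input order, so one malformed document does not abort the whole run.
     */
    static Map<Path, Exception> parseAll(List<Path> files, FileParser parser) {
        Map<Path, Exception> failures = new LinkedHashMap<>();
        for (Path file : files) {
            try {
                parser.parse(file);
            } catch (Exception e) {
                // Catches checked exceptions and RuntimeExceptions such as
                // POI's "TODO" from PointerFactory.createPointer
                failures.put(file, e);
            }
        }
        return failures;
    }

    // Possible use with Tika (Java 8+ lambda syntax):
    //   BatchExtractor.parseAll(files, file -> {
    //       try (InputStream in = Files.newInputStream(file)) {
    //           new OfficeParser().parse(in, new WriteOutContentHandler(10 * 1024 * 1024),
    //                   new Metadata(), new ParseContext());
    //       }
    //   });
}
```

Catching Exception rather than only TikaException is deliberate here, because the failure in this question surfaced as an unchecked exception thrown from inside POI.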
I’m pretty rusty in Java, so I hope the answer to my question is not too obvious.
Thank you.
Regards,
Bdloul
So the problem turned out to be a bad .vsd file.
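Since the cause was a bad input file, one cheap guard is to check the file signature before handing it to the parser. Legacy binary .vsd files are OLE2 (Compound File Binary) containers, which begin with the 8-byte signature D0 CF 11 E0 A1 B1 1A E1. The following is a minimal stdlib-only sketch, written to compile on Java 7; Ole2Check and hasOle2Header are hypothetical names, not Tika or POI API. A matching header only rules out files that are not OLE2 at all (for example empty or truncated files, or ZIP-based .vsdx files); a file with a valid header can still be damaged internally and make POI fail during parsing.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

final class Ole2Check {
    // First 8 bytes of every OLE2 / Compound File Binary container, the
    // format used by legacy binary .vsd, .doc, .xls and .ppt files
    private static final byte[] OLE2_SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    private Ole2Check() {}

    /** Reads up to 8 bytes and reports whether they match the OLE2 signature. */
    static boolean hasOle2Header(InputStream in) throws IOException {
        byte[] header = new byte[OLE2_SIGNATURE.length];
        int filled = 0;
        while (filled < header.length) {
            int read = in.read(header, filled, header.length - filled);
            if (read < 0) {
                return false; // stream shorter than the signature
            }
            filled += read;
        }
        return Arrays.equals(header, OLE2_SIGNATURE);
    }

    /** Convenience overload that opens and closes the file itself. */
    static boolean hasOle2Header(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return hasOle2Header(in);
        }
    }
}
```

Files that fail the check can be reported and skipped; files that pass still need the parse call wrapped in error handling.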