I have ported Apache Tika to Android, and I have a basic question. While working with EpubParser, I am able to get the URIs of the images in an EPUB book. I get the text of the book using BodyContentHandler and the image links using LinkContentHandler.
Now my problem is: how can I show these images at the same place where they appeared in the source? Can anyone give me a pointer in this regard?
This is my code:
// Open the EPUB that is bundled as a raw resource
InputStream myInputFileStream = getResources().openRawResource(R.raw.flashback);
// Send the same parse events to a text collector and a link collector
BodyContentHandler bodyHandler = new BodyContentHandler();
LinkContentHandler linkHandler = new LinkContentHandler();
TeeContentHandler handler = new TeeContentHandler(bodyHandler, linkHandler);
EpubParser ePubParser = new EpubParser();
Metadata metadata = new Metadata();
try {
    ePubParser.parse(myInputFileStream, handler, metadata, new ParseContext());
} catch (SAXException e) {
    e.printStackTrace();
} catch (TikaException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}
// Plain text and links end up in two separate, unrelated results
String plainText = bodyHandler.toString();
List<Link> linkLists = linkHandler.getLinks();
Your best bet is probably to change how you’re doing it. Instead of getting the text and the links independently, get them both in one pass. To do this, ask Tika for the contents as XHTML rather than plain text, for example by passing the parser a ToXMLContentHandler in place of the BodyContentHandler and LinkContentHandler pair.
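A minimal sketch of that change, assuming Tika's ToXMLContentHandler is included in your Android port. The class name EpubToXhtml and the method name toXhtml are made up for illustration, and the snippet needs the Tika jars on the classpath:

```java
import java.io.IOException;
import java.io.InputStream;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.epub.EpubParser;
import org.apache.tika.sax.ToXMLContentHandler;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of Tika
public class EpubToXhtml {

    // Parses the EPUB stream and returns the whole book as one XHTML string,
    // with text, links and images still in their original order.
    public static String toXhtml(InputStream epubStream)
            throws IOException, SAXException, TikaException {
        // Serialises the SAX events Tika emits back into XHTML markup
        ContentHandler handler = new ToXMLContentHandler();
        new EpubParser().parse(epubStream, handler, new Metadata(), new ParseContext());
        return handler.toString();
    }
}
```

With the code from the question, this would be called as EpubToXhtml.toXhtml(myInputFileStream).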
Once you have the XHTML, look through it for links and images (the a and img elements). When you find one, you’ll know exactly where it goes in relation to the surrounding text.
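One way to do that scan is a small SAX handler that walks the XHTML once and records text runs and images in document order. This is only a sketch: the class XhtmlWalker, the method split and the "TEXT:" / "IMG:" prefixes are invented for the example, it only looks at img elements, and it assumes the XHTML is well-formed XML.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration
public class XhtmlWalker {

    // Returns the document as an ordered list of "TEXT:..." and "IMG:..." pieces.
    public static List<String> split(String xhtml) {
        final List<String> pieces = new ArrayList<String>();
        final StringBuilder text = new StringBuilder();

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if ("img".equals(localName)) {
                    flush(); // text seen so far comes before this image
                    pieces.add("IMG:" + atts.getValue("src"));
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endDocument() {
                flush(); // trailing text after the last image
            }

            // Moves any buffered, non-blank text into the result list
            private void flush() {
                String s = text.toString().trim();
                if (s.length() > 0) {
                    pieces.add("TEXT:" + s);
                }
                text.setLength(0);
            }
        };

        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // so localName is "img" under the XHTML namespace
            factory.newSAXParser().parse(new InputSource(new StringReader(xhtml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XHTML", e);
        }
        return pieces;
    }
}
```

Each piece can then be rendered in order, for instance text pieces as text views and image pieces as image views loaded from the EPUB. That mapping is up to your app and is not shown here.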