Is it possible to extract text from URLs with Tika?

Question

Asked: May 25, 2026 (2026-05-25T00:15:35+00:00)

Is it possible to extract text from URLs with Tika? Any links will be appreciated. Or is Tika usable only for PDF, Word, and other media documents?

1 Answer

Editorial Team · Answer 1 · 2026-05-25T00:15:36+00:00

This is from lucid:

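import java.io.*;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.pdf.PDFParser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.ContentHandler;

// Parse a local PDF; resourceLocation is the file path, out is a PrintStream such as System.out.
ContentHandler textHandler = new BodyContentHandler();
Metadata metadata = new Metadata();
PDFParser parser = new PDFParser();
try (InputStream input = new FileInputStream(new File(resourceLocation))) {
    parser.parse(input, textHandler, metadata, new ParseContext()); // recent Tika versions require a ParseContext
}
out.println("Title: " + metadata.get("title"));
out.println("Author: " + metadata.get("Author"));
out.println("content: " + textHandler.toString());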

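Instead of creating a PDFParser, you can use Tika’s AutoDetectParser to automatically process different types of files, including HTML. Because the parser only needs an InputStream, the stream can come from a URL rather than a file: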

Parser parser = new AutoDetectParser();
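To tie this back to the original question, here is a minimal sketch of reading from a URL rather than a file. It assumes the Tika parser modules (not just tika-core) are on the classpath, so that AutoDetectParser has an HTML parser to delegate to. The class name UrlTextExtractor, the helper extractText, and the example.com address are placeholders made up for illustration, not part of Tika.

```java
import java.io.InputStream;
import java.net.URI;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.BodyContentHandler;

// Hypothetical helper class, named for this example only.
class UrlTextExtractor {

    // Extracts plain text from any stream; the parser detects the content type itself.
    static String extractText(InputStream input) throws Exception {
        Parser parser = new AutoDetectParser();
        // The default handler stops after 100,000 characters; pass -1 to lift the limit.
        BodyContentHandler handler = new BodyContentHandler();
        parser.parse(input, handler, new Metadata(), new ParseContext());
        return handler.toString();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder address; pass a real URL as the first argument.
        String address = args.length > 0 ? args[0] : "https://example.com/";
        try (InputStream input = URI.create(address).toURL().openStream()) {
            System.out.println(extractText(input));
        }
    }
}
```

The helper takes an InputStream rather than a URL so the same method works for files, HTTP responses, or in-memory bytes. For anything beyond a quick test you would probably want a proper HTTP client with timeouts instead of URL.openStream().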
