Anyone know of a C# alternative to Tika that can extract text from HTML, PDF, etc.?

Question

Editorial Team

Asked: May 16, 2026 (2026-05-16T21:56:12+00:00)

Does anyone know of a C# alternative to Tika that can extract text from HTML, PDF, etc.?

1 Answer

Editorial Team · Answer 1 · 2026-05-16T21:56:13+00:00

I’ve got a similar need: I have a .NET project where I need to pull text out of various files (.XLS, .DOC, .PDF, etc.) for indexing with Lucene.Net.

This blog post seems to be exactly what I’m after: a .NET wrapper around the Tika .jar file!

I’m implementing it now; if it doesn’t work, I’ll update my answer here.

Edit: OK, it’s up, running, and working well (if a little slowly). There’s some pretty nasty dependency wrangling with the IKVM bits, but it’s the best alternative that I’ve found.
