I am currently working on a project for android using Tesseract OCR. I was

Asked: June 14, 2026, 22:58:32 UTC

I am currently working on a project for Android using Tesseract OCR. I was hoping to fine-tune the results given to the user by adding a dictionary. According to the Tesseract OCR wiki, the best way to go about this would be to:

Replace tessdata/eng.user-words with your own word list, in the same
format – UTF8 text, one word per line.

However, there is no eng.user-words file in the tessdata folder. I assume that if I just create a text file with my dictionary in it, it will never be used…

Has anybody had a similar experience and knows what to do?

1 Answer

Editorial Team · Answer 1 · 2026-06-14T22:58:33+00:00

If you’re using Tesseract 3 (which I assume you are), you’ll have to rebuild your eng.traineddata file.

I intended to replace the word-dawg file completely to try to get better results (i.e., the words I’m detecting are always the same).

You’ll need the combine_tessdata and wordlist2dawg executables, which are built in the training directory when you compile Tesseract.

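Unpack everything (I did this just to back up my eng.word-dawg; you’ll also need the unicharset later). The second argument is the output path prefix, chosen here so that it matches the traineddat_backup/.unicharset path used below; the traineddat_backup directory must already exist:

./combine_tessdata -u eng.traineddata traineddat_backup/.

Create a text file of your word list (wordlistfile).

Create an eng.word-dawg:

./wordlist2dawg wordlistfile eng.word-dawg traineddat_backup/.unicharset

Replace the word-dawg file:

./combine_tessdata -o eng.traineddata eng.word-dawg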

That should be it.
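
The word list step expects UTF-8 text with one word per line, so it can help to normalize a rough list before passing it to wordlist2dawg. Below is a minimal sketch of that cleanup using only standard shell tools. The input file name raw_words.txt and the demo words are assumptions made for illustration and are not part of the steps above; only wordlistfile matches the name used there.

```shell
#!/bin/sh
# Sketch: turn a rough word list into "one word per line" text.
# raw_words.txt is a hypothetical input name; wordlistfile is the
# name used in the steps above.
set -eu
LC_ALL=C
export LC_ALL

raw=raw_words.txt
out=wordlistfile

# Demo input so the sketch runs on its own; replace with your real list.
printf 'invoice\r\n  total \n\ninvoice\namount\n' > "$raw"

# Strip carriage returns, trim surrounding whitespace, drop empty
# lines, and remove duplicates so each word appears exactly once.
tr -d '\r' < "$raw" \
  | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
  | grep -v '^$' \
  | sort -u > "$out"

cat "$out"
```

The resulting wordlistfile can then be used as the input to the wordlist2dawg step. Sorting is only there to make duplicate removal simple; the order of words in the list is not assumed to matter.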
