I’m trying to figure out how to output hOCR using Tesseract. Documentation is limited, so I am looking into the code. I found this in the main() function:
bool output_hocr = tessedit_create_hocr;
outfile = argv[2];
outfile += output_hocr ? ".html" : tessedit_create_boxfile ? ".box" : ".txt";
A typical Tesseract command looks like this: tesseract input.tif output_file.txt (in this example Tesseract appends another .txt, so the result is output_file.txt.txt). The signature of main() is int main(int argc, char **argv).
What exactly is the code snippet doing?
It’s generating the output filename.
Saves the tessedit_create_hocr flag in a locally scoped variable.
Initializes the outfile variable with the output base name from the command line, i.e. argv[2]. In the command above that is “output_file.txt”, not the input image (argv[1]).
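Appends the appropriate extension based on the flags. The nested conditional reads right to left: “.html” if output_hocr is set, otherwise “.box” if tessedit_create_boxfile is set, otherwise “.txt”. Could be re-written as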
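A minimal sketch of the same logic as an explicit if/else chain may make the nested conditional easier to follow. It assumes std::string in place of Tesseract's own string type, and make_output_name is a hypothetical helper invented for illustration; it is not part of Tesseract.

```cpp
#include <string>

// Hypothetical helper (not a Tesseract function): builds the output
// filename from the base name given on the command line (argv[2]) and
// the two flags read in the original snippet.
std::string make_output_name(const std::string& base,
                             bool create_hocr,
                             bool create_boxfile) {
    std::string outfile = base;   // corresponds to: outfile = argv[2];

    // Same decision order as the nested ?: expression:
    // hOCR wins over box file, and plain text is the fallback.
    if (create_hocr) {
        outfile += ".html";
    } else if (create_boxfile) {
        outfile += ".box";
    } else {
        outfile += ".txt";
    }
    return outfile;
}
```

With both flags off, make_output_name("output_file.txt", false, false) yields "output_file.txt.txt", which matches the doubled extension described in the question. If both flags are on, the hOCR branch is taken, because the first condition is tested first.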