I would like to have an application where a user views an image of a document in TIFF Format.
If the words “foo” and “bar” appear on the page. And a selection is made on the image that only contains “foo”, then I would like to only select the word “foo”.
Is there a format that lends itself to storing both the location of text and the text of an image?
Since you know about searchable PDF, and it perfectly implements what you are suggesting, I assume that there is some reason why you can’t use it. If not, you should use PDF — the format supports mixed-content and overlaying them. All of the viewers that your users are likely to have will understand what to do with text beneath the image.
The TIFF format does not support this directly, but if you are making the viewer, and it only needs to work there, then you could try to store the text and positions in a custom tag.
Then your viewer would need to read this tag, interpret mouse positions, and look up the text that is being selected on the image. No other viewer would support your text tag, but they would show the TIFF.
For either of these mechanisms, you will need OCR and a way to encode the data you get either into PDF or the custom TIFF tag. For open source OCR, take a look at Tesseract from Google.
Disclaimer: I work at Atalasoft. Our imaging SDK, DotImage, has add-ons for OCR that can make searchable PDF, and can add and edit TIFF tags.