I need to search through a Word document for a string and return the "offset" of its first character. What I am unsure about is how to account for newlines. Suppose the document consists of:
Hi
World.
What is the offset of ‘W’? Is it 2, since the offset of ‘i’ is 1? Or is it 3, because the hidden ‘\n’ counts as a character? What if the document uses ‘\r\n’ (carriage return plus line feed) line endings? Is there a standard way to deal with this in Java?
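The answer is normalization: convert every line ending (‘\r\n’ or a lone ‘\r’) to a single ‘\n’ before searching, so that each line break counts as exactly one character. With that convention, ‘W’ in the example is at offset 3.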
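A minimal sketch of that idea in Java, assuming the document's text has already been extracted into a plain String (reading the Word file itself is a separate step and is not shown). The class and method names (NewlineOffsets, normalize, offsetOf) are made up for illustration. Offsets are counted the way String.indexOf counts them, that is, in Java char units starting at 0.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class NewlineOffsets {
    // Matches a "\r\n" pair or a lone "\r". The pair is listed first so it
    // is replaced as one unit rather than as two separate line breaks.
    private static final Pattern LINE_ENDING = Pattern.compile("\r\n|\r");

    // Returns the text with every line ending rewritten as a single '\n'.
    static String normalize(String text) {
        return LINE_ENDING.matcher(text).replaceAll("\n");
    }

    // Returns the 0-based offset of the first occurrence of needle in the
    // normalized text, or -1 if it does not occur.
    static int offsetOf(String text, String needle) {
        return normalize(text).indexOf(normalize(needle));
    }
}
```

With this, NewlineOffsets.offsetOf("Hi\r\nWorld.", "World") and NewlineOffsets.offsetOf("Hi\nWorld.", "World") both give 3. Keep in mind that the offsets refer to the normalized text, so if you need positions in the original file you have to map them back yourself.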