OCR texts often have words that flow from one line to the next, with a hyphen at the end of the first line (i.e., the word has ‘-\n’ inserted in it).
I would like to rejoin all such split words in a text file (in a Linux environment).
I believe this should be possible with sed or awk, but the syntax for these is dark magic to me! I knew a text editor on Windows that did regex search/replace with newlines in the search expression, but I am unaware of one on Linux.
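For anyone who finds sed and awk equally opaque, here is a minimal sketch of the idea in Python. It assumes the whole text fits in memory and that every hyphen sitting directly before a newline marks a split word; the function name `join_hyphenated` is made up for illustration.

```python
def join_hyphenated(text: str) -> str:
    """Remove every hyphen that is immediately followed by a newline.

    Assumption: a '-' at the very end of a line always marks a split word.
    """
    return text.replace("-\n", "")


# Small demonstration on an OCR-style sample.
sample = "The docu-\nment was scan-\nned.\n"
print(join_hyphenated(sample), end="")  # prints: The document was scanned.
```

Note that this also merges genuine compounds that happen to break at the hyphen ("well-\nknown" becomes "wellknown"), so a quick review of the result is probably worthwhile.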
If the file has Windows line endings, you’ll need to catch the CR-LF with something like:
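A hedged sketch in Python of a pattern that tolerates both kinds of line ending: the regex `-\r?\n` matches a hyphen followed by an optional carriage return and then a line feed. The names `HYPHEN_BREAK` and `join_hyphenated_any_eol` are hypothetical, chosen only for this example.

```python
import re

# A hyphen, then an optional CR, then LF: covers Unix and Windows endings.
HYPHEN_BREAK = re.compile(r"-\r?\n")


def join_hyphenated_any_eol(text: str) -> str:
    """Rejoin words split by a trailing hyphen, for LF or CR-LF files."""
    return HYPHEN_BREAK.sub("", text)


# Demonstration with Windows-style line endings.
windows_sample = "The docu-\r\nment was scan-\r\nned.\r\n"
print(repr(join_hyphenated_any_eol(windows_sample)))
# prints: 'The document was scanned.\r\n'
```

When reading from a real file, keep in mind that Python's text mode translates CR-LF to LF by default; opening with `open(path, newline="")` preserves the original endings so they are written back unchanged.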