When I extract text from a document and it contains special characters such as ™ (trademark) or © (copyright), writing it to a text file adds unexpected characters to the output. For example:
If the text is Apache™ Hadoop™! and I write it to a text file using FileOutputStream, the result looks like Apacheâ Hadoopâ. The â means nothing to me. In general, I want a way to detect such characters in the text and skip them when writing. Is there a solution for this?
If you want just the printable ASCII range, then iterate over your string character by character, building a new string. Include the character only if it's within the range 0x20 to 0x7E. If you want to keep carriage returns and newlines, you also need to consider 0x0A and 0x0D.
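As a rough illustration of that filter, here is a minimal Java sketch. The class name AsciiFilter and the method name keepPrintableAscii are made up for this example, and it assumes the text is already in a String before you write it out.

```java
public class AsciiFilter {

    // Hypothetical helper: keeps printable ASCII (0x20 to 0x7E)
    // plus line feed (0x0A) and carriage return (0x0D); drops everything else.
    public static String keepPrintableAscii(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            boolean printable = c >= 0x20 && c <= 0x7E;
            boolean lineBreak = c == 0x0A || c == 0x0D;
            if (printable || lineBreak) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // \u2122 is the trademark sign from the question's example.
        System.out.println(keepPrintableAscii("Apache\u2122 Hadoop\u2122!"));
        // prints "Apache Hadoop!"
    }
}
```

You would then pass the filtered string to whatever writes the file. Note that this drops every non-ASCII character, including accented letters, so it only fits if plain ASCII output is really what you want.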