I'm trying to read a file into my Python program and apply a tokenizer to it to split the text into a set of sentences. However, my output contains the '\n' character, which I'd like to avoid, as it might hinder my further processing of the sentences.
I read the input using read(). I also tried readline(), but I'm still getting the newline characters in my output. Any suggestions on avoiding this?
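from nltk.tokenize import sent_tokenize  # assumption: sent_tokenize is NLTK's

with open(path, 'r') as file_sent:  # the file is closed automatically
    all_sents = file_sent.read()
print(all_sents)  # print returns nothing, so tokenize all_sents directly
tokenized_sents = sent_tokenize(all_sents)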
If you want to remove the newlines entirely:
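A minimal sketch of this, using the built-in str.replace. The sample string is made up and stands in for whatever read() returned:

```python
# Hypothetical sample text standing in for the result of file_sent.read()
all_sents = "First sentence.\nSecond sentence.\n"

# str.replace returns a new string; every newline is dropped
no_newlines = all_sents.replace("\n", "")
print(no_newlines)
```

Note that the text on either side of each line break is joined directly, with nothing in between, which may not be what a sentence tokenizer expects.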
If you want to replace them with spaces:
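Something along these lines should work, again with str.replace. The sample string is made up, and the strip() call is an optional extra that trims the space left by a trailing newline:

```python
# Hypothetical sample text standing in for the result of file_sent.read()
all_sents = "First sentence.\nSecond sentence.\n"

# Swap each newline for a single space, then trim leading/trailing whitespace
spaced = all_sents.replace("\n", " ").strip()
print(spaced)
```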
Obviously you could replace them with something else if you wanted.