I’m coding an email application that produces messages for sending via SMTP. That means I need to change all lone \n and \r characters into the canonical \r\n sequence we all know and love. Here’s the code I’ve got now:
import re
CRLF = '\r\n'
msg = re.sub(r'(?<!\r)\n', CRLF, msg)  # lone \n (not preceded by \r)
msg = re.sub(r'\r(?!\n)', CRLF, msg)   # lone \r (not followed by \n)
The problem is it’s not very fast. On large messages (around 80k) it takes up nearly 30% of the time to send a message!
Can you do better? I eagerly await your Python gymnastics.
This regex helped:
re.sub(r'\r\n|\r|\n', '\r\n', msg)
But this code ended up winning:
msg.replace('\r\n', '\n').replace('\r', '\n').replace('\n', '\r\n')
The original regexes took 0.6s to convert /usr/share/dict/words from \n to \r\n, the new regex took 0.3s, and the replace()s took 0.08s.
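If you want to check the three versions against each other, here is a minimal sketch. It is not the exact benchmark behind the numbers above: the function names, the sample string, and the synthetic input are made up for illustration, and timings will vary by machine and Python version.

```python
import re
import timeit

CRLF = '\r\n'


def normalize_lookaround(msg):
    """Original approach: two regex passes using lookbehind and lookahead."""
    msg = re.sub(r'(?<!\r)\n', CRLF, msg)
    return re.sub(r'\r(?!\n)', CRLF, msg)


def normalize_alternation(msg):
    """Single regex pass; \\r\\n is listed first so it is matched as one unit."""
    return re.sub(r'\r\n|\r|\n', CRLF, msg)


def normalize_replace(msg):
    """Collapse every line ending to \\n, then expand each \\n to \\r\\n."""
    return msg.replace('\r\n', '\n').replace('\r', '\n').replace('\n', CRLF)


# Hypothetical sample mixing lone \n, lone \r, \r\n, \n\r and \r\r\n.
sample = 'a\nb\rc\r\nd\n\re\r\r\nf'

if __name__ == '__main__':
    # Sanity check: all three agree on the mixed sample.
    results = {f(sample) for f in
               (normalize_lookaround, normalize_alternation, normalize_replace)}
    assert len(results) == 1

    # Synthetic stand-in for a word list: one word per line, \n endings only.
    text = 'word\n' * 100_000
    for f in (normalize_lookaround, normalize_alternation, normalize_replace):
        seconds = timeit.timeit(lambda: f(text), number=5)
        print(f'{f.__name__}: {seconds:.3f}s for 5 runs')
```

The replace() chain works because the first two calls reduce every kind of line ending to a bare \n, so the last call can expand each \n without creating a doubled \r.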