I am trying to take Unicode strings and clean them up for use in URLs.
Examples: “Bird’s Milk” Cake or Pão com Ovo
In converting these, my goal is to make them as human-readable as possible, so the URLs for those examples would be /birds-milk-cake/ and /pao-com-ovo/.
To get the ASCII equivalents of the accented characters, I use:
import unicodedata
title = 'Pão com Ovo'
title = unicodedata.normalize('NFKD', title).encode('ascii', 'ignore')
However, I am wondering what the best way is to remove characters like # ! ‘ ” ( ) &. normalize() errors on those characters, so is there a proper way to remove them while still retaining the accented characters?
1 Answer
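One way to approach this is sketched below, assuming Python 3 and only the standard library. The `slugify` name and the exact regular expressions are illustrative choices, not an established API: the NFKD step from the question handles the accents, and two `re.sub` passes then drop the leftover punctuation and join the words with hyphens.

```python
import re
import unicodedata


def slugify(title):
    """Hypothetical helper: turn a Unicode title into an ASCII URL slug."""
    # Decompose accented characters (ã -> a + combining tilde), then drop
    # everything with no ASCII form, which also removes curly quotes.
    ascii_text = (
        unicodedata.normalize('NFKD', title)
        .encode('ascii', 'ignore')
        .decode('ascii')
    )
    # Remove remaining ASCII punctuation such as # ! ( ) & and straight quotes.
    cleaned = re.sub(r'[^\w\s-]', '', ascii_text).strip().lower()
    # Collapse runs of whitespace, underscores and hyphens into one hyphen.
    return re.sub(r'[-\s_]+', '-', cleaned).strip('-')


print(slugify('“Bird’s Milk” Cake'))  # birds-milk-cake
print(slugify('Pão com Ovo'))         # pao-com-ovo
```

Characters that have no ASCII decomposition at all (for example, non-Latin scripts) are simply dropped by this sketch, so a title made up entirely of such characters would yield an empty slug.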