Web frameworks such as Rails and Django have built-in support for “slugs”, which are used to generate readable and SEO-friendly URLs.
A slug string typically consists only of the characters a-z, 0-9 and -, and can therefore be written without URL-escaping (think “foo%20bar”).
I’m looking for a Perl slug function that, given any valid Unicode string, returns a slug representation (a-z, 0-9 and -).
A super trivial slug function would be something along the lines of:
$input = lc($input);
$input =~ s/[^a-z0-9-]//g;
However, this implementation would not handle internationalization and accents (I want ë to become e). One way around this would be to enumerate all special cases, but that would not be very elegant. I’m looking for something more well thought out and general.
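The sketch below wraps the trivial approach above in a subroutine to show the limitation. The name `trivial_slug` is hypothetical. `lc` leaves accented letters accented, and the character class then deletes them, so they disappear instead of being transliterated. Spaces also vanish rather than becoming hyphens.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the "super trivial" slug function above.
# Expects a decoded (character) string, not raw UTF-8 bytes.
sub trivial_slug {
    my ($input) = @_;
    $input = lc $input;           # lowercase; accented letters stay accented
    $input =~ s/[^a-z0-9-]//g;    # delete everything outside a-z, 0-9 and "-"
    return $input;
}

# "Crème brûlée" comes out as "crmebrle": the accented letters
# and the space are deleted, not converted.
print trivial_slug("Cr\x{e8}me br\x{fb}l\x{e9}e"), "\n";
```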
My question:
- What is the most general/practical way to generate Django/Rails-style slugs in Perl? This is how I solved the same problem in Java.
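The `slugify` filter currently used in Django translates (roughly) to Perl code that does the following:

- decomposes the input with NFKD;
- drops every non-ASCII character;
- removes anything that is not a word character, whitespace or hyphen;
- trims and lowercases the result;
- collapses runs of whitespace and hyphens into single hyphens.

Since you also want to change accented characters to unaccented ones, your best bet seems to be a call to `unidecode` (defined in `Text::Unidecode`) before stripping the non-ASCII characters, as pointed out by phaylon.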
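Below is a sketch of the Django-style port described above. It assumes the Django code being mirrored is the older `slugify`, which NFKD-normalizes the input and encodes it to ASCII with errors ignored. The name `slugify` is hypothetical, and only the core `Unicode::Normalize` module is needed.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFKD);

# Sketch of a Perl port of Django's older slugify filter.
# Expects a decoded character string; the result contains only [a-z0-9_-].
sub slugify {
    my ($input) = @_;
    $input = NFKD($input);         # decompose, e.g. "ë" -> "e" + U+0308 (combining diaeresis)
    $input =~ tr/\x00-\x7F//cd;    # delete every non-ASCII code point, combining marks included
    $input =~ s/[^\w\s-]//g;       # keep only word characters, whitespace and hyphens
    $input =~ s/^\s+|\s+$//g;      # trim leading and trailing whitespace
    $input = lc $input;
    $input =~ s/[-\s]+/-/g;        # collapse runs of whitespace/hyphens into one hyphen
    return $input;
}
```

Because `\w` also matches `_`, underscores survive, as they do in Django. A string made only of CJK ideographs has no decomposition into ASCII, so it ends up empty.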
The former works well for strings that are primarily ASCII. It falls short when a string consists entirely of non-ASCII characters: they all get stripped out, leaving an empty string.
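Adding the `unidecode` step addresses this by transliterating before the ASCII filter runs. This sketch assumes the CPAN module `Text::Unidecode` is installed; it exports `unidecode`, which returns an ASCII approximation of its argument. The name `slugify_unidecode` is hypothetical, and the NFC step is a choice of this sketch rather than a requirement.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC);
use Text::Unidecode;    # exports unidecode() by default

# Hypothetical unidecode-based variant: transliterate first, then apply
# the same clean-up as the Django-style port.
sub slugify_unidecode {
    my ($input) = @_;
    $input = NFC($input);          # recompose accented letters (optional in this sketch)
    $input = unidecode($input);    # e.g. "ë" -> "e"; CJK ideographs -> romanized syllables
    $input =~ tr/\x00-\x7F//cd;    # defensive: drop anything still outside ASCII
    $input =~ s/[^\w\s-]//g;       # keep only word characters, whitespace and hyphens
    $input =~ s/^\s+|\s+$//g;      # trim leading and trailing whitespace
    $input = lc $input;
    $input =~ s/[-\s]+/-/g;        # collapse runs of whitespace/hyphens into one hyphen
    return $input;
}
```

With this variant, a string such as 北亰 yields a romanized slug instead of an empty one. The exact spelling depends on `Text::Unidecode`'s transliteration tables.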
Sample output:
Note how 北亰 gets slugified to nothing with the Django-inspired implementation. Note also the difference the choice of normalization form makes. With NFKD, liberté becomes ‘liberte’, because only the second part of the decomposed character (the combining accent) is stripped out. With NFC it would become ‘libert’, because the recomposed ‘é’ is stripped out as a whole.
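The normalization-form effect can be checked with the core `Unicode::Normalize` module alone. This sketch normalizes a string and then deletes every non-ASCII code point, the step in the Django-style port where the two forms diverge. The helper name `ascii_only` is hypothetical.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC NFKD);

# Hypothetical helper: normalize with the given function, then delete
# every code point outside ASCII (as the Django-style port does).
sub ascii_only {
    my ($normalize, $text) = @_;
    my $out = $normalize->($text);
    $out =~ tr/\x00-\x7F//cd;
    return $out;
}

my $word = "libert\x{e9}";                 # "liberté" with a precomposed U+00E9
print ascii_only(\&NFKD, $word), "\n";     # "liberte": only the combining U+0301 is dropped
print ascii_only(\&NFC,  $word), "\n";     # "libert": the whole "é" is dropped
```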