Today I stumbled upon a problem which seems to be a bug in the Zend Framework. Given the following route:
<test>
    <route>citytest/:city</route>
    <defaults>
        <controller>result</controller>
        <action>test</action>
    </defaults>
    <reqs>
        <city>.+</city>
    </reqs>
</test>
and three URLs:
- mysite.local/citytest/Berlin
- mysite.local/citytest/Hamburg
- mysite.local/citytest/M%FCnchen
the last URL does not match, and thus the correct controller is not called. Anybody got a clue why?
FYI, we are using Zend Framework 1.0 (yeah, I know that’s ancient, but I am not in a position to change that :-/ )
Edit: From what I hear, we are going to upgrade to Zend Framework 1.5.6 soon, but I don’t know when, so a patch would be great.
Edit: I’ve tracked it down to the following line (Zend/Controller/Router/Route.php:170):
$regex = $this->_regexDelimiter . '^' . $part['regex'] . '$' . $this->_regexDelimiter . 'iu';
If I change that to
$regex = $this->_regexDelimiter . '^' . $part['regex'] . '$' . $this->_regexDelimiter . 'i';
it works. From what I understand, the u-modifier is for working with Asian characters. As I don’t use them, I’m fine with that patch for now. Thanks for reading.
The problem is the following:
From "Handling UTF-8 with PHP". Therefore it’s actually irrelevant whether your URL is ISO-8859-1 encoded (mysite.local/citytest/M%FCnchen) or UTF-8 encoded (mysite.local/citytest/M%C3%BCnchen): the default regex won’t match.
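The ISO-8859-1 case from the question can be reproduced outside the framework. The following is only a minimal sketch: it assumes the router ends up passing the URL-decoded path segment to preg_match(), and it hard-codes a pattern shaped like the one built in Route.php with the `.+` requirement. The variable names are made up for illustration.

```php
<?php
// Assumed stand-in for the decoded path segment: "München" in ISO-8859-1.
// The byte 0xFC on its own is not a valid UTF-8 sequence.
$segment = urldecode('M%FCnchen');

// With the u modifier, PCRE validates the subject as UTF-8 first.
// On current PHP versions an invalid subject makes preg_match() return false.
$withU = preg_match('#^.+$#iu', $segment);
$errorWithU = preg_last_error(); // read immediately, the next call resets it

// Without the u modifier the subject is treated as plain bytes.
$withoutU = preg_match('#^.+$#i', $segment);

var_dump($withU);                               // no match, error state set
var_dump($errorWithU === PREG_BAD_UTF8_ERROR);  // the subject was rejected as bad UTF-8
var_dump($withoutU);                            // the byte string matches .+
```

This is consistent with the observation in the question that dropping the u modifier makes the route match again.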
I also experimented with umlauts in URLs in Zend Framework and came to the conclusion that you don’t really want umlauts in your URLs. The problem is that you cannot rely on the encoding the browser uses for the URL. Firefox (prior to 3.0), for example, does not UTF-8 encode URLs entered into the address bar (unless configured in about:config), and IE has a checkbox in its options to choose between regular and UTF-8 encoding for its URLs. But if you click on links within a page, both browsers send the URL in the encoding of that page (UTF-8 on a UTF-8 page). Therefore you cannot be sure in which encoding the URLs are sent to your application, and detecting the encoding used is not trivial.
Perhaps it’s better to use transliterated parameters in your URLs (e.g. change Ä to Ae and so on). There is a really simple way to do this (I don’t know if this works with every language, but I’m using it with German strings and it works quite well):
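One possible sketch of such a transliteration, not necessarily the exact approach meant above, is a fixed replacement table applied with strtr(). The function name transliterateGerman and the table itself are assumptions for illustration; the input string and the source file are both assumed to be UTF-8 encoded.

```php
<?php
/**
 * Hypothetical helper: replaces German umlauts and ß with ASCII digraphs.
 * Assumes $text is UTF-8; all other characters are left untouched.
 */
function transliterateGerman($text)
{
    // strtr() with an array replaces each key with its value in one pass.
    $map = array(
        'Ä' => 'Ae', 'Ö' => 'Oe', 'Ü' => 'Ue',
        'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue',
        'ß' => 'ss',
    );
    return strtr($text, $map);
}

echo transliterateGerman('München'), "\n";
echo transliterateGerman('Düsseldorf'), "\n";
```

An explicit table keeps the result independent of locale settings, at the cost of covering only the characters listed in it. The transliterated value would then be used when generating links, so the route only ever sees plain ASCII.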