Generally speaking, I don’t like wasting time hunting down framework bugs. It’s not that I don’t like a challenge; it’s just that they are usually hard and very time-consuming to fix (if a fix is possible at all), especially when you have to deal with fringe cases (in other words, “Google isn’t going to help”).
Since many of our clients operate outside English-speaking markets, Unicode has been a way of life for us; we have supported it ever since we got off Windows 9x and moved on to Windows 2000. It’s always frustrating to see technology that doesn’t play nice with Unicode, especially considering that this is 2011, where everything is social and interconnected and having multilingual thingies is beyond cool; it’s a necessity.
I’ve stated before that despite its flaws, WordPress is an amazing product with a strong community and a plethora of available content that allows you to meet customer demands quickly and effectively (gone are the days of overpriced custom CMS solutions, especially since the global banking crisis began). Until now, the only real WordPress-related problem we faced had to do with Unicode permalinks, which, to put it simply, just wouldn’t work with IIS: if your permalink contained Unicode characters, WordPress would complain that it couldn’t find your post. There was no plugin that could solve the issue, so we usually resorted to ugly hacks to work around it.
When a rather demanding client absolutely required fully working Unicode permalinks, we had to find a real solution. As you may know, when you enable WordPress’s permalinks under IIS 7.x, it creates a web.config file (serving the same purpose as Apache’s .htaccess) that specifies a custom rewrite rule for Microsoft’s official URL Rewrite Module. The rule is rather simple: it forces every incoming request that doesn’t map to an existing file or directory to be rewritten as a request to index.php. At that point I wondered how WordPress knew about my original request, since no other information seemed to be passed along with the rewrite rule. True, Append Querystring was selected, but that was irrelevant to this specific problem.
It was rather obvious that the original request (its path, really) was preserved somehow in a server variable. There were four candidates: PATH_INFO, ORIG_PATH_INFO, PATH_TRANSLATED and REQUEST_URI. I echoed their values for a rewritten request (such as /2011/02/Δοκιμή-Test/). The first three were no good, since they were either empty or only contained the name of the target file (after the rewrite). The last one was interesting:
/2011/02/������-Test/
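If you want to repeat that check yourself, a minimal sketch could look like the following. The two helper functions are hypothetical names made up for this illustration, and the sketch assumes you paste it temporarily at the top of index.php: a standalone .php file exists on disk, so the rewrite rule's IsFile condition would skip it and you would not see the rewritten values.

```php
<?php
// Hypothetical helper (not part of WordPress or PHP): picks out the server
// variables that might carry the original request path. Unset ones become null.
function collect_path_variables(array $server)
{
    $names = ['PATH_INFO', 'ORIG_PATH_INFO', 'PATH_TRANSLATED', 'REQUEST_URI'];
    $values = [];
    foreach ($names as $name) {
        $values[$name] = isset($server[$name]) ? $server[$name] : null;
    }
    return $values;
}

// Hypothetical helper: one line per variable, with the raw bytes in hex so a
// wrong charset in the browser cannot disguise what PHP actually received.
function describe_path_variables(array $server)
{
    $lines = [];
    foreach (collect_path_variables($server) as $name => $value) {
        $shown = ($value === null) ? '(not set)' : $value . ' [' . bin2hex($value) . ']';
        $lines[] = sprintf('%-16s %s', $name, $shown);
    }
    return implode("\n", $lines) . "\n";
}

echo describe_path_variables($_SERVER);
```

The hex dump is the useful part: it shows how many bytes each Greek letter occupies, which tells you whether the value is UTF-8 or a single-byte code page. Remove the snippet once you have your answer.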
What the…? The Greek part of the path came back as garbage. What happened here is that either PHP or the URL Rewrite Module interpreted the value as being encoded in the system’s locale, in our case el-GR. Since URL rewriting produces perfect results with .NET applications (which means that the resulting string is UTF-8 encoded rather than ISO-8859-7), I assume that PHP is to blame here. I dug through php.ini but couldn’t find anything relevant (do let me know if I’m wrong). Then I came across this little gem: http://forums.iis.net/t/1152957.aspx. See that UNENCODED_URL variable that’s introduced with the URL Rewrite Module? I tested that one and it produced the following result:
/2011/02/%CE%94%CE%BF%CE%BA%CE%B9%CE%BC%CE%AE-Test/
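To see how the two outputs relate, here is a small sketch that works on the slug alone. It assumes the mbstring extension is available, and it uses ISO-8859-7 as a stand-in for whatever single-byte Greek code page the server actually applied; that conversion step is an assumption about the cause, not something confirmed here.

```php
<?php
// The slug as it appears in UNENCODED_URL: pure ASCII, percent-encoded.
$encoded = '%CE%94%CE%BF%CE%BA%CE%B9%CE%BC%CE%AE-Test';

// Percent-decoding yields the UTF-8 bytes of the slug (two bytes per Greek letter).
$utf8 = rawurldecode($encoded);

// Assumption: something downstream converted the text to a single-byte Greek
// code page. ISO-8859-7 is used here purely for illustration.
$greek = mb_convert_encoding($utf8, 'ISO-8859-7', 'UTF-8');

echo 'UTF-8 bytes:      ', bin2hex($utf8), "\n";
echo 'ISO-8859-7 bytes: ', bin2hex($greek), "\n";

// The single-byte version is not valid UTF-8, so anything that treats it as
// UTF-8 has nothing sensible to display for the six Greek letters.
var_dump(mb_check_encoding($utf8, 'UTF-8'));
var_dump(mb_check_encoding($greek, 'UTF-8'));
```

The percent-encoded form survives because it contains only ASCII, so no locale conversion can damage it on the way to PHP.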
The UNENCODED_URL value looks like a URL-encoded form of the original URL, right? Actually, no. This IS the original form of the URL. According to the spec, only a limited character set is allowed in a URL, and all of it falls within ASCII; anything else has to be percent-encoded. Your browser performs that encoding before the request is sent to the server. In other words, the rewrite rule that WordPress sets up for the URL Rewrite Module (and I suspect others do the same) is naively implemented, since it doesn’t take into account how the module handles Unicode. Fortunately, I have two solutions for you:
1st solution: Rewrite REQUEST_URI with UNENCODED_URL
This requires some work on your part and access to your server’s configuration. First, go to IIS Manager and, under the target site, select URL Rewrite. Then choose View Server Variables. Add REQUEST_URI here, then return. That’s all there is to it. The significance of this step is that it allows rewrite rules to alter the REQUEST_URI server variable (by default that’s not allowed).
Note! You can also do this in the main server configuration, which makes the variable editable from within rewrite rules server-wide. I doubt there are any realistic security implications, but you may want to avoid such a drastic step anyway. Also, while I’m sure that there’s a way to do this with appcmd.exe, I haven’t investigated it; feel free to drop me a line if you know how.
After you do that, you can open your web.config file which will probably look something like this:
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <system.webServer>
    <rewrite>
      <rules>
        <rule name="wordpress" patternSyntax="Wildcard" stopProcessing="false">
          <match url="*" />
          <conditions>
            <add input="{REQUEST_FILENAME}" matchType="IsFile" negate="true" />
            <add input="{REQUEST_FILENAME}" matchType="IsDirectory" negate="true" />
          </conditions>
          <action type="Rewrite" url="index.php" appendQueryString="true" />
        </rule>
      </rules>
    </rewrite>
  </system.webServer>
</configuration>
Add the following…
<serverVariables>
  <set name="REQUEST_URI" value="{UNENCODED_URL}" replace="true" />
</serverVariables>
… as shown below:
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <system.webServer>
    <rewrite>
      <rules>
        <rule name="wordpress" patternSyntax="Wildcard" stopProcessing="false">
          <match url="*" />
          <conditions>
            <add input="{REQUEST_FILENAME}" matchType="IsFile" negate="true" />
            <add input="{REQUEST_FILENAME}" matchType="IsDirectory" negate="true" />
          </conditions>
          <action type="Rewrite" url="index.php" appendQueryString="true" />
          <!-- Overwrite REQUEST_URI with the percent-encoded original URL -->
          <serverVariables>
            <set name="REQUEST_URI" value="{UNENCODED_URL}" replace="true" />
          </serverVariables>
        </rule>
      </rules>
    </rewrite>
  </system.webServer>
</configuration>
What we’ve done here is instruct IIS to replace the value of the REQUEST_URI server variable with that of UNENCODED_URL whenever the rewrite rule is applied. That’s the first solution I used, and it worked with no issues from WordPress or its plugins (even caching ones). Caveat emptor: some plugins, especially caching-related ones, might include their own rewrite rules; they might even replace or override the ones set by WordPress. In that case the steps are probably the same, but the web.config file will be different. Just remember to add the above snippet in the proper position, that is, inside the relevant <rule> element.
2nd solution: Alter the server variable from within PHP
This solution is much simpler and will probably work on shared hosting, where you don’t have access to the server’s configuration. All you have to do is open WordPress’s index.php file and add the following line after the first line (i.e. after the line with the <?php opening tag):
$_SERVER['REQUEST_URI'] = $_SERVER['UNENCODED_URL'];
And… that’s it! Keep in mind, though, that whenever you upgrade your WordPress installation you must check that the above line is still there; if it was removed, add it again manually. The index.php file rarely (if ever) changes, but I am not certain whether it’s overwritten on an upgrade or not.
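A slightly more defensive variant of that line is sketched below. It is only an illustration: the helper name preferred_request_uri is hypothetical, and the fallback behaviour is an assumption about what you would want when UNENCODED_URL is missing (for example when the same code runs on a server without the URL Rewrite Module).

```php
<?php
// Hypothetical helper (not part of WordPress): returns the URI WordPress
// should see. Prefers UNENCODED_URL and falls back to the existing
// REQUEST_URI when that variable is missing or empty.
function preferred_request_uri(array $server)
{
    if (isset($server['UNENCODED_URL']) && $server['UNENCODED_URL'] !== '') {
        return $server['UNENCODED_URL'];
    }
    return isset($server['REQUEST_URI']) ? $server['REQUEST_URI'] : '';
}

// Same effect as the one-liner on IIS, without an undefined-index notice elsewhere.
$_SERVER['REQUEST_URI'] = preferred_request_uri($_SERVER);
```

On IIS with the module installed this behaves exactly like the one-liner; everywhere else it leaves REQUEST_URI as it was.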
Have fun with your all-new, properly working Unicode permalinks!