UTF-8 is hell. I have run into problems with UTF-8 before (see Migrating MySQL to UTF-8 encoding).
I moved a site that was displaying fine from an old Apache server to a new Apache2 server and quickly noticed that pages served by Apache2 showed odd characters.
The site runs a PHP application with a MySQL backend. The MySQL database uses the latin1 character set (with the latin1_swedish_ci collation). The old site displayed correctly, and both the old and new sites use the same database.
So what was going on?
I ran into an issue with a vserver whose disk usage was only 23%, yet I was getting a 'No space left on device' error message. I run vserver with 12 virtual servers, and none of the other servers showed the same behavior.
I recently ran into a situation where a former employee had used a gateway marshaling concept and created multiple directories containing files with the same names (e.g. www/admin/index.php, www/admin_v2/index.php). Debugging the code became problematic because __FILE__ returns the full, long path: those paths polluted the debug output and made it difficult to read.
I wrote a small function that returns the file path relative to a specified root directory. For instance, if the script filename (i.e. __FILE__) is '/really/long/and/hard/to/read/path/www/index.php', the function returns 'www/index.php'.
This is useful for debugging calls (I use PEAR::Log) when using techniques such as:
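A minimal sketch of such a helper, written for illustration rather than taken from the original code: the name `trim_to_root()` and its default root name `www` are hypothetical, and PHP 7.0+ is assumed for the type declarations. It splits the path on `/` and keeps everything from the *last* segment matching the root name, so a server prefix such as /var/www/... does not cause an early match; if the root name never appears, the path comes back unchanged.

```php
<?php
/**
 * Hypothetical helper: trim an absolute path so it starts at the last
 * directory segment named $root. Returns $path unchanged if no segment
 * matches.
 */
function trim_to_root(string $path, string $root = 'www'): string
{
    // Normalise Windows separators so a single split works everywhere.
    $segments = explode('/', str_replace('\\', '/', $path));

    // Search from the end so '/var/www/site/www/index.php' trims to the
    // inner 'www' rather than the one in the server prefix.
    for ($i = count($segments) - 1; $i >= 0; $i--) {
        if ($segments[$i] === $root) {
            return implode('/', array_slice($segments, $i));
        }
    }

    // Root name not found: fall back to the original path.
    return $path;
}
```

In a debug call you could then write something like `error_log(trim_to_root(__FILE__) . ':' . __LINE__);`, or pass the same string to a PEAR::Log logger instead.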
I have always had trouble reliably and dynamically finding my application root in PHP; there seemed to be no effective way to determine it on the fly. For security reasons I always keep my application root separate from my www root, which rules out using $_SERVER['DOCUMENT_ROOT'].
Consider the following directory structure:
/path/to/domain.com
--> app/
--> includes/main.php
--> www/
--> index.php
--> admin/index.php
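One way to locate the application root on the fly, sketched under assumptions rather than as the approach this post settles on: start from the calling script's directory (`__DIR__`) and walk upward with `dirname()` until a directory containing a known marker file is found. The function name `find_app_root()` and the choice of marker are hypothetical; pick a file that always sits at a fixed place beneath your root. PHP 7.1+ is assumed for the nullable return type.

```php
<?php
/**
 * Hypothetical helper: walk up from $startDir until a directory is found
 * that contains $marker (a path relative to the application root).
 * Returns that directory, or null if the filesystem root is reached first.
 */
function find_app_root(string $startDir, string $marker): ?string
{
    // Resolve symlinks and relative segments; give up on a missing start dir.
    $dir = realpath($startDir);
    if ($dir === false) {
        return null;
    }

    while (true) {
        if (file_exists($dir . '/' . $marker)) {
            return $dir;
        }
        $parent = dirname($dir);
        // dirname('/') === '/', so an unchanged value means we hit the top.
        if ($parent === $dir) {
            return null;
        }
        $dir = $parent;
    }
}
```

For example, from www/admin/index.php, `find_app_root(__DIR__, 'app/includes/main.php')` would return /path/to/domain.com, assuming includes/ lives inside app/ as the listing above appears to show. Because the search walks upward, the same call works unchanged from www/index.php and www/admin/index.php, whereas a hard-coded `dirname(__DIR__, 2)` depends on how deep each script sits.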