uc_importer.module: htmlentities() and 2-byte characters

Posts: 9
Joined: 01/05/2008

The problem is described in detail here.
In the module source code (uc_importer.module) there is this line:

$xml .= '<name>' . htmlentities($term->name) . '</name>';

And a few more lines like it. The same call also appears in the code of other UC modules. Strangely, the remaining modules, including the standard ones, do not use htmlentities() anywhere.

From the PHP documentation: "... Presently, the ISO-8859-1 character set is used as the default ..."
So the function treats each non-Latin character as two separate single-byte characters and tries to convert them into something ugly :).
The solution is to specify the encoding explicitly (everywhere). For example:

$xml .= '<name>' . htmlentities($term->name, ENT_QUOTES, "UTF-8") . '</name>';
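
To make the difference visible, here is a minimal sketch that runs the same string through both encodings. The sample value stands in for $term->name and is made up for illustration. ISO-8859-1 is passed explicitly to reproduce the default quoted above, since newer PHP versions may default to a different charset.

```php
<?php
// Hypothetical sample value standing in for $term->name (UTF-8 encoded).
$name = 'Привет & "мир"';

// Forcing ISO-8859-1 reproduces the problem: every byte of a multibyte
// character is treated as a separate single-byte character.
$broken = htmlentities($name, ENT_QUOTES, 'ISO-8859-1');

// Naming the real encoding leaves the multibyte characters intact and
// escapes only the markup characters (& and the quotes).
$fixed = htmlentities($name, ENT_QUOTES, 'UTF-8');

$xml = '<name>' . $fixed . '</name>';

echo $broken, "\n";
echo $xml, "\n";
```

With the UTF-8 call only the ampersand and the quotes are replaced. With the ISO-8859-1 call the Cyrillic letters are no longer readable.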

Sorry, I can't provide a patch. I'm a newbie with CVS and diff :)

--

Nosce te ipsum

Posts: 2352
Joined: 08/07/2007
AdministratoreLiTe!

That's alright. I didn't know that htmlentities would convert the encoding. I've changed all of those occurrences of htmlentities() as well as the corresponding html_entity_decode() to use Unicode. Also, the HTTP header "Content-Type" includes "charset=utf-8" and the XML header is set to have UTF-8 encoding. That should cover everything.
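
For anyone following along, the pieces mentioned above could fit together roughly like this. It is only a sketch, not the actual module code: the function names example_export_name() and example_import_name() are hypothetical, and the text/xml media type is an assumption (only the charset part comes from this post).

```php
<?php
// Hypothetical helper: escape a value for export, naming the encoding.
function example_export_name($name) {
  return '<name>' . htmlentities($name, ENT_QUOTES, 'UTF-8') . '</name>';
}

// Hypothetical helper: reverse the escaping on import with the same
// flags and the same encoding.
function example_import_name($escaped) {
  return html_entity_decode($escaped, ENT_QUOTES, 'UTF-8');
}

// Send the header before any output; the media type is an assumption.
if (!headers_sent()) {
  header('Content-Type: text/xml; charset=utf-8');
}

$original = 'Привет & "мир"';
$escaped = htmlentities($original, ENT_QUOTES, 'UTF-8');

// The XML declaration states the same encoding as the HTTP header.
$xml = '<?xml version="1.0" encoding="UTF-8"?>' . "\n" . example_export_name($original);
echo $xml, "\n";

// Decoding with matching arguments gives back the original string.
$roundtrip = example_import_name($escaped);
```

The point is that the encode call, the decode call, the HTTP header and the XML declaration all name the same encoding.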