Importer ampersand problem

Posts: 42
Joined: 05/29/2008

I'm having a problem with the Importer module. When the import file includes an & (ampersand), the import always fails. As soon as the script hits the first ampersand, the import stops. All data written before the first ampersand is imported fine, but after that the importer just stops: the current XML tag ends at the last character before the ampersand, and none of the remaining tags are imported. Stranger still, it doesn't even report an error.
Any thoughts?

Posts: 2244
Joined: 08/07/2007
Administrator, eLiTe!

Replace all instances of & with &amp;. If you're using PHP to construct your XML, the function htmlspecialchars() will do this. It'd be best to specify the encoding as UTF-8, though, so for every textual tag do:

<?php
  $tag_data = htmlspecialchars($tag_data, ENT_QUOTES, "UTF-8");
?>

Posts: 42
Joined: 05/29/2008

I'm sorry for the false alarm; I think I misunderstood my problem a little. I did some research, and it turns out that not every ampersand caused the error: basically anything except &amp;, &lt;, &gt;, &apos; and &quot; did. My import file uses Latin-1 entities such as á (&aacute;), which are defined in the w3.org entity set (http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent).

It's probably not the best solution, but I'm preparing a script to transform entities from their names to their numeric definitions (&aacute; -> &#225;).
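
A minimal sketch of that name-to-number transformation, not the attached script itself. It assumes PHP 7.2+ with the mbstring extension (for mb_ord()), and the function name named_to_numeric_entities() is made up for illustration. It leaves XML's five predefined entities alone and rewrites any other named entity that PHP's HTML 4.01 table recognises:

```php
<?php
// Hypothetical helper: rewrite named HTML entities as numeric character
// references so a plain XML parser can read them.
function named_to_numeric_entities(string $xml): string
{
    // XML defines these five itself, so they must stay as they are.
    $keep = ['&amp;', '&lt;', '&gt;', '&apos;', '&quot;'];

    return preg_replace_callback(
        '/&[A-Za-z][A-Za-z0-9]*;/',
        function (array $m) use ($keep): string {
            if (in_array($m[0], $keep, true)) {
                return $m[0];
            }
            // Decode the name to its UTF-8 character (HTML 4.01 entity table).
            $char = html_entity_decode($m[0], ENT_QUOTES | ENT_HTML401, 'UTF-8');
            if ($char === $m[0]) {
                return $m[0]; // unknown name: leave it untouched
            }
            // Emit the decimal code point, e.g. &aacute; becomes &#225;
            return '&#' . mb_ord($char, 'UTF-8') . ';';
        },
        $xml
    );
}
```

Running the whole import file through it once before importing (for example via file_get_contents() and file_put_contents()) should be enough. Names the table does not know are left in place, so they would still need fixing by hand.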

Posts: 2244
Joined: 08/07/2007
Administrator, eLiTe!

Ah, I see. That's part of the reason why the encoding should be UTF-8. Then you can just put the characters in as they are.

XML is not the same as XHTML, so it only defines that short list of entities on its own. The numbered entities should still work though.
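
As a rough sketch of the "put the characters in as they are" route: if the data already contains named entities, they can be decoded to literal UTF-8 characters before the XML is written. The function name named_entities_to_utf8() is hypothetical, and the sketch assumes the text is (or is meant to be) UTF-8. The five entities XML does define are kept escaped so the markup stays intact:

```php
<?php
// Hypothetical helper: replace named HTML entities with the literal
// UTF-8 character, keeping XML's five predefined entities escaped.
function named_entities_to_utf8(string $text): string
{
    $keep = ['&amp;', '&lt;', '&gt;', '&apos;', '&quot;'];

    return preg_replace_callback(
        '/&[A-Za-z][A-Za-z0-9]*;/',
        function (array $m) use ($keep): string {
            if (in_array($m[0], $keep, true)) {
                return $m[0];
            }
            // Unknown names come back unchanged from html_entity_decode().
            return html_entity_decode($m[0], ENT_QUOTES | ENT_HTML401, 'UTF-8');
        },
        $text
    );
}
```

The result only makes sense if the file is then saved as UTF-8 and declared as such in the XML declaration.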

Posts: 54
Joined: 08/08/2007

In my XML I left the characters as they were (without entity names). I just added the declaration <?xml version="1.0" encoding="utf-8"?> at the beginning of the file.

Before importing, make sure your XML file validates in the browser and everything looks correct there.
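
The same check can be scripted instead of done in the browser. This is only a sketch: it assumes PHP's DOM extension is enabled, the function name xml_well_formed_errors() is made up, and it tests well-formedness only (no DTD or schema validation). The input must be a non-empty string:

```php
<?php
// Hypothetical helper: return the parser's complaints about an XML string.
// An empty array means the document is well-formed.
function xml_well_formed_errors(string $xml): array
{
    // Collect libxml errors instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $doc = new DOMDocument();
    $doc->loadXML($xml);

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }

    // Restore the previous error-handling state.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $messages;
}
```

An undefined entity such as &aacute; should show up in the returned list, which is more than the importer itself reports.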

Posts: 42
Joined: 05/29/2008

I have much to learn about XML, that's true :)

Anyway, this simple script helped in my case.

Attachment: entities-replace.php_.txt (5.82 KB)

Posts: 42
Joined: 05/29/2008

futurist wrote:
In my XML I left the characters as they were (without entity names). I just added the declaration <?xml version="1.0" encoding="utf-8"?> at the beginning of the file.

Before importing, make sure your XML file validates in the browser and everything looks correct there.

The problem in my case was that I'm importing data that is already formatted so that, for example, instead of the letter "á" there is "&aacute;". In that case adding <?xml version="1.0" encoding="utf-8"?> doesn't work (or maybe I don't know how to make it work :)).