svihel's picture
Offline
Joined: 05/29/2008
Juice: 140

I'm having a problem with the Importer module. When the import file includes an & (ampersand), the import always fails. As soon as the script hits the first ampersand, the import stops. All data written before the first ampersand is imported fine, but after that the importer just stops: the current XML tag ends at the last character before the ampersand, and none of the remaining tags are imported. What is stranger, it doesn't even report an error.
Any thoughts?

Lyle's picture
Offline
AdministratoreLiTe!
Joined: 08/07/2007
Juice: 6846
Re: Importer ampersand problem

Replace all instances of & with &amp;. If you're using PHP to construct your XML, the function htmlspecialchars() will do this. It'd be best if you specify the encoding as UTF-8, though, so for every textual tag do:

<?php
  $tag_data = htmlspecialchars($tag_data, ENT_QUOTES, "UTF-8");
?>
svihel's picture
Offline
Joined: 05/29/2008
Juice: 140
Re: Re: Importer ampersand problem

I'm sorry for the false alarm; I think I misunderstood my problem a little. I did some research, and it turns out that not every ampersand caused an error, but anything other than &amp;, &lt;, &gt;, &apos; and &quot; did. My import file uses Latin entities such as á (&aacute;), which are defined in the w3.org entity set (http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent).

It's probably not the best solution, but I'm preparing a script to transform entities from their names to their numeric definitions (&aacute; -> &#225;).
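
A rough sketch of that kind of name-to-number conversion, for anyone who wants to try it. This is not svihel's actual script: the function name named_entities_to_numeric() is made up for illustration, and it assumes PHP 7 or later with the iconv extension enabled. It leaves the five entities XML already knows alone and rewrites every other named entity that PHP recognizes.

```php
<?php
// Hypothetical helper: rewrite named HTML entities as numeric character
// references, which XML parsers accept without a DTD.
function named_entities_to_numeric(string $xml): string
{
    // The five entities XML predefines; these are left untouched.
    $xmlPredefined = ['amp', 'lt', 'gt', 'apos', 'quot'];

    return preg_replace_callback(
        '/&([A-Za-z][A-Za-z0-9]*);/',
        function (array $m) use ($xmlPredefined): string {
            if (in_array($m[1], $xmlPredefined, true)) {
                return $m[0];
            }
            // Decode this one entity to its UTF-8 character.
            $char = html_entity_decode($m[0], ENT_QUOTES | ENT_HTML401, 'UTF-8');
            if ($char === $m[0]) {
                // Name not recognized by PHP: leave it as it was.
                return $m[0];
            }
            // Convert to UTF-32 so each code point is one 32-bit integer.
            $out = '';
            foreach (unpack('N*', iconv('UTF-8', 'UTF-32BE', $char)) as $codePoint) {
                $out .= '&#' . $codePoint . ';';
            }
            return $out;
        },
        $xml
    );
}

echo named_entities_to_numeric('<name>Caf&eacute; &amp; p&aacute;rty</name>'), "\n";
// prints: <name>Caf&#233; &amp; p&#225;rty</name>
```

Unknown names are passed through unchanged on purpose, so they still show up as errors at import time instead of being silently dropped.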

Lyle's picture
Offline
AdministratoreLiTe!
Joined: 08/07/2007
Juice: 6846
Re: Re: Re: Importer ampersand problem

Ah, I see. That's part of the reason why the encoding should be UTF-8. Then you can just put the characters in as they are.

XML is not the same as XHTML, so it only defines that short list of entities on its own. The numbered entities should still work though.
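
If the source text already contains HTML entities, one way to get to "characters as they are" is to decode first and then escape only what XML needs. The following is a sketch under assumptions: to_xml_text() is a hypothetical name, the input is assumed to be the text content of a single tag (not a whole XML document), and PHP 7 or later is assumed.

```php
<?php
// Hypothetical helper: normalize one tag's text content for a UTF-8 XML file.
function to_xml_text(string $raw): string
{
    // 1. Turn HTML entities (named or numeric) into real UTF-8 characters.
    $plain = html_entity_decode($raw, ENT_QUOTES | ENT_HTML401, 'UTF-8');
    // 2. Escape only the characters that are special in XML: & < > " '
    return htmlspecialchars($plain, ENT_QUOTES | ENT_XML1, 'UTF-8');
}

$xml  = '<?xml version="1.0" encoding="utf-8"?>' . "\n";
$xml .= '<name>' . to_xml_text('Caf&eacute; &amp; bar') . '</name>' . "\n";
echo $xml;
```

The result contains the literal é and only &amp; as an entity, so it does not depend on the XHTML entity definitions.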

futurist's picture
Offline
Joined: 08/08/2007
Juice: 114
Re: Re: Re: Re: Importer ampersand problem

In my XML I left the characters as they were (without entity names). I just added encoding <?xml version="1.0" encoding="utf-8"?> at the beginning of the file.

Before importing make sure your XML file validates in the browser and everything looks correct there.
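
The same check can be done from PHP instead of the browser. A minimal sketch, assuming the DOM and libxml extensions are available; xml_wellformedness_errors() is a made-up name, and this only tests that the file parses, not that the Importer will accept its structure.

```php
<?php
// Hypothetical helper: return the parser's error messages for an XML string.
// An empty array means the string parsed as well-formed XML.
function xml_wellformedness_errors(string $xml): array
{
    // Collect libxml errors instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $doc = new DOMDocument();
    $doc->loadXML($xml);

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }

    // Restore the previous error-handling state.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $messages;
}

// A bare ampersand is reported before the file ever reaches the importer.
foreach (xml_wellformedness_errors('<item>Tom & Jerry</item>') as $message) {
    echo $message, "\n";
}
```

Running this over the import file should flag both bare ampersands and undefined entities such as &aacute;, with the line number where the parser gave up.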

svihel's picture
Offline
Joined: 05/29/2008
Juice: 140
futurist wrote:

In my XML I left the characters as they were (without entity names). I just added encoding <?xml version="1.0" encoding="utf-8"?> at the beginning of the file.

Before importing make sure your XML file validates in the browser and everything looks correct there.

The problem in my case was that I'm importing data that is already formatted so that instead of, for example, the letter "á", there is "&aacute;". In this case adding <?xml version="1.0" encoding="utf-8"?> doesn't work (or maybe I don't know how to make it work :) ).

svihel's picture
Offline
Joined: 05/29/2008
Juice: 140
Re: Re: Re: Re: Importer ampersand problem

Much to learn about XML for me, that's true :)

Anyway, this simple script helped in my case.

Attachment: entities-replace.php_.txt (5.82 KB)