other changes

taylor's picture

I also made a lot of other changes to my code to get the Google Base feed working properly. For one, check_plain() is a poor function to use here: it doesn't strip out the characters that Google Base rejects in the XML feed. It converts them into other sequences that Google Base doesn't support either, so using it for this feed is an epic fail. I searched long and hard for a function that strips those characters reliably and stumbled across some GPL code on the zen-cart forums in this thread: http://www.zen-cart.com/forum/showthread.php?t=67850&page=134

I've tested the code and it works beautifully. You need to replace calls to check_plain() with google_base_xml_sanitizer().
Code:

    function google_base_xml_sanitizer($str, $cdata = false) {
      $_strip_search = array("![\t ]+$|^[\t ]+!m",'%[\r\n]+%m'); // remove CRs and newlines
      $_strip_replace = array('',' ');
      $_cleaner_array = array(">" => "> ", "&reg;" => "", "®" => "", "&trade;" => "", "™" => "", "\t" => "", "    " => "");
      $str = html_entity_decode($str);
      $str = strtr($str, $_cleaner_array);
      $str = preg_replace($_strip_search, $_strip_replace, $str);
      $str = strip_tags($str);
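      // Whitelist filter (note that '-_ is a range, 0x27 to 0x5F). eregi_replace() no longer exists in PHP 7+.
      $str = preg_replace("~[^[:alnum:][:space:].,!()'-_/+=?äÂÄöÖüÜß]~i", "", $str);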
      $str = utf8_encode(htmlentities($str));
      $str = str_replace('&', '&', $str);
      $str = str_replace(array("&reg;", "&copy;", "&trade;"), array('(r)', '(c)', '(tm)'), $str);
      $out = "";
      $length = strlen($str);
      for ($i = 0; $i < $length; $i++) {
        $current = ord($str[$i]);
        if (($current == 0x9) || ($current == 0xA) || ($current == 0xD) || (($current >= 0x20) && ($current <= 0xD7FF)) || (($current >= 0xE000) && ($current <= 0xFFFD)) || (($current >= 0x10000) && ($current <= 0x10FFFF))) {
          $out .= chr($current);
        } else {
          $out .= " ";
        }
      }
      $str = trim($out);
      $str = str_replace(array("&lt;", "&gt;"), array("<", ">"), $str);
      if ($cdata) {
        $str = '<![CDATA[' . $str . ']]>';
      }
      return $str;
    }
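
A side note on the loop near the end of that function: ord() returns a single byte (0 to 255), so the comparisons against 0xD7FF and above never come into play, and in practice the loop only blanks out control characters below 0x20. Here is a minimal sketch of the same XML 1.0 character check done per code point. It assumes the input is valid UTF-8, and xml_strip_invalid_chars() is a made-up name, not something from the module or the zen-cart code:

```php
<?php
/**
 * Replace every code point outside the XML 1.0 character ranges with a space.
 * Hypothetical helper for illustration; assumes $str is valid UTF-8.
 */
function xml_strip_invalid_chars($str) {
  // Allowed: tab, LF, CR, U+0020-U+D7FF, U+E000-U+FFFD, U+10000-U+10FFFF.
  $clean = preg_replace(
    '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u',
    ' ',
    $str
  );
  // With the /u modifier, preg_replace() returns NULL for invalid UTF-8.
  return $clean === null ? '' : trim($clean);
}
```

It could be run on a string before it goes into the feed, in addition to (not instead of) the sanitizer above.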

I made a lot of other minor tweaks to help optimize my feed. I made the static across my products, because I have a lot of taxonomy associated with my products and the default of splitting the taxonomy for the term didn't work out well for rankings.

This module is now functional, but to make it more useful I suggest we build a few more things into it. Each product node should have fields for adding attributes such as "Brand" (much in the way that google base integration does).

Another thing I did was create an exclusion list so that you could specify which products to leave out of the feed:

$exclusions = array('product1', 'product2'); // add more product paths as needed

if (!in_array($product_path, $exclusions))
{
  // Build google base rss
  $output .= "\n";
  // ... rest of the item output ...
}
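
To show the exclusion idea in isolation, here is a small self-contained sketch. The function name feed_filter_products() and the sample product list are made up for the example; in the module the paths would come from the product nodes:

```php
<?php
// Hypothetical helper: return only the products whose path is not excluded.
function feed_filter_products(array $products, array $exclusions) {
  $kept = array();
  foreach ($products as $product_path => $product) {
    // Cast the key to string and compare strictly, so a numeric path
    // is never matched loosely against an unrelated entry.
    if (!in_array((string) $product_path, $exclusions, true)) {
      $kept[$product_path] = $product;
    }
  }
  return $kept;
}

// Sample data, invented for this example.
$products = array(
  'product1' => 'Blue widget',
  'product2' => 'Red widget',
  'product3' => 'Green widget',
);
$exclusions = array('product1', 'product2');

foreach (feed_filter_products($products, $exclusions) as $path => $title) {
  echo $path . ': ' . $title . "\n";
}
```

With this sample data only product3 is left for the feed. Filtering the list up front keeps the item-building code free of an extra if block.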

OH and I almost forgot. I removed the line with g:department because that is a bogus product field. I suppose it could be modified to c:department (custom field)... but it is not an actual google product field - so I removed it entirely.

Hope this helps! If anyone is interested in having me be a sponsor or maintainer on this package shoot me a pm... I feel like there is still a lot of room for improvement with this module.

Google Base RSS Feed, 2.3.1 By: danbaker (40 replies) Sat, 06/20/2009 - 12:36