import image local path fix

Project:Ubercart Contributions
Component:Code
Category:
Priority:normal
Assigned:Unassigned
Status:by design
Description
Project: 
Ubercart

Hi,

This is for version alpha-7c.

To get the images to import correctly using an XML file generated using the export feature I had to make a couple changes to the uc_importer.module.

There were two issues
1) if $local_path is meant to be an absolute path to the image file then it should not be stored in the database. A relative path such as $image_path supplied from the XML file should be.

2) I believe file_directory_path() and file_create_path() are supposed to return an absolute path such as 'c:/wamp/www/drupal-5.2/files'. Both functions return 'files' on my system. I do not know why those functions are returning incorrect data. It could be windows and my freshly downloaded wamp stack. I added another check to make sure $local_path is in fact an absolute path.

I've also got some code which updates the files table so new images and new products with new images can be added. I'll make a patch for that as well once I test it a bit.

-C

PreviewAttachmentSize
uc_importer_local_image_path.zip680 bytes
Lyle's picture
Offline
AdministratoreLiTe!
Joined: 08/07/2007
Juice: 6841
Re: import image local path fix

I checked the API pages of file_directory_path() and file_create_path() just to be sure, and it looks like they return relative paths. This should make $local_path relative to either the files/ directory, or the Drupal root directory.

It may not be clear from the documentation, but $image_path is supposed to be an absolute path pointing to an image resource, typically from whatever system had the images in the first place. The importer copies the image from $image_path to $local_path if the file doesn't exist locally.

cdiggity's picture
Offline
Bug Finder
Joined: 09/14/2007
Juice: 40
relative or absolute image path for import
Assigned to:Lyle» cdiggity

Hi Lyle,

The image-path field in the XML file generated by the export function is a relative path so I assumed it would be for the import function. Relative paths seem to be what are stored in the files table as well.

I think the uc_importer module should be able to import the XML files it generates with the export function.

Therefore either:
a) export relative paths in the XML file and expect relative paths in the image-path field when importing an XML file
b) export absolute paths and expect absolute paths in image-path field when importing
c) treat paths starting with '/' as absolute and others as relative to the drupal directory. Don't know how this works on windows.

and update http://www.ubercart.org/docs/user/354/importing_data to let us know which it is.

Personally I think the paths should all be relative to the drupal directory but I don't know what the drupal convention is.

-C

Lyle's picture
Offline
AdministratoreLiTe!
Joined: 08/07/2007
Juice: 6841
Re: relative or absolute image path for import
Assigned to:cdiggity» Lyle

OK, I didn't realize where the confusion was coming from.

I'm going with option (b) because the importer is using the image path to download the image from its location to the Drupal filesystem. Then, the new path is saved as a relative path per the Drupal conventions.

cdiggity's picture
Offline
Bug Finder
Joined: 09/14/2007
Juice: 40
spaces in URL translated to + but not translated back
Assigned to:Lyle» cdiggity

Hi Lyle,

The uc_importer.module in the repository is now using an URL as the absolute path to an image. However it is unable to import its own export file for images with spaces in their names.

<image>
<path>http://localhost/drupal-5.2/files/cool+shot+of+PV.JPG</path>
<alt>cool shot of PV.JPG</alt>
<title>cool shot of PV.JPG</title>
</image>

There seem to be two cases:
1) looking for the file "cool+shot+of+PV.JPG" instead of "cool shot of PV.JPG" in the $local_path variable. We aren't going to find "cool+shot+of+PV" using file_exists().

2) The file is going to be copied to $local_path with the function 'file_put_contents()'. fopen() has trouble with the URL-encoding:

    * warning: fopen(http://localhost/drupal-5.2/files/taranaki+copy_0.jpg) [function.fopen]: failed to open stream: HTTP request failed! HTTP/1.1 404 Not Found in C:\wamp\www\drupal-5.2\modules\ubercart\uc_importer\uc_importer.module on line 866.
    * warning: fopen(http://localhost/drupal-5.2/files/cool+shot+of+PV.JPG) [function.fopen]: failed to open stream: HTTP request failed! HTTP/1.1 404 Not Found in C:\wamp\www\drupal-5.2\modules\ubercart\uc_importer\uc_importer.module on line 866.

I'm not sure how you would like to proceed. You could try and sort out the URL encoding/decoding keeping in mind that people will be generating the .XML files from various sources and will have to generate these properly URL-encoded image paths. Another option would be to use absolute file paths but be restricted to importing images from the local disk instead of anywhere on the web.

I'll be on the lookout for your decision and subsequent bazaar checkin.

-cdigs

Lyle's picture
Offline
AdministratoreLiTe!
Joined: 08/07/2007
Juice: 6841
Re: spaces in URL translated to + but not translated back
Assigned to:cdiggity» Lyle

I think all that needs to happen is to put a urldecode() around basename($image_path) on line 868 (probably not the line on your copy). In PHP 5, fopen should handle HTTP requests, and therefore the URL encoding. If you don't have PHP 5, I'm not too sure what should be done. (Besides convincing your host to upgrade.)

cdiggity's picture
Offline
Bug Finder
Joined: 09/14/2007
Juice: 40
Hi Lyle, I have 5.2.4 on
Assigned to:Lyle» cdiggity

Hi Lyle,

I have 5.2.4 on apache 2.24 win32.

Before my previous post I had tried:

       $image_URL = (string)$image->path;   // URL-encoded for http via fopen
       $image_path = urldecode($image_URL); // for direct filesystem access
...
       $local_path = file_create_path() .'/ubercart_images';
       $local_path .= '/'. basename($image_path);
       if (file_exists($local_path)
          || $size = file_put_contents($local_path, fopen($image_URL, 'rb'))){
...

and that is what generated these errors (same as my previous post)

    * warning: fopen(http://localhost/drupal-5.2/files/taranaki+copy_0.jpg) [function.fopen]: failed to open stream: HTTP request failed! HTTP/1.1 404 Not Found in C:\wamp\www\drupal-5.2\modules\ubercart\uc_importer\uc_importer.module on line 866.
    * warning: fopen(http://localhost/drupal-5.2/files/cool+shot+of+PV.JPG) [function.fopen]: failed to open stream: HTTP request failed! HTTP/1.1 404 Not Found in C:\wamp\www\drupal-5.2\modules\ubercart\uc_importer\uc_importer.module on line 866.

Note that when I paste the two URLS above in my web browser with the + instead of space I get a drupal page not found page in my web browser.

Giving fopen the urldecoded string yields these errors:

    * warning: fopen(http://localhost/drupal-5.2/files/taranaki copy_0.jpg) [function.fopen]: failed to open stream: HTTP request failed! HTTP/1.1 404 Not Found in C:\wamp\www\drupal-5.2\modules\ubercart\uc_importer\uc_importer.module on line 866.
    * warning: fopen(http://localhost/drupal-5.2/files/cool shot of PV.JPG) [function.fopen]: failed to open stream: HTTP request failed! HTTP/1.1 404 Not Found in C:\wamp\www\drupal-5.2\modules\ubercart\uc_importer\uc_importer.module on line 866.

Pasting either of these two URLs into my web browser and the picture is loaded, but the URL in the taskbar has the spaces replaced with %20.

I have "clean urls" enabled. Is mod_rewrite clouding the issue? Why is the drupal URL encoding using + instead of %20 for space?

Here are the get requests from my apache log:

127.0.0.1 - - [26/Sep/2007:18:54:45 -1000] "GET /drupal-5.2/files/cool%20shot%20of%20PV.JPG HTTP/1.1" 200 164119
127.0.0.1 - - [26/Sep/2007:19:03:35 -1000] "GET /drupal-5.2/files/taranaki+copy_0.jpg HTTP/1.0" 404 3555
127.0.0.1 - - [26/Sep/2007:19:03:38 -1000] "GET /drupal-5.2/files/cool+shot+of+PV.JPG HTTP/1.0" 404 3555
127.0.0.1 - - [26/Sep/2007:19:03:33 -1000] "POST /drupal-5.2/admin/store/products/import HTTP/1.1" 302 -
127.0.0.1 - - [26/Sep/2007:19:03:40 -1000] "GET /drupal-5.2/admin/store/products/import HTTP/1.1" 200 10771

What is interesting is that in the last two lines I post to the import script to give it the XML filename and then there is the GET request to my webbrowser but the fopen() function doesn't seem to actually make an http request to the webserver.

It would probably be easier for people to create the XML import files if you use file paths instead of url encoded urls but I don't care which it is so long as it works.

-C

Ryan's picture
Offline
Joined: 08/07/2007
Juice: 15422
Re: Hi Lyle, I have 5.2.4 on
Assigned to:cdiggity» Ryan

(This is a side note, but it would seem better for usability and web standards to convert your images to use underscores or hyphens instead of spaces. That should avert the whole problem, right?)

cdiggity's picture
Offline
Bug Finder
Joined: 09/14/2007
Juice: 40
averting is not fixing
Assigned to:Ryan» cdiggity

Not having spaces or other non alphanumeric characters in the image file name would prevent this bug from occurring, but not fix it. For usability the importer module should be able to handle any filename that an image could have.

Using URLs as the path to the images opens this URL-encoding can of worms. Using a file path avoids it.

The initial problem wasn't with URL encoding, it was that the exporter exported relative file paths and the importer expected absolute file paths. I'd prefer file paths instead of URLs.

-C

Lyle's picture
Offline
AdministratoreLiTe!
Joined: 08/07/2007
Juice: 6841
Re: averting is not fixing
Assigned to:cdiggity» Lyle

Actually, if you're making the XML with your own script, you can put file paths if you want. And if the images are going to be on the same filesystem anyway, that's a better idea.

The exporter will probably still use the HTTP protocol to specify the file path because I've used it to transfer images from one server to another. I think that's a handy thing not to have to figure out beforehand which images will actually be used.

It looks like this is a problem with Drupal that I have to work around: http://drupal.org/node/157409. I'll just make the URL the hard way, the correct way.

cdiggity's picture
Offline
Bug Finder
Joined: 09/14/2007
Juice: 40
good point
Assigned to:Lyle» cdiggity

That is a good point that we will be able to use absolute filepaths or URLs in the image-path field when generating the product catalogue from sources other than the ubercart export function.

I have checked out from bzr and will test again.
-C

cdiggity's picture
Offline
Bug Finder
Joined: 09/14/2007
Juice: 40
Hi Lyle, Sorry this turned

Hi Lyle,

Sorry this turned out kinda long but I was experiencing some funny behaviour trying to get our catalog updates imported with all the images so I narrowed the import file down to a couple products and went over it case by case. Using version 'alpha-7e'.

I found that

  1. If you import products that already exist, they lose their images
  2. If you are importing your images from the drupal files directory, they will have "_0" appended to the filename. In the files table the filepath column will refer to the imagefile with the _0 in the filename whereas the filename field will refer to the imagefile without the _0.
  3. If you import the same file twice, the second time the images will be placed in the 'ubercart_images' directory instead of the 'files' directory but no images show up for products. In the files table no images paths ever contained the 'ubercart_images' directory as part of the path

The import script will import images correctly only if the products don't already exist in the database.

Here are the steps I went through:

After testing the import/export routine on "Overwrite existing item" with the 'unique hash' and 'id' fields generated by export routine in the XML file this is what I have found:

  1. drupal 'files' directory empty, ubercart_images directory exists & empty. no images in files table, no products in database.
    • Importing using absolute file path outside wwwroot for image: OK
    • Importing using URL on local server outside drupal directory: OK
    • importing using URL on remote server: OK

    Note however that the images were copied to the files directory and the ubercart_images directory remains empty! I don't think this is the desired behaviour given the code below.

            if (module_exists('imagefield')){
              foreach ($product_data->image as $image){
                $image_path = (string)$image->path;
                $path_info = pathinfo($image_path);
                $local_path = file_create_path() .'/ubercart_images/'. $_SERVER['HTTP_HOST'];
                if (!file_check_directory($local_path)){
                  $local_path = file_create_path() .'/ubercart_images';
                }
                $local_path .= '/'. rawurldecode(basename($image_path));
                if (file_exists($local_path) || $size = file_put_contents($local_path, fopen($image_path, 'rb'))){
                  $product->field_image_cache[$i] = array(
                    'fid' => (file_exists($local_path) ? db_result(db_query("SELECT fid FROM {files} WHERE nid = %d AND filepath = '%s'", $nid, $local_path)) : 'upload'),
                    'title' => isset($image->title) ? html_entity_decode((string)$image->title) : '',
                    'alt' => isset($image->alt) ? html_entity_decode((string)$image->alt) : '',
                    'filename' => basename($image_path),
                    'filepath' => $local_path,
                    'filesize' => (file_exists($local_path) ? @filesize($local_path) : $size),
                    'filemime' => 'image/'. $path_info['extension'],
                  );
                  $i++;
                }
              }
            }
  2. Importing the same file again except: the products already exist from the previous import, All images are now in drupal 'files' directory and are correctly referenced in 'files' table.
    • No images appear for any products on any pages, no img tags are generated. The images become disassociated with the products as when I delete the products the images are no longer removed from the files directory or the files table.
    • the files table is unchanged: all images are listed with correct filenames and NIDs.
    • All of image files have been copied to the ubercart_images directory this time!
  3. Now with the product fields 'unique hash' and undocumented 'id' removed to simulate importing a catalog from a source other than the export script. No products in the database, files and ubercart_images directories empty
    same as case 1) above, all OK.
  4. performing the same import again except all products already exist in database and image files are in the 'files' directory.
    same as 2) above. Images don't show up for products and they have been copied to ubercart_images directory.
  5. Above all of the images were stored outside of the drupal directory. This time the images accessed by absolute file path are in the drupal 'files' directory as are the images accessed by a URL to local machine. So the images already exist on the filesystem even though they don't exist in the database.
    Images stored in drupal files directory, no unique_hash field in xml file, no files or products in the database.
    • files were copied to the files directory with _0 appended to the filename.
    • references to the images in the files table have _0 in the filename in the filepath field, but the filename field has no _0.
    • ubercart_images empty.

    If the file already exists, why does it create a new copy?

  6. running the same import again, now with images with _0 in their filename in the files directory, ubercart_images directory is empty.
    same as 2) and 4) above. images copied to ubercart_images directory and no images appear on pages.
Lyle's picture
Offline
AdministratoreLiTe!
Joined: 08/07/2007
Juice: 6841
Re: Hi Lyle, Sorry this turned
Assigned to:cdiggity» Lyle

I made some slight changes to that code, and I'd like to know if it helps or not. Update from the bazaar repository and test it out. I ran out of time to do it today, but I'll be running some test of my own tomorrow.

Lyle's picture
Offline
AdministratoreLiTe!
Joined: 08/07/2007
Juice: 6841
Re: Re: Hi Lyle, Sorry this turned

After a lot more work than I expected, I think I've got the importer to work like it should. I've tried to make the image fields use the ubercart_images directory programmatically, but site administrators should check to make sure they are set to anyway.

I was trying to use boolean shortcuts to only download the files when they didn't exist, but I think they were being copied anyway. Or it might have been something CCK was doing. Whichever it is, I've gotten to the point where duplicate images don't happen anymore. Hopefully it'll be the same for you.

cdiggity's picture
Offline
Bug Finder
Joined: 09/14/2007
Juice: 40
testing
Assigned to:Lyle» cdiggity

I'll checkout and give this a thorough test again tomorrow. Thanks lyle.
C

cdiggity's picture
Offline
Bug Finder
Joined: 09/14/2007
Juice: 40
Re: testing

Just checked out from bzr.

Between each of these tests I deleted all the products and ensured the files table was empty (it always was).

  1. If the ubercart_images directory does not exist, it is not created and no images are uploaded.
  2. The images do not go into 'ubercart_images' directory if it does exist.
  3. My image file with spaces in the name was saved on the hard drive and in the database as 'Cool%20Shot.JPG' and not 'Cool Shot.JPG' when the image path was specified as a URL. Imagecache will not generate product-list etc images for it. upper/lower case extension makes no difference. It is difficult to export it as imagecache can't produce a thumbnail. When I manually create a product, the image is saved on disk as 'Cool Shot.JPG' and then it all works.
  4. If the path in the XML file for an image with a space in the name is an absolute path and not an URL-Encoded URL then everything works and the above is not a problem.
  5. If the space in the image file name is not URL-Encoded and the path is an URL, fopen can not open the file stream. this may be an issue with file_put_contents?
  6. as noted elsewhere, if two different products use the same image, two different image files are created, one with _0 appended to the filename.

Making progress! Re-importing products that already exist no longer result in the images being re-uploaded with _0 on the filename.

-C

Lyle's picture
Offline
AdministratoreLiTe!
Joined: 08/07/2007
Juice: 6841
Re: Re: testing
Assigned to:cdiggity» Lyle

1 and 2 are related to issues I'm having with imagefield. You can specify the path the filed uses to save the images it gets, and I've tried to set it to "ubercart_images" automatically, but it doesn't seem to be obeying the database in that instance. Manually setting the imagefields to use "ubercart_images" should fix that problem. I'd hate to make this a necessary step in the installation but that might be what it comes to.

5 is a non-issue. If the path is a URL, then it must be URL-encoded. To fix 3 and 4, I think I can decode the filename so that the local path has the spaces like it should.

6 is just how imagefield works. The standard file upload does the same thing.

Lyle's picture
Offline
AdministratoreLiTe!
Joined: 08/07/2007
Juice: 6841
Re: Re: Re: testing

I think I've taken care of the ubercart_images problem. I've got the product module set up so that the ubercart_images folder is created only if the files folder has been set. Even still, the imagefield can be set to use any folder in the file system, so the importer had to be flexible enough to deal with it. I've put in a commit today that should fix it.