Exiv2 and hard links

Added by Anders Kamf over 6 years ago

Hi!

I have some questions about how exiv2 writes files when modifying metadata. I have searched the mailing lists and documentation without finding any answers, and I hope someone here can help me out.

The problem I experience is destroyed hard links. My JPG photos are stored as hard links, i.e. the same physical photo is available from several places in my file system. I want these files to stay identical and share the same metadata. But when I use exiv2 to update a file, the hard link breaks, and the original file, i.e. every other "instance" of the photo, becomes an empty file.

  • A file before it is updated with exiv2; the hard link count is 4:
    -rwxrw-r-- 4 anders anders 2.4M 2008-06-12 21:46 IMG_4249.JPG
  • The same file after the update; the hard link count is now 1:
    -rwxrw-r-- 1 anders anders 2.4M 2012-01-26 08:14 IMG_4249.JPG
  • Another "instance" of the file after the update; its size is 0 and its hard link count has dropped to 3:
    -rwxrw-r-- 3 anders anders 0 2012-01-26 08:14 IMG_4249.JPG

My feeling is that exiv2 creates a new file and transfers the content of the original file, including the new/edited metadata, to it. Then it removes the old, now empty, file and finally renames the new file to the name of the old one. Hence the broken hard links and emptied files. Is this a roughly correct description?
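A minimal POSIX sketch (C++, assuming Linux or another Unix with a writable /tmp) of the "write a new file, then rename it over the old name" pattern described above shows why that pattern breaks hard links. The helper names are hypothetical and this is not Exiv2's code. rename() only repoints the one name, so the edited name ends up on a new inode with a link count of 1, while the other names keep the old inode and the old bytes. Rename alone does not empty the other names; that part is explained by the writability check Thomas describes below.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <sys/stat.h>
#include <unistd.h>

[[noreturn]] static void die(const char* what) { std::perror(what); std::exit(1); }

// Replace the content of `path` with `text` ("wb" creates or truncates).
static void writeFile(const std::string& path, const std::string& text) {
    std::FILE* f = std::fopen(path.c_str(), "wb");
    if (!f) die(path.c_str());
    std::fwrite(text.data(), 1, text.size(), f);
    std::fclose(f);
}

// Read a whole file into a string (empty if it cannot be opened).
static std::string readFile(const std::string& path) {
    std::string out;
    std::FILE* f = std::fopen(path.c_str(), "rb");
    if (!f) return out;
    char buf[256];
    std::size_t n;
    while ((n = std::fread(buf, 1, sizeof buf, f)) > 0) out.append(buf, n);
    std::fclose(f);
    return out;
}

// Number of names (hard links) pointing at the inode behind `path`.
static nlink_t linkCount(const std::string& path) {
    struct stat st;
    return ::stat(path.c_str(), &st) == 0 ? st.st_nlink : 0;
}

struct RenameResult {
    nlink_t editedLinks;       // link count of the name that was "edited"
    nlink_t otherLinks;        // link count of the other hard link
    std::string otherContent;  // what the other name still contains
};

// Simulate: write the edited image to a new file, then rename it over the old name.
RenameResult demoRenameBreaksLink() {
    char tmpl[] = "/tmp/exiv2-renameXXXXXX";
    if (!mkdtemp(tmpl)) die("mkdtemp");
    const std::string dir = tmpl;
    const std::string edited = dir + "/IMG_4249.JPG";
    const std::string other = dir + "/alias.JPG";
    const std::string tmp = dir + "/IMG_4249.JPG.tmp";

    writeFile(edited, "old metadata");
    if (::link(edited.c_str(), other.c_str()) != 0) die("link");  // same inode, count 2
    writeFile(tmp, "new metadata");  // the edited copy is a brand-new inode
    if (std::rename(tmp.c_str(), edited.c_str()) != 0) die("rename");  // repoints one name only

    RenameResult r{linkCount(edited), linkCount(other), readFile(other)};
    ::unlink(edited.c_str());
    ::unlink(other.c_str());
    ::rmdir(dir.c_str());
    return r;
}
```

With four names, as in the listing above, the three untouched names would drop from 4 links to 3 and keep the old bytes.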

So, the first thing I wonder is whether I have interpreted the behavior correctly and, if so, whether this is how it is intended to work.
And second, is there any way, such as a command option, to work around this so that the hard links do not break?

Best regards
Anders


Replies (19)

RE: Exiv2 and hard links - Added by Thomas Beutlich over 6 years ago

Hi Anders,

I'm not sure which OS you use, but I can confirm this issue on WinXP using the latest exiv2 SVN revision. I actually consider this a very critical issue: it is not just about breaking the hard link, it is about losing the linked files.

There is a check, "Check if the file can be written to, if it already exists", in basicio.cpp, FileIo::transfer, line 523, that opens the original (and linked) file in write mode, which truncates it and deletes the data. If this check is skipped, the hard link is still broken but the original file is left unchanged.

There is one way to preserve the hard link, ownership, etc.: copy the original file to a temporary file, then overwrite the original file with the new metadata, or, in case of an error, restore the preserved data from the temporary copy. This behaviour could be optional.
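The proposal above can be sketched in C++ with POSIX calls; the helper names and the ".exiv2bak" backup suffix are hypothetical, and this is not code from Exiv2. The key point is that opening an existing file with "wb" truncates and rewrites the same inode instead of creating a new one, so every hard-linked name sees the new content. The backup copy is what makes a failed write recoverable, at the cost of copying the whole file once.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <sys/stat.h>
#include <unistd.h>

[[noreturn]] static void die(const char* what) { std::perror(what); std::exit(1); }

// Copy src to dst. If dst exists, "wb" truncates its existing inode in place,
// so names hard-linked to dst keep pointing at the same file.
static bool copyFile(const std::string& src, const std::string& dst) {
    std::FILE* in = std::fopen(src.c_str(), "rb");
    if (!in) return false;
    std::FILE* out = std::fopen(dst.c_str(), "wb");
    if (!out) { std::fclose(in); return false; }
    char buf[4096];
    std::size_t n;
    bool ok = true;
    while (ok && (n = std::fread(buf, 1, sizeof buf, in)) > 0)
        ok = std::fwrite(buf, 1, n, out) == n;
    ok = ok && !std::ferror(in);
    std::fclose(in);
    return std::fclose(out) == 0 && ok;
}

// Overwrite the existing inode of `path` with `data` (same truncate-in-place behaviour).
static bool writeInPlace(const std::string& path, const std::string& data) {
    std::FILE* f = std::fopen(path.c_str(), "wb");
    if (!f) return false;
    const bool ok = std::fwrite(data.data(), 1, data.size(), f) == data.size();
    return std::fclose(f) == 0 && ok;
}

// Hypothetical helper: back up, overwrite in place, restore from the backup on failure.
bool overwriteKeepingLinks(const std::string& path, const std::string& newData) {
    const std::string backup = path + ".exiv2bak";  // hypothetical backup name
    if (!copyFile(path, backup)) return false;       // 1. preserve the original bytes
    const bool ok = writeInPlace(path, newData);     // 2. rewrite the same inode
    // 3. On failure restore in place; keep the backup if even the restore fails.
    if (ok || copyFile(backup, path)) ::unlink(backup.c_str());
    return ok;
}

static nlink_t linkCount(const std::string& path) {
    struct stat st;
    return ::stat(path.c_str(), &st) == 0 ? st.st_nlink : 0;
}

static std::string readFile(const std::string& path) {
    std::string out;
    std::FILE* f = std::fopen(path.c_str(), "rb");
    if (!f) return out;
    char buf[256];
    std::size_t n;
    while ((n = std::fread(buf, 1, sizeof buf, f)) > 0) out.append(buf, n);
    std::fclose(f);
    return out;
}

struct InPlaceResult { bool ok; nlink_t editedLinks, otherLinks; std::string otherContent; };

// Two names for one photo; edit through one name, inspect the other.
InPlaceResult demoInPlace() {
    char tmpl[] = "/tmp/exiv2-inplaceXXXXXX";
    if (!mkdtemp(tmpl)) die("mkdtemp");
    const std::string dir = tmpl;
    const std::string edited = dir + "/IMG_4249.JPG";
    const std::string other = dir + "/alias.JPG";
    if (!writeInPlace(edited, "old metadata")) die("write");  // creates the file
    if (::link(edited.c_str(), other.c_str()) != 0) die("link");
    const bool ok = overwriteKeepingLinks(edited, "new metadata");
    InPlaceResult r{ok, linkCount(edited), linkCount(other), readFile(other)};
    ::unlink(edited.c_str());
    ::unlink(other.c_str());
    ::rmdir(dir.c_str());
    return r;
}
```

If the process dies between the truncation and the end of the write, the backup file is simply left on disk, which is what keeps the data recoverable in this sketch.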

Kind regards
Thomas

RE: Exiv2 and hard links - Added by Thomas Beutlich over 6 years ago

One quick fix that prevents the data loss is to test with "ab" or "a+b" (it should not matter which) instead of "w+b".
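The difference between the two probe modes can be checked directly. In this C++/POSIX sketch (hypothetical names, not Exiv2's code), a file with two hard links is probed first with "a+b" and then with "w+b", and the size is read back through the other name after each probe. Only the "w+b" probe empties the shared inode, which matches the 0-byte "instances" in Anders's listing.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <sys/stat.h>
#include <unistd.h>

[[noreturn]] static void die(const char* what) { std::perror(what); std::exit(1); }

// Writability probe: open with `mode`, then close immediately.
static bool probeWritable(const std::string& path, const char* mode) {
    std::FILE* f = std::fopen(path.c_str(), mode);
    if (!f) return false;
    std::fclose(f);
    return true;
}

static off_t fileSize(const std::string& path) {
    struct stat st;
    return ::stat(path.c_str(), &st) == 0 ? st.st_size : -1;
}

struct ProbeResult {
    bool appendOk;
    off_t afterAppendProbe;  // size seen through the other name after "a+b"
    bool truncOk;
    off_t afterTruncProbe;   // size seen through the other name after "w+b"
};

ProbeResult demoProbeModes() {
    char tmpl[] = "/tmp/exiv2-probeXXXXXX";
    if (!mkdtemp(tmpl)) die("mkdtemp");
    const std::string dir = tmpl;
    const std::string edited = dir + "/IMG_4249.JPG";
    const std::string other = dir + "/alias.JPG";

    std::FILE* f = std::fopen(edited.c_str(), "wb");
    if (!f) die("fopen");
    std::fputs("0123456789", f);  // 10 bytes standing in for image data
    std::fclose(f);
    if (::link(edited.c_str(), other.c_str()) != 0) die("link");

    ProbeResult r{};
    r.appendOk = probeWritable(edited, "a+b");  // append mode never truncates
    r.afterAppendProbe = fileSize(other);
    r.truncOk = probeWritable(edited, "w+b");   // truncates the shared inode to 0 bytes
    r.afterTruncProbe = fileSize(other);

    ::unlink(edited.c_str());
    ::unlink(other.c_str());
    ::rmdir(dir.c_str());
    return r;
}
```

One side effect to keep in mind: "a+b", like "w+b", creates the file if it does not exist yet.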

RE: Exiv2 and hard links - Added by Andreas Huggel over 6 years ago

Thanks, I'll check that out as soon as I can

RE: Exiv2 and hard links - Added by Steve Wright over 6 years ago

I was thinking of something along these lines earlier today. I've been steering clear of using Exiv2 to write to hardlinks, because I thought I'd get an error like "The data is of an unknown image type." Now I see that the situation, while not worse, is different. Yes, Exiv2 can find the file the hardlink references, but in the process the link breaks or becomes invalid.

If this is so, I think I'd prefer the "unknown image type" error to the consequence of breaking a link. This doesn't take much away from Exiv2's usability anyway, since what one wants when making a link and invoking Exiv2 on it is to change the metadata of the original file in its original location. Making a link of any sort is, then, an extra step. I assume there are times when such a thing is practical or even critical, though none come to mind at the moment.

Just a few thoughts on the matter.

SJW

RE: Exiv2 and hard links - Added by Thomas Beutlich over 6 years ago

Once an alternate (hard link) name is created, there is no way to tell which is the original name and which is the new one. There are two valid options for exiv2 working on hard links:
  1. Break the link and edit only the file to be changed (see my quick fix above)
  2. Keep the links when editing the metadata (should be a new option)
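The point that no name is "the original" can be illustrated with stat(): two hard links report the same device and inode number and the same link count, and a change made through one name is visible through the other, which is the aliasing effect quoted from Wikipedia further down. A C++/POSIX sketch with hypothetical names:

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <sys/stat.h>
#include <unistd.h>

[[noreturn]] static void die(const char* what) { std::perror(what); std::exit(1); }

// True if both names refer to the same file: same device and same inode.
static bool sameFile(const std::string& a, const std::string& b) {
    struct stat sa, sb;
    if (::stat(a.c_str(), &sa) != 0 || ::stat(b.c_str(), &sb) != 0) return false;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}

static nlink_t linkCount(const std::string& path) {
    struct stat st;
    return ::stat(path.c_str(), &st) == 0 ? st.st_nlink : 0;
}

struct AliasResult {
    bool sameInode;
    nlink_t firstLinks, secondLinks;
    std::string readViaFirst;  // content read through the first name after
                               // appending through the second one
};

AliasResult demoAliasing() {
    char tmpl[] = "/tmp/exiv2-aliasXXXXXX";
    if (!mkdtemp(tmpl)) die("mkdtemp");
    const std::string dir = tmpl;
    const std::string first = dir + "/IMG_4249.JPG";
    const std::string second = dir + "/favourites-IMG_4249.JPG";

    std::FILE* f = std::fopen(first.c_str(), "wb");
    if (!f) die("fopen");
    std::fputs("photo", f);
    std::fclose(f);
    if (::link(first.c_str(), second.c_str()) != 0) die("link");

    // Append through the second name ("ab" never truncates).
    f = std::fopen(second.c_str(), "ab");
    if (!f) die("fopen");
    std::fputs("+tags", f);
    std::fclose(f);

    AliasResult r{sameFile(first, second), linkCount(first), linkCount(second), {}};
    // Read back through the first name.
    char buf[64] = {0};
    f = std::fopen(first.c_str(), "rb");
    if (f) { std::fread(buf, 1, sizeof buf - 1, f); std::fclose(f); }
    r.readViaFirst = buf;

    ::unlink(first.c_str());
    ::unlink(second.c_str());
    ::rmdir(dir.c_str());
    return r;
}
```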

RE: Exiv2 and hard links - Added by Anders Kamf over 6 years ago

Hi!

Thanks for the answers and attention to this issue!

Regarding versions, I use Ubuntu 10.04, which includes exiv2 version 0.19. My first experience of the destroyed hard links, though, was from image tagging with digiKam, which (as I guess you know) uses exiv2 to write metadata to the files. Apart from the Ubuntu machine, I also run digiKam 2.2.0 on Windows 7 64-bit (not sure which exiv2 version that is) and experience the very same problem there (the pictures are on a Samba share on a Linux server).

From a user point of view (I can only speak for myself, of course), I would prefer that the link somehow be kept. I see a logic in that, compared to e.g. how most text editors, or the Linux system itself, treat a hard-linked text file, for instance when appending to it with ">> output.txt". The Wikipedia description captures my view on hard links quite well:

"This has the effect of creating multiple names for the same file, causing an aliasing effect: e.g. if the file is opened by one of its names, and changes are made to its content, then these changes will also be visible when the file is opened by an alternative name." <en.wikipedia.org/wiki/Hard_link>

Regards
Anders

RE: Exiv2 and hard links - Added by Andreas Huggel over 6 years ago

Sorry, I still haven't had time to really look into this. It is definitely a severe issue, although apparently not many Exiv2 users use hard links.
I'm thinking of implementing the quick fix Thomas described and doing a bugfix release (which is due anyway). The cleaner fix (copy the file away and overwrite, i.e. edit, the original) is expensive: it would roughly double the time required to write the metadata, so we shouldn't do it in the vast majority of cases where it isn't necessary. I will have to experiment a bit and see how to detect hard links.

Andreas

RE: Exiv2 and hard links - Added by Andreas Huggel over 6 years ago

Opened bug #812 for this, so that we have a bug number, but let's keep the discussion here.

I finally hatched a fix, checked in with r2658, but have only tested it on Linux with an ext3 filesystem so far. I'd appreciate it if you tried it with your own setup and gave feedback here.

Exiv2 now uses "a+b" when checking if a file is writable (that could actually be considered a separate long-standing bug).
To prevent breaking hard links when writing metadata, Exiv2 now always uses a memory buffer instead of a temporary file, if the image has hard links. Previously, memory buffers were only used for small files (up to 1MB).
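The buffer-selection rule described above reduces to a small predicate. The sketch below is hypothetical: the function names are invented, "1MB" is assumed to mean 1 MiB with an inclusive bound, and the stat() fallback is a choice made here, not Exiv2's. As the following replies show, st_nlink from MSVC's stat() is always 1, so on Windows this rule alone would not detect hard links.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <sys/stat.h>

// Assumed limit for "small files (up to 1MB)"; the exact constant and
// comparison used in Exiv2 are not given in this thread.
constexpr std::uint64_t kSmallFileLimit = 1024 * 1024;

// Buffer the rewritten image in memory if the file is small OR has more than
// one hard link; otherwise use a temporary file (the path that breaks links).
bool useMemoryBuffer(std::uint64_t sizeBytes, unsigned long linkCount) {
    return linkCount > 1 || sizeBytes <= kSmallFileLimit;
}

// Same rule applied to a file on disk via stat().
bool useMemoryBufferFor(const std::string& path) {
    struct stat st;
    if (::stat(path.c_str(), &st) != 0) return false;  // hypothetical fallback
    return useMemoryBuffer(static_cast<std::uint64_t>(st.st_size),
                           static_cast<unsigned long>(st.st_nlink));
}
```

For the 2.4M photo with 4 links from the first post, the predicate picks the memory buffer; the same file with a single link would go through the temporary file.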

This is still a workaround; it means you may experience issues with large hard-linked images on a system with limited memory.
Implementing the suggested proper solution that copies the file will require bigger changes, incl. possibly API changes (I don't quite see how to do this cleanly yet).

Andreas

RE: Exiv2 and hard links - Added by Thomas Beutlich over 6 years ago

Hi Andreas,

I needed to add a typedef for nlink_t in basicio.cpp

// MSVC doesn't provide mode_t, nlink_t
#ifdef _MSC_VER
typedef unsigned short mode_t;
typedef short nlink_t;
#endif

to make it compile in Visual Studio.

Then st_nlink is always 1 when tested on an NTFS drive under WinXP. Hard links are preserved for files smaller than 1MB and destroyed for files larger than 1MB. I guess st_nlink is only a dummy value on Windows platforms. I still need to check on Win7.

Regards,
Thomas

RE: Exiv2 and hard links - Added by Andreas Huggel over 6 years ago

Do we need to use GetFileInformationByHandle and check nNumberOfLinks to determine the number of hardlinks on Windows and NTFS? But if so does that also mean we need to determine the filesystem first?

-ahu.

RE: Exiv2 and hard links - Added by Thomas Beutlich over 6 years ago

Andreas Huggel wrote:

Do we need to use GetFileInformationByHandle and check nNumberOfLinks to determine the number of hardlinks on Windows and NTFS? But if so does that also mean we need to determine the filesystem first?

Yes, this is the recommended way to retrieve the number of hard links on Windows. Never mind the file system: nNumberOfLinks will be 1 on a non-NTFS file system. "GetFileInformationByHandle" requires WinXP. I am not sure what the minimum Windows version for exiv2 is. If Win9x or Win2k is still supported, "GetFileInformationByHandle" must not be called directly but loaded dynamically from "Kernel32.dll".
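The recommendation above can be sketched as a portable helper: on Windows, look up GetFileInformationByHandle at run time through LoadLibraryA/GetProcAddress and read nNumberOfLinks; elsewhere, use st_nlink from stat(). This is an illustration, not the patch attached in this thread or Exiv2's winNumberOfLinks(); the name numberOfLinks, the access and sharing flags, and returning 0 on failure are assumptions. LoadLibraryA is used rather than LoadLibrary so the call compiles whether or not UNICODE is defined, as noted at the end of this thread.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#ifdef _WIN32
#include <windows.h>
#else
#include <sys/stat.h>
#include <unistd.h>
#endif

// Number of hard links to `path`, or 0 if it cannot be determined.
unsigned long numberOfLinks(const std::string& path) {
#ifdef _WIN32
    // MSVC's stat() reports st_nlink == 1 regardless of the real count, so ask
    // the file handle instead. The function is looked up at run time so the
    // binary still loads on systems where it might be missing.
    typedef BOOL (WINAPI *GetInfoFn)(HANDLE, LPBY_HANDLE_FILE_INFORMATION);
    HMODULE kernel = LoadLibraryA("kernel32.dll");  // "A": independent of UNICODE
    if (!kernel) return 0;
    GetInfoFn getInfo = reinterpret_cast<GetInfoFn>(
        GetProcAddress(kernel, "GetFileInformationByHandle"));
    unsigned long links = 0;
    if (getInfo) {
        HANDLE h = CreateFileA(path.c_str(), GENERIC_READ,
                               FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
                               NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
        if (h != INVALID_HANDLE_VALUE) {
            BY_HANDLE_FILE_INFORMATION info;
            if (getInfo(h, &info)) links = info.nNumberOfLinks;
            CloseHandle(h);
        }
    }
    FreeLibrary(kernel);  // balances LoadLibraryA; kernel32 itself stays loaded
    return links;
#else
    struct stat st;
    return ::stat(path.c_str(), &st) == 0 ? static_cast<unsigned long>(st.st_nlink) : 0;
#endif
}
```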

RE: Exiv2 and hard links - Added by Andreas Huggel over 6 years ago

Thomas, could you try the attached patch? (I haven't even compiled it, don't have a Windows development env nearby right now).

Andreas

RE: Exiv2 and hard links - Added by Thomas Beutlich over 6 years ago

Andreas Huggel wrote:

Thomas, could you try the attached patch? (I haven't even compiled it, don't have a Windows development env nearby right now).

It compiles and works as expected on WinXP. Hard links are preserved for files of any size. But be aware that exiv2 will stop working on Win9x/Win2k. The attached patch does a dynamic load of GetFileInformationByHandle instead. I could only test it on WinXP (32-bit) so far, where it works as expected.

RE: Exiv2 and hard links - Added by Andreas Huggel over 6 years ago

Thanks! Since this looks like quite a bit of overhead now, I'm thinking of re-arranging the code and introducing a new function just for the number of hardlinks, since we need that only in one place, not everywhere where we call FileIo::Impl::stat().
How expensive is that dynamic loading? Is it worth treating Win9x/Win2k separately?
And do you know if stat() has been fixed in newer versions of Windows?

Andreas

RE: Exiv2 and hard links - Added by Thomas Beutlich over 6 years ago

How expensive is that dynamic loading?

It should not be expensive at all, as kernel32 is already loaded.

Is it worth treating Win9x/Win2k separately?

I would not recommend checking the OS version. The best check is to see whether GetFileInformationByHandle can be found.

And do you know if stat() has been fixed in newer versions of Windows?

I checked on Win7 with Visual Studio 2010 and st_nlink is always 1 regardless of actual number of hard links.

RE: Exiv2 and hard links - Added by Andreas Huggel over 6 years ago

Thanks for the replies.

I've checked in r2660 which introduces a new function winNumberOfLinks() that is only used on Windows.
It uses the code from Thomas' patch without any changes (except for some additional debug output).
I'd appreciate it if you could test again whether this works on Windows.

Andreas

RE: Exiv2 and hard links - Added by Thomas Beutlich over 6 years ago

Yes, it works as expected (using MSVC on WinXP) for arbitrary image size.

RE: Exiv2 and hard links - Added by Thomas Beutlich over 6 years ago

Tiny correction: LoadLibrary("kernel32.dll") compiles only if _UNICODE or UNICODE is not defined, so it is better to call LoadLibraryA("kernel32.dll") directly.

RE: Exiv2 and hard links - Added by Andreas Huggel over 6 years ago

Updated, thanks.

