Project

General

Profile

Setting meta data on a xmp sidecar image object (patch & question)

Added by Matthias Baas over 10 years ago

Hi,

I tried to dump all the meta data from a jpg image into a xmp sidecar file using the API (rev 2461 from svn), but got an exception when I tried to set the meta data on the XmpSidecar object. The problem is that Image::setMetadata() always calls the setComment() method, but on the XmpSidecar class, this method will always trigger an exception, no matter if there actually is a comment or not.
I have attached a little patch that changes the XmpSidecar::setComment() method so that it only throws the exception when the comment is not empty.

With the above patch I could copy the data, but this raised another question. The meta data I was copying was from an image taken by a Canon camera, so it contained all that Canon-specific stuff. This extra data was not written into the xmp file which I think is ok as I assume this data is not standardized at all, but if the attempt at setting a comment triggers an error, shouldn't an attempt at setting other unsupported data like the Canon stuff then trigger an error as well? (on the other hand, if it would trigger an error, how would I go about copying just that data that is supported by xmp?)

Cheers,

- Matthias -


Replies (11)

RE: Setting meta data on a xmp sidecar image object (patch & question) - Added by Andreas Huggel over 10 years ago

It is current practice that set methods of Image subclasses throw an exception if the metadata type is not supported by the particular image. For your case, you could call the other set functions one by one instead of using Image::setMetadata().

Of course one can always discuss what is the correct behavior in such a situation. There are two possible extremes: 1) silently ignore the problem and 2) scream and shout. If I'm not sure I tend to go for 2) until it is clear why 1) (or something in between) is better.

Regarding Canon Makernote tags: your observation is correct, these are not converted to XML automatically. There exist suggestions to write an XMP property with a binary blob for this data, which is controversal and currently not implemented in Exiv2.

RE: Setting meta data on a xmp sidecar image object (patch & question) - Added by Matthias Baas over 10 years ago

Andreas Huggel wrote:

It is current practice that set methods of Image subclasses throw an exception if the metadata type is not supported by the particular image. For your case, you could call the other set functions one by one instead of using Image::setMetadata().

Of course one can always discuss what is the correct behavior in such a situation. There are two possible extremes: 1) silently ignore the problem and 2) scream and shout. If I'm not sure I tend to go for 2) until it is clear why 1) (or something in between) is better.

I absolutely agree with that. The only question left here might be whether calling setComment("") is an error as xmp files are actually capable of storing no comment. ;) But having it throw in all cases is fine by me as well. I would still argue though that calling setMetadata() should not throw an error if one called clearComment() beforehand on the source data. Especially when the source and target image are both XmpSidecar objects, it doesn't seem intuitive that setMetadata() still throws an error even though the source and target, obviously, are capable of storing exactly the same data.
(But as I can't store all meta data in the xmp file anyway, I have already switched to only store changes and having to merge that with the original data. Initially, I thought I could get away with just reading one single file for getting all the data. So I don't really need the changes from the patch anymore and the above are just my thoughts for getting a cleaner and consistent API in general)

Regarding Canon Makernote tags: your observation is correct, these are not converted to XML automatically. There exist suggestions to write an XMP property with a binary blob for this data, which is controversal and currently not implemented in Exiv2.

I'm fine with the data not being stored, I was only wondering whether this should throw an exception or not. According to what you said above, I take it this actually should throw an exception.

- Matthias -

RE: Setting meta data on a xmp sidecar image object (patch & question) - Added by Andreas Huggel over 10 years ago

I absolutely agree with that. The only question left here might be whether calling setComment("") is an error as xmp files are actually capable of storing no comment. ;) But having it throw in all cases is fine by me as well. I would still argue though that calling setMetadata() should not throw an error if one called clearComment() beforehand on the source data. Especially when the source and target image are both XmpSidecar objects, it doesn't seem intuitive that setMetadata() still throws an error even though the source and target, obviously, are capable of storing exactly the same data.

You're right and the solution is quite simple: each image format should override the default Image::setMetadata() and implement its own version, depending on the supported metadata types. Or, probably better (easier to maintain), the default implementation in Image::setMetadata() should use Image::checkMode() to make sure it sets only supported metadata.

Regarding Canon Makernote tags: your observation is correct, these are not converted to XML automatically. There exist suggestions to write an XMP property with a binary blob for this data, which is controversal and currently not implemented in Exiv2.

I'm fine with the data not being stored, I was only wondering whether this should throw an exception or not. According to what you said above, I take it this actually should throw an exception.

This is different. Here it depends much more on the data (tags) present in a particular image whether the API works, whereas above it is the programmer calling an API which is not supported for a particular type of metadata (If I disregard the suggestion about empty comments). I think it would be tedious for a programmer to remove all the unsupported tags before writing a sidecar and exiv2 doesn't have any functionality to tell you which tags those are.

Andreas

RE: Setting meta data on a xmp sidecar image object (patch & question) - Added by Matthias Baas over 10 years ago

Andreas Huggel wrote:

You're right and the solution is quite simple: each image format should override the default Image::setMetadata() and implement its own version, depending on the supported metadata types. Or, probably better (easier to maintain), the default implementation in Image::setMetadata() should use Image::checkMode() to make sure it sets only supported metadata.

Ah, I have overlooked checkMode(). Yes, I think this would already be an improvement.

This is different. Here it depends much more on the data (tags) present in a particular image whether the API works, whereas above it is the programmer calling an API which is not supported for a particular type of metadata (If I disregard the suggestion about empty comments). I think it would be tedious for a programmer to remove all the unsupported tags before writing a sidecar and exiv2 doesn't have any functionality to tell you which tags those are.

ok, I see. And yes, I agree, it would be tedious having to remove unsupported tags manually. On the other hand, the current implementation doesn't give the application a chance to handle unsupported tags in an appropriate way. I think it should be the application's choice whether it wants to ignore them, to issue a warning to the user or even treat this as an error. At the moment, the library silently ignores those tags and the application doesn't get notified about this at all. So ideally, there should be a way for the application to query whether a write operation really has succeeded in writing all data or, alternatively, there could be a query function that the application could call to check whether a particular format supports all the tags the application is going to write. This function could be called before the actual write operation if the application wants to make sure the data will really be stored in the file.
(I haven't had a closer look at the code yet if this could be easily supported or not...)

- Matthias -

RE: Setting meta data on a xmp sidecar image object (patch & question) - Added by Andreas Huggel over 10 years ago

Created feature #761 for the first issue. If you want to do this, I'd be happy.

I'll think a bit more about the second one and maybe also create an issue for it. Letting an application decide whether to ignore, warn about or raise an exception for unsupported tags is a good point, as is the idea to allow an application to query which tags were actually written. The issue does not only occur with XMP sidecars.

Andreas

RE: Setting meta data on a xmp sidecar image object (patch & question) - Added by Matthias Baas over 10 years ago

Andreas Huggel wrote:

Created feature #761 for the first issue. If you want to do this, I'd be happy.

Well, adding the checks to setMetadata() looks straightforward, but I also had a look at what checkMode() actually returns and in the registry array all access modes for the xmp sidecar file are set to amNone except for the xmp mode. This means, setMetadata() would then only set xmp data and exif and iptc data would get ignored. But it's not quite true that the xmp sidecar class does not support exif and iptc as it calls those conversion functions to convert the data into xmp data (and vice versa when reading), so in a way, it does support exif and iptc. Wouldn't it then be more appropriate to set the access modes for exif and iptc to amReadWrite in the case of xmp sidecar files, even though it doesn't support all tags?

- Matthias -

RE: Setting meta data on a xmp sidecar image object (patch & question) - Added by Andreas Huggel over 10 years ago

Yes I agree, they should marked as supported.

Andreas

RE: Setting meta data on a xmp sidecar image object (patch & question) - Added by Matthias Baas over 10 years ago

I have attached a patch that changes Image::setMetadata() and the access mode flags of the xmp sidecar file. In setMetadata() I'm explicitly testing for amWrite and amReadWrite instead of just testing a single bit. The access mode is given as an enum so I thought it's probably not meant to be used as a bit field.

As to the other issue with reporting unsupported tags, I think this is probably better done in writeMetadata() and not in setMetadata(). writeMetadata() could either return information about what it has written and what it has not written or it could set that data internally and the Image class could have some methods to query that data after writeMetadata() has been called. A new method to check the tags could still be useful though. Such a method could take an output iterator as argument which could receive all tags that are not supported by the image object. The caller can then either report these to the user or remove them before writing the file.

- Matthias -

RE: Setting meta data on a xmp sidecar image object (patch & question) - Added by Andreas Huggel over 10 years ago

Matthias,

Thanks for the patch and sorry for the late reply. I have checked in the patch (r2466).

The access mode is given as an enum so I thought it's probably not meant to be used as a bit field.

A comment would have been in order I guess. Since each of the access modes has an explicitly assigned value I must have thought of using this as a bit field, otherwise my enums just contain names. I modified your patch accordingly; it's more concise that way.

As for the unsupported tags: Yes, this shouldn't be done in setMetadata. In general it would happen in XyzParser::encode, which are responsible for the serialization of the different metadata lists (see e.g., ExifParser::encode in exif.cpp). In addition, it also happens in the XMP conversion functions. All of these are used in writeMetadata. So as you say, these functions could remember which tags were not set and Image could have a method to query this information.

On the other hand, for Exif data, a function to check which tags can be written to a JPEG image would need to call ExifParser::encode, i.e., be very expensive (XyzParser::encode is the most expensive operation in the entire library). I'm not keen to introduce this - library users would use it without realizing what the impact is.

Andreas

RE: Setting meta data on a xmp sidecar image object (patch & question) - Added by Matthias Baas over 10 years ago

Andreas Huggel wrote:

A comment would have been in order I guess. Since each of the access modes has an explicitly assigned value I must have thought of using this as a bit field, otherwise my enums just contain names. I modified your patch accordingly; it's more concise that way.

ok, thanks! I also prefer the bit test, but I just wasn't sure whether it was meant to be used that way or not.

As for the unsupported tags: Yes, this shouldn't be done in setMetadata. In general it would happen in XyzParser::encode, which are responsible for the serialization of the different metadata lists (see e.g., ExifParser::encode in exif.cpp). In addition, it also happens in the XMP conversion functions. All of these are used in writeMetadata. So as you say, these functions could remember which tags were not set and Image could have a method to query this information.

Can it really happen that ExifParser::encode() would drop values? After a quick glance at the code, it does drop some values, but at least from those I know, these only seem to be values that would have to be set by an application anyway because they describe some aspects of how the image data is stored (such as StripOffsets and RowsPerStrip).
So isn't it only the XMP conversion functions that almost always drop data (all the makernotes, for example)?

On the other hand, for Exif data, a function to check which tags can be written to a JPEG image would need to call ExifParser::encode, i.e., be very expensive (XyzParser::encode is the most expensive operation in the entire library). I'm not keen to introduce this - library users would use it without realizing what the impact is.

What is it that makes the encode() function that expensive? I noticed that in ExifParser::encode() the first thing that happens is that all the data is copied (I suppose that's because some items may get removed and the input is supposed to be const) and the data may get encoded twice. I haven't had a look at TiffParserWorker::encode() yet...
But anyway, wouldn't the actual bottleneck rather be the disk access instead of the encoding of the data? And also, wouldn't that only be an issue when large amounts of files get batch processed? (so far, I'm still mainly concerned with reading metadata but once writing works as well, I'll do some benchmarks)

- Matthias -

RE: Setting meta data on a xmp sidecar image object (patch & question) - Added by Andreas Huggel over 10 years ago

Can it really happen that ExifParser::encode() would drop values? After a quick glance at the code, it does drop some values, but at least from those I know, these only seem to be values that would have to be set by an application anyway because they describe some aspects of how the image data is stored (such as StripOffsets and RowsPerStrip).

In this context, Exiv2 treats all tags the same. So yes, it does happen, esp. if Exif data from another image format is copied to a JPEG image.

What is it that makes the encode() function that expensive? I noticed that in ExifParser::encode() the first thing that happens is that all the data is copied (I suppose that's because some items may get removed and the input is supposed to be const) and the data may get encoded twice. I haven't had a look at TiffParserWorker::encode() yet...
But anyway, wouldn't the actual bottleneck rather be the disk access instead of the encoding of the data? And also, wouldn't that only be an issue when large amounts of files get batch processed? (so far, I'm still mainly concerned with reading metadata but once writing works as well, I'll do some benchmarks)

I should have said writeMetadata() and everything that is called from it is the expensive portion. Eventually it's the disk access that is expensive, as you suspected. In the case of JPEG images that is done in doWriteMetadata(), for TIFF-like images, it's done in the parser (in TiffParserWorker::encode). TIFF-like images are more critical as they tend to be larger than JPEGs and have their metadata "all over the place", not only in a 64k block near the start of the file.

Andreas

    (1-11/11)