Exception: XMP JPEG segment is larger than 65535 bytes
I am one of the developers of darktable (www.darktable.org) and new to this forum.
We are using exiv2 for our metadata management. This includes saving our development 'history stacks'
as XMP tags into exported files. Works very well!
Our next feature release will introduce masks, which means that the size of the XMP tags can get significantly larger,
reaching several tens of kilobytes and exceeding the segment limit of JPEGs. Exiv2 will then throw the above exception.
My question: is there a way to overcome this limit? To my understanding it is in principle possible
to split tags over several segments. That seems to be how large ICC profiles are typically handled.
Any chance for this?
We've recently had a discussion in topic 1608 about JPG segments: http://dev.exiv2.org/boards/3/topics/1608
And there is a document on our Wiki about the format of JPG files: http://dev.exiv2.org/projects/exiv2/wiki/The_Metadata_in_JPEG_files
No single APP13 block can be more than 65535 bytes, however Exiv2 does support multiple blocks. Offhand I don't know if XMP is stored in an APP13 block.
Can you attach an example of the data you wish to add? It would be very helpful if you could present the data as an exiv2(.exe) command line, and I will investigate/debug and get back to you about this.
RE: Exception: XMP JPEG segment is larger than 65535 bytes - Added by Ulrich Pegelow about 8 years ago
thanks for having a look into this issue.
As you requested I attached an example. File large-history-stack.xmp is a sidecar file typical for the ones darktable produces - a large one indeed.
When I try to insert the XMP tags into a file large-history-stack.jpg (lacking an XMP segment) I get the following:
$ exiv2 -iX large-history-stack.jpg
large-history-stack.jpg: Could not write metadata to file: Size of XMP JPEG segment is larger than 65535 bytes
I think we might be in difficulty with this. I had a read of this document: http://search.cpan.org/~bettelli/Image-MetaData-JPEG-0.153/lib/Image/MetaData/JPEG/Structures.pod#Structure_of_an_XMP_APP1_segment
You will see that it says:
XMP APP1 segments are made up by an identifier and a Unicode XMP packet (the encoding is usually UTF-8, but it can also be UTF-16 or UTF-32, both big-endian or little-endian). The packet cannot be split in multiple segments, so there is a maximum size of approximately 64KB. The structure is very simple: a fixed XMP namespace URI (null terminated, and without quotation marks) followed by the XMP packet: …..
Exiv2 can handle multiple APP1 segments. However you have a single XMP packet which is longer than 64K. The XMP Spec http://partners.adobe.com/public/developer/en/xmp/sdk/XMPspecification.pdf on p93 states:
IMPORTANT: Following the normal rules for JPEG sections, the header plus the following
data can be at most 65535 bytes long. The XMP Packet cannot be split across
the multiple APP1 sections, so the size of the XMP Packet can be at most
I've looked for the word 'compression' in the spec in the hope of being able to compress the packet. The 71k file 'large-history-stack.xmp' is only 15K when zipped. I didn't find anything to enable the packet to be compressed.
Clearly, it is possible to reduce the size of the packet by elimination of white space in the XML and using single-letter namespace identifiers. For example <rdf:xxx> could be shortened to <r:xxx>. For the file you have sent, you might get lucky and reduce 71949 to 65000. I believe the utility xmllint can do this for you.
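As a rough illustration of the whitespace suggestion above, here is a minimal Python sketch. It is not what xmllint does internally; it is a naive regular-expression pass that only removes whitespace-only runs between adjacent tags, and the function name and sample packet are made up for the example. It assumes the packet has no significant whitespace between tags.

```python
import re


def strip_inter_tag_whitespace(xml_text: str) -> str:
    """Remove whitespace-only runs between adjacent tags (naive, regex based)."""
    return re.sub(r">\s+<", "><", xml_text.strip())


# Hypothetical, tiny stand-in for a pretty-printed XMP sidecar.
sample = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""/>
  </rdf:RDF>
</x:xmpmeta>"""

compact = strip_inter_tag_whitespace(sample)
print(len(sample.encode("utf-8")), "->", len(compact.encode("utf-8")))
```

How much this saves on a real sidecar depends entirely on how the file was indented, so it is worth measuring before relying on it.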
I don't know what a 'history stack' is. Perhaps it's possible to flate-compress the strings in darktable:blendop_params.
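To make the flate-compression idea concrete, here is a minimal Python sketch that deflates a parameter string and base64-encodes the result so it can still be stored as XMP text. The helper names and the sample blob are hypothetical; how darktable actually encodes darktable:blendop_params is not something this thread establishes.

```python
import base64
import zlib


def pack_param(raw: bytes) -> str:
    """Deflate-compress, then base64-encode so the result is plain ASCII text."""
    return base64.b64encode(zlib.compress(raw, 9)).decode("ascii")


def unpack_param(text: str) -> bytes:
    """Inverse of pack_param."""
    return zlib.decompress(base64.b64decode(text))


# Hypothetical stand-in for a long, repetitive hex-encoded parameter blob.
raw = ("0000803f" * 2000).encode("ascii")
packed = pack_param(raw)

assert unpack_param(packed) == raw  # round trip is lossless
print(len(raw), "->", len(packed))
```

Base64 adds roughly a third to the compressed size, so this only pays off for values that compress well; short or already dense strings may come out larger.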
I don't have a solution for you. However I'm happy to continue to discuss this and we may discover a work-around.
RE: Exception: XMP JPEG segment is larger than 65535 bytes - Added by Ulrich Pegelow about 8 years ago
thanks for your feedback!
So it looks like in principle we cannot guarantee that an arbitrarily sized XMP block fits into a JPEG. Your suggestion to compress the data is probably the way to go in order to ease the problem. In darktable's case there is one type of data that consumes most of the space: vector data describing blend masks. Compressing these data would be the first step.
RE: Exception: XMP JPEG segment is larger than 65535 bytes - Added by Michael Ulbrich about 8 years ago
Hi Robin, Ulrich,
there is a document "XMP Specification Part3: Storage in Files" from July 2010 which is available for download on the net.
It states in the section "Extended XMP in JPEG":
"Following the normal rules for JPEG sections, the header plus the following data can be at most 65535 bytes
long. If the XMP packet is not split across multiple APP1 sections, the size of the XMP packet can be at most
65502 bytes. It is unusual for XMP to exceed this size; typically, it is around 2 KB.
If the serialized XMP packet becomes larger than the 64 KB limit, you can divide it into a main portion
(StandardXMP) and an extended portion (ExtendedXMP), and store it in multiple JPEG marker segment. A
reader must check for the existence of ExtendedXMP, and if it is present, integrate the data with the main XMP.
Each portion (standard and extended) is a fully formed XMP metadata tree, although only the standard portion
contains a complete packet wrapper. If the data is more than twice the 64 KB limit, the extended portion can
also be split and stored in multiple marker segments; in this case, the split portions are not fully formed
In practice I haven't yet seen any example of XMP split into StandardXMP and ExtendedXMP.
It would be interesting to check whether any of the standard image or metadata editors actually support this kind of XMP split over multiple APP1 segments.
Best regards ... Michael
I have seen a number of examples of extended XMP. Adobe products write this, and ExifTool has supported extended XMP since Oct. 26, 2008 (it was added to the XMP specification Oct. 17, 2008).
Yes, this is also used for example for writing depthmap data into images, as described here:
This is an example of such an image with extended xmp data:
exiftool is able to read all the data:
Blur At Infinity : 0.013350779
Focal Distance : 16.577824
Focal Point X : 0.5208333
Focal Point Y : 0.58125
Format : RangeInverse
Near : 12.423587799072266
Far : 390.539306640625
Mime : image/png
Has Extended XMP : E0CC0C923EA0770C77E3E4EF99F538B3
Data : (Binary data 268496 bytes, use -b option to extract)
exiv2 doesn't see the "Data":
exiv2 -PX table.jpg
Xmp.GFocus.BlurAtInfinity XmpText 11 0.013350779
Xmp.GFocus.FocalDistance XmpText 9 16.577824
Xmp.GFocus.FocalPointX XmpText 9 0.5208333
Xmp.GFocus.FocalPointY XmpText 7 0.58125
Xmp.GImage.Mime XmpText 10 image/jpeg
Xmp.GDepth.Format XmpText 12 RangeInverse
Xmp.GDepth.Near XmpText 18 12.423587799072266
Xmp.GDepth.Far XmpText 16 390.539306640625
Xmp.GDepth.Mime XmpText 9 image/png
Xmp.xmpNote.HasExtendedXMP XmpText 32 E0CC0C923EA0770C77E3E4EF99F538B3
The version of exiv2(.exe) that ships with v0.25 has an option -pX to extract the XMP as "raw" XML. That code understands Xmp.xmpNote.HasExtendedXMP. However, the current version of the XMPsdk, which is statically linked into libexiv2, does not understand HasExtendedXMP. I believe XMPsdk will be upgraded in v0.26 and will handle this correctly.
$ exiv2 -pX ~/Downloads/table.jpg
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 5.1.0-jc003">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about=""
xmlns:GImage="http://ns.google.com/photos/1.0/image/"
xmlns:GDepth="http://ns.google.com/photos/1.0/depthmap/"
GImage:Data="/9j/4AAQSkZJRgABAQAAAQABAAD...deleted...//2Q=="
GDepth:Data="iVBORw...deleted...CYII="/>
</rdf:RDF>
</x:xmpmeta>
$
507 rmills@rmillsmbp:~ $ exiv2 -pX ~/Downloads/table.jpg | wc
9 16 521029
$
The output from exiv2 -PX ~/Downloads/table.jpg remains the same as you reported above. As you correctly state, exiv2 doesn't see the "Data".
There is however something odd about this sample image. As I understand the specification, the XMP in the "default" segment should be repeated in the Extended chain of segments. Clearly Xmp.GFocus etc. are only in the default segment. I believe the data should be repeated in the Extended chain to spare the reader from merging XML from the default and extended chains.
From the XMP specification:
"A reader must check for the existence of ExtendedXMP, and if it is present, integrate the data with the main XMP."
"When ExtendedXMP is required, the metadata must be split according to some algorithm that assigns more important data to the main portion, and less important data to the extended portions or portions."
I have seen several commits in #922. Do they only change the way the exiv2 command-line tool works, or will we also get support for extended XMP when using libexiv2? Glancing over the code, it seems to be part of the core XMP handling, but I would like to get confirmation on that.
The option -pX will extract XMP (and Extended XMP). However the version of XMPsdk in our code base remains stuck in the dark ages before Extended XMP. Issue #941 is to upgrade our XMPsdk to use the latest Adobe code and to treat XMPsdk as an external library. Regrettably, this has been deferred. Andreas assures me that it is a considerable undertaking to implement #941.
One of the command-line changes in support of #922 (and others) is to support the target '-' to mean stdin/stdout. This enables metadata to be piped between files. So the option -pX will print XMP. The option -eX- will do the same thing. The use-case is something like:
$ exiv2 -eX- foo.jpg | exiv2 -iX- bla.jpg
In addition to X for XMP, I've added C for ICC profile.
I'm a little overloaded at the moment with a major home renovation project. I've submitted the code for a lot of the -i- stuff, however I need to test and document it more thoroughly and to add tests to the test suite. If you have time to spare, I'd appreciate some help with this effort.
Too bad, I guess I have to read the JPEG APP1 tags myself then, assemble them manually and feed the result to exiv2. Thanks for the quick answer though. :-)
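For reference, the manual reassembly mentioned above might look roughly like the following Python sketch. It is not exiv2 code. It assumes the ExtendedXMP segment layout as I understand it from XMP Specification Part 3: the null-terminated namespace http://ns.adobe.com/xmp/extension/, a 32-character ASCII GUID, a 4-byte big-endian total length, a 4-byte big-endian offset, then the chunk. The function names are made up, and the demo file at the end is synthetic.

```python
import struct

# Assumed ExtendedXMP identifier (null-terminated), per XMP Specification Part 3.
XMP_EXT_NS = b"http://ns.adobe.com/xmp/extension/\x00"


def iter_app1_payloads(jpeg: bytes):
    """Yield the payload of each APP1 (0xFFE1) segment that precedes the scan data."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    pos = 2
    while pos + 4 <= len(jpeg) and jpeg[pos] == 0xFF:
        marker = jpeg[pos + 1]
        if marker in (0xDA, 0xD9):  # SOS or EOI: no more metadata segments
            break
        # The 2-byte length counts itself but not the marker bytes.
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        if marker == 0xE1:
            yield jpeg[pos + 4:pos + 2 + length]
        pos += 2 + length


def extended_xmp(jpeg: bytes) -> dict:
    """Return {guid: reassembled ExtendedXMP bytes} for every GUID found."""
    buffers = {}
    for payload in iter_app1_payloads(jpeg):
        if not payload.startswith(XMP_EXT_NS):
            continue
        body = payload[len(XMP_EXT_NS):]
        guid = body[:32].decode("ascii")
        full_len, offset = struct.unpack(">II", body[32:40])
        chunk = body[40:]
        # Note: full_len comes from the file; real code should sanity-check it.
        buf = buffers.setdefault(guid, bytearray(full_len))
        buf[offset:offset + len(chunk)] = chunk
    return {g: bytes(b) for g, b in buffers.items()}


# Synthetic demo: two extension chunks stored out of order.
def _app1(payload: bytes) -> bytes:
    return b"\xff\xe1" + struct.pack(">H", len(payload) + 2) + payload


guid = "E0CC0C923EA0770C77E3E4EF99F538B3"  # GUID string quoted earlier in the thread
data = b"<x:xmpmeta>example</x:xmpmeta>"


def _ext(offset: int, chunk: bytes) -> bytes:
    header = guid.encode("ascii") + struct.pack(">II", len(data), offset)
    return _app1(XMP_EXT_NS + header + chunk)


fake_jpeg = b"\xff\xd8" + _ext(10, data[10:]) + _ext(0, data[:10]) + b"\xff\xda"
assert extended_xmp(fake_jpeg)[guid] == data
```

A real reader would pick the buffer whose GUID matches Xmp.xmpNote.HasExtendedXMP from the standard packet and then merge the two XMP trees. As far as I understand the specification, the GUID is the MD5 digest of the full ExtendedXMP serialization, so it can also be used to verify the reassembled data; the sketch above does not do that.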
I'm not sure what you're doing, however you may find the option -pR useful for the analysis of files.
-pS prints the structure of a file.
-pR recursively prints the structure of the file. By recursive, I mean that it will use the tiff structure printer on embedded tiff structures.
If you're writing scripts to manipulate the APPx segments of a file, the combination of -pS and the utility dd is powerful medicine for locating/extracting/replacing APPx segments. I believe this is discussed in the Wiki in the articles about the structure of various file formats. It has also been discussed in several forum threads, which you can probably find by searching the forum.
I am not using the command line tool at all, but libexiv2 and its C++ API. I am, just like Ulrich, a developer of darktable.
Yes, I am describing features of the command-line tool. However these things are implemented in the API, so you may be able to take advantage of the work I've put into these features.