
Issue with XMP encoding ? "data of an unknown image type"

Added by Michael Friess almost 12 years ago

Background:
I am trying to import my photo library into digiKam 1.0 on Linux (Ubuntu 9.10). The library is maintained with iMatch (http://www.photools.com) under Windows. For raw file formats, iMatch stores the XMP metadata in sidecar files.
My JPEG files are imported fine, but I don't get any metadata for the raw files.
While trying to track down the problem, I stumbled across the following error:
exiv2 /media/Photos/New_Photos/2009051_Lake_District/20090503_4466.XMP
Exiv2 exception in print action for file /media/Photos/New_Photos/2009051_Lake_District/20090503_4466.XMP:
/media/Photos/New_Photos/2009051_Lake_District/20090503_4466.XMP: The file contains data of an unknown image type

By chance I discovered that exiv2 works fine once I open the XMP file in Kate and save it?!
So I did a diff:
diff b 20090503_4466.XMP /media/Photos/New_Photos/2009051_Lake_District/20090503_4466.XMP
1c1
< <x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Public XMP Toolkit Core 3.5">
---
> <x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Public XMP Toolkit Core 3.5">

The lines look identical, yet exiv2 can handle only one of the two files.

A hexdump of both files shows where they differ:
hexdump -n 48 /media/Photos/New_Photos/2009051_Lake_District/20090503_4466.XMP
0000000 bbef 3cbf 3a78 6d78 6d70 7465 2061 6d78
0000010 6e6c 3a73 3d78 6122 6f64 6562 6e3a 3a73
0000020 656d 6174 222f 7820 783a 706d 6b74 223d
0000030

hexdump -n 48 20090503_4466.XMP
0000000 783c 783a 706d 656d 6174 7820 6c6d 736e
0000010 783a 223d 6461 626f 3a65 736e 6d3a 7465
0000020 2f61 2022 3a78 6d78 7470 3d6b 5022 6275
0000030

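Apparently iMatch added a UTF-8 Byte Order Mark (BOM), the bytes EF BB BF, to the start of the file, and Kate removed it when saving. Kate did not actually change the byte order: plain hexdump prints each pair of bytes as a 16-bit word in the machine's (here little-endian) byte order, so once the three BOM bytes are gone the pairs simply line up differently. diff interprets the lines correctly, but exiv2 has an issue with the BOM.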

Reading further: http://en.wikipedia.org/wiki/Byte_order_mark
While UTF-8 does not have byte order issues, a BOM encoded in UTF-8 may nonetheless be encountered. A UTF-8 BOM is explicitly allowed by the Unicode standard[2], but is not recommended[3], as it only identifies a file as UTF-8 and does not state anything about byte order.[4] Many Windows programs (including Windows Notepad) add BOMs to UTF-8 files by default. However in Unix-like systems (which make heavy use of text files for file formats as well as for inter-process communication) this practice is not recommended, as it will interfere with correct processing of important codes such as the shebang at the start of an interpreted script.[5] It may also interfere with source for programming languages that don't recognise it. For example, gcc reports stray characters at the beginning of a source file, and in PHP, if output buffering is disabled, it has the subtle effect of causing the page to start being sent to the browser, preventing custom headers from being specified by the PHP script. The UTF-8 representation of the BOM is the byte sequence EF BB BF, which appears as the ISO-8859-1 characters ï»¿ in most text editors and web browsers not prepared to handle UTF-8.

Should exiv2 gracefully handle the BOM?


Replies (3)

RE: Issue with XMP encoding ? "data of an unknown image type" - Added by Andreas Huggel almost 12 years ago

Should exiv2 gracefully handle the BOM?

Recognising an XMP file with a leading BOM is easy. It requires only a small change in isXmpType(), in case you want to do it yourself. The BOM will be removed on write, though.

Andreas

RE: Issue with XMP encoding ? "data of an unknown image type" - Added by Michael Friess almost 12 years ago

I take your offer as a compliment, implying that I could do it myself. However, knowing about programming issues doesn't necessarily mean I am comfortable with programming. I haven't touched C++ for at least a decade ;-)

Still, does this mean I might be lucky and some kind contributor will fix this easily?

"The BOM will be removed on write though." - I would actually expect this based on the aforementioned Unicode standard recommendation.

RE: Issue with XMP encoding ? "data of an unknown image type" - Added by Andreas Huggel almost 12 years ago

This is now feature #673 and will be in the next release. See r2010 for details, or if you want to brush up on your C++ ;)

Cheers,
Andreas
