Exiv2 & Unicode Paths on Windows

Added by Arnold Wiegert 18 days ago

When compiling exiv2lib for debugging, the output library is already 'decorated' with a trailing 'd'.
Once Unicode versions become available, it would be equally desirable to also identify those specific libraries with a trailing 'u'.


Replies (8)

RE: Exiv2 & Unicode - Added by Robin Mills 18 days ago

I have code to submit for '', 'u', 'd', 'ud' and it seems to be working. Hope to submit it later this week.
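
A minimal sketch of the suffix scheme described above, assuming the 'u' (Unicode) marker comes before the 'd' (debug) marker, as in 'ud'. The function name `decorated_lib_name` and the base name `exiv2` are illustrative only and are not taken from the Exiv2 build files.

```python
def decorated_lib_name(base: str, unicode_build: bool, debug_build: bool) -> str:
    """Return base plus 'u' and/or 'd', in that order ('', 'u', 'd', 'ud')."""
    suffix = ("u" if unicode_build else "") + ("d" if debug_build else "")
    return base + suffix


if __name__ == "__main__":
    # Walk the four combinations mentioned in the post.
    for unicode_build in (False, True):
        for debug_build in (False, True):
            print(decorated_lib_name("exiv2", unicode_build, debug_build))
```

The order of the two letters is the only design choice here; a real build system would also have to add the platform's prefix and extension.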

Another approach would be to support only UNICODE on Windows, because it provides a superset of the API. It takes nothing away; it adds wstring path functions to the API. http://dev.exiv2.org/boards/3/topics/2913?r=2921#message-2921

RE: Exiv2 & Unicode Paths on Windows - Added by Arnold Wiegert 18 days ago

'Windows only' would work for me; it is all I really work with and the proposed decorations are what I am used to.

After reading parts of the message you referenced, I would like to add that it is not just file names & paths that need to be handled as UNICODE.
The metadata within the images can also contain UNICODE strings.

RE: Exiv2 & Unicode Paths on Windows - Added by Robin Mills 18 days ago

The aim here is to fix/guarantee that the UNICODE path code in Windows builds correctly with all Windows build systems (and Windows environments such as Cygwin and msys/2.0).
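
To illustrate why a narrow-character path API falls short on Windows, here is a rough sketch. It assumes a legacy ANSI code page of cp1252 and a made-up path; the helper names are hypothetical. A path that cannot be encoded in the code page can still be expressed as UTF-16 code units, which is what a wstring holds on Windows.

```python
def fits_code_page(path: str, code_page: str = "cp1252") -> bool:
    """True if every character of path exists in the given legacy code page."""
    try:
        path.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False


def to_wide(path: str) -> bytes:
    """Encode path as UTF-16 little-endian, the layout of wchar_t on Windows."""
    return path.encode("utf-16-le")


if __name__ == "__main__":
    ascii_path = "C:\\photos\\image.jpg"
    cjk_path = "C:\\photos\\\u5199\u771f.jpg"  # hypothetical Japanese file name
    print(fits_code_page(ascii_path))  # representable in cp1252
    print(fits_code_page(cjk_path))    # not representable in cp1252
    print(len(to_wide(cjk_path)))      # two bytes per character in this path
```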

Exiv2 v0.26.1 is a 'dot' release and should be a 'drop in replacement' for v0.26. Changing the default libexiv2.dll to provide UNICODE functions is probably harmless; however, it would change the ABI, and we should not do that in a 'dot' release. Let's get more experience/feedback with the UNICODE path code. We can consider making UNICODE the default build in Exiv2 v0.27.

The subject of handling UNICODE in the metadata is another matter. Being a native English speaker, I am rather challenged by localisation. I believe Exif tags (with string values) are binary which we can treat as UTF-8. I believe IPTC is similar. It's a byte count and some bytes. You've already mentioned that you have a UTF-8 issue with XMPsdk. I'm willing to investigate UTF-8 issues with Exif, IPTC and XMP metadata if you can provide test cases.
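
The "byte count and some bytes" picture can be sketched for IPTC as follows. This assumes the standard IIM dataset layout (a 0x1C marker, a record number, a dataset number, and a two-byte big-endian length) and treats the value as UTF-8; the function name and the sample bytes are made up for illustration, and extended datasets are deliberately not handled.

```python
import struct


def iter_iim_datasets(data: bytes):
    """Yield (record, dataset, value_bytes) for standard IIM datasets."""
    pos = 0
    while pos + 5 <= len(data) and data[pos] == 0x1C:
        record, dataset, size = struct.unpack_from(">BBH", data, pos + 1)
        if size & 0x8000:
            raise ValueError("extended datasets are not handled in this sketch")
        value = data[pos + 5 : pos + 5 + size]
        if len(value) != size:
            raise ValueError("truncated dataset")
        yield record, dataset, value
        pos += 5 + size


# Hand-built sample: one Keywords dataset (record 2, dataset 25).
KEYWORD = "Baden-W\u00fcrttemberg".encode("utf-8")  # u-umlaut takes two bytes
SAMPLE = bytes([0x1C, 2, 25]) + struct.pack(">H", len(KEYWORD)) + KEYWORD

if __name__ == "__main__":
    for record, dataset, value in iter_iim_datasets(SAMPLE):
        print(record, dataset, len(value), value.decode("utf-8") == "Baden-W\u00fcrttemberg")
```

Nothing in the dataset itself says which encoding the bytes use, which is why treating them as UTF-8 is an assumption rather than a guarantee.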

RE: Exiv2 & Unicode Paths on Windows - Added by Arnold Wiegert 18 days ago

Handling Unicode paths is certainly essential to allow use of exiv2 under Windows, irrespective of the compiler etc.

How it is handled on the code and build end is entirely up to you. I have no backwards or other compatibility constraints.
As for localization, until I got involved in this aspect of metadata, it was only a word in the dictionary, and even now I am very much a newb myself.

But, since the image metadata standards all support Unicode strings in one way or another, in the end exiv2 would need to be able to handle those as well; otherwise path compatibility is nice, but not everything ;-)

I'll try to attach a couple of rather simplistic test files.
The first one, itxt2.png, is an unmodified copy of the PNG test file itxt2.png.
It contains some French text with one accented character.

The second one, itxt-german.png, is a slightly modified version of one of the PNG test images.
IIRC, it was modified using pngcheck and contains a string in German with a couple of German characters: one 'Umlaut' and one sharp 's'.

http://www.libpng.org/pub/png/apps/pngcheck.html

itxt2.png (5.35 KB)

itxt-german.png (5.4 KB)

RE: Exiv2 & Unicode Paths on Windows - Added by Robin Mills 18 days ago

Arnold

We're off topic. I'm working flat-out on the Exiv2 v0.26.1 release, including your UNICODE/static/Debug/CMake code. There are also other matters on the TODO list that have been promised for v0.26.1. I don't have the bandwidth at present to start on totally different subjects such as UNICODE in metadata.

However, I've looked briefly at the files you sent.

599 rmills@rmillsmbp:~/Downloads $ exiv2 -pa ~/Downloads/itxt2.png 
600 rmills@rmillsmbp:~/Downloads $ exiv2 -pR ~/Downloads/itxt2.png 
STRUCTURE OF PNG FILE: /Users/rmills/Downloads/itxt2.png
 address | chunk |  length | data                           | checksum
       8 | IHDR  |      13 | ...[...E.....                  | 0x52edaae4
      33 | gAMA  |       4 | ....                           | 0x0bfc6105
      49 | sBIT  |       4 | ....                           | 0x4da52df6
      65 | bKGD  |       6 | ......                         | 0x95cd2f20
      83 | pCAL  |      44 | bogus units...........foo/bar  | 0x57407b1c
     139 | pHYs  |       9 | .........                      | 0xd2dd7efc
     160 | tIME  |       7 | .....:.                        | 0x8eff267a
     179 | tEXt  |       9 | Title.PNG                      | 0xdc017935
     200 | iTXt  |      39 | Author...fr.Auteur.La plume de | 0x4fdb72e1
     251 | zTXt  |      26 | test.....N.)-Qx.P.........     | 0xa869e99d
     289 | IDAT  |    4828 | x..\+........EFFb...X$.......D | 0x353e27e9
    5129 | zTXt  |     202 | Description....M.Mj.@...>.[... | 0xa9a15024
    5343 | iTXt  |     111 | Warning...de.WARNING........0. | 0xb3adddee
    5466 | IEND  |       0 |                                | 0xae426082
601 rmills@rmillsmbp:~/Downloads $ exiv2 -pa ~/Downloads/itxt2.png 
602 rmills@rmillsmbp:~/Downloads $ which exiv2
/usr/local/bin/exiv2
603 rmills@rmillsmbp:~/Downloads $ exiv2 -pR ~/Downloads/itxt-german.png 
STRUCTURE OF PNG FILE: /Users/rmills/Downloads/itxt-german.png
 address | chunk |  length | data                           | checksum
       8 | IHDR  |      13 | ...[...E.....                  | 0x52edaae4
      33 | gAMA  |       4 | ....                           | 0x0bfc6105
      49 | sBIT  |       4 | ....                           | 0x4da52df6
      65 | pCAL  |      44 | bogus units...........foo/bar  | 0x57407b1c
     121 | tIME  |       7 | .....:.                        | 0x8eff267a
     140 | bKGD  |       6 | ......                         | 0x95cd2f20
     158 | pHYs  |       9 | .........                      | 0xd2dd7efc
     179 | tEXt  |       9 | Title.PNG                      | 0xdc017935
     200 | iTXt  |      39 | Author...fr.Auteur.La plume de | 0x4fdb72e1
     251 | IDAT  |    4000 | x..\+........EFFb...X$.......D | 0x199b9fd7
    4263 | IDAT  |     831 | .......#T..6.....`....G...(<.. | 0xa028a770
    5106 | zTXt  |     202 | Description..x.M.Mj.@...>.[... | 0x692a52f9
    5320 | iTXt  |     111 | Warning...de.WARNING.x......0. | 0xb5cd3e90
    5443 | iTXt  |      65 | Deutsch...de..Steinstra..e 10, | 0x23616b74
    5520 | IEND  |       0 |                                | 0xae426082
604 rmills@rmillsmbp:~/Downloads $
As you can see, exiv2 does not list any metadata in the files. However the debugging code exiv2 -pR foo.png spots iTXt and zTXt blocks with Title, Author, Description. Exiv2 deals with Exif, IPTC, XMP and ICC metadata. This is a totally new species of metadata in PNG files. And that's another subject that could be investigated.
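
For readers who want to reproduce the chunk listing without exiv2, here is a small sketch of a PNG chunk walker. It relies only on the PNG layout (an 8-byte signature, then chunks made of a 4-byte length, a 4-byte type, the data, and a 4-byte CRC). The function names are hypothetical and this is not how exiv2 -pR is implemented; the sample stream is built in memory rather than read from the attached files.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def iter_chunks(png: bytes):
    """Yield (address, chunk_type, length, crc) for each chunk in a PNG stream."""
    if png[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG stream")
    pos = 8
    while pos + 12 <= len(png):
        length, ctype = struct.unpack_from(">I4s", png, pos)
        data_end = pos + 8 + length
        if data_end + 4 > len(png):
            raise ValueError("truncated chunk")
        (crc,) = struct.unpack_from(">I", png, data_end)
        yield pos, ctype.decode("ascii"), length, crc
        pos = data_end + 4


def make_chunk(ctype: bytes, data: bytes) -> bytes:
    """Build one chunk; the CRC covers the type and data fields."""
    crc = zlib.crc32(ctype + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + ctype + data + struct.pack(">I", crc)


# Synthetic stream: a 13-byte IHDR (zero-filled, not a valid image) and IEND.
SAMPLE = PNG_SIGNATURE + make_chunk(b"IHDR", bytes(13)) + make_chunk(b"IEND", b"")

if __name__ == "__main__":
    for address, ctype, length, crc in iter_chunks(SAMPLE):
        print(f"{address:8d} | {ctype} | {length:7d} | 0x{crc:08x}")
```

The addresses follow the same arithmetic as the listing above: IHDR starts at 8, and a 13-byte chunk occupies 4 + 4 + 13 + 4 bytes, so the next chunk starts at 33.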

However, my focus is Exiv2 v0.26.1. These new subjects:

  1. UNICODE in metadata
  2. Metadata iTXt/zTXt blocks in PNG files

have never been supported by Exiv2. I'd like to retain focus and stack those other subjects for a future project. I recommend that you open new Feature requests and they will be investigated at a future time.
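
As a starting point for such a feature request, here is a hedged sketch of how an iTXt payload could be decoded. It follows the field order in the PNG specification (keyword, compression flag, compression method, language tag, translated keyword, text) and is not Exiv2 code. The sample payload is hand-built and only loosely modelled on the "Deutsch" chunk in the dump above; it is not the bytes from the attachment.

```python
import zlib


def parse_itxt(payload: bytes) -> dict:
    """Split an iTXt chunk payload into its fields and decode the text as UTF-8."""
    keyword, rest = payload.split(b"\x00", 1)
    compression_flag, compression_method = rest[0], rest[1]
    language, rest = rest[2:].split(b"\x00", 1)
    translated, text = rest.split(b"\x00", 1)
    if compression_flag == 1:
        if compression_method != 0:
            raise ValueError("unknown compression method")
        text = zlib.decompress(text)  # method 0 is zlib deflate
    return {
        "keyword": keyword.decode("latin-1"),
        "language": language.decode("ascii"),
        "translated_keyword": translated.decode("utf-8"),
        "text": text.decode("utf-8"),
    }


# Hypothetical payload: keyword, uncompressed, language "de", no translated keyword.
SAMPLE = b"Deutsch\x00\x00\x00de\x00\x00" + "Steinstra\u00dfe 10".encode("utf-8")

if __name__ == "__main__":
    fields = parse_itxt(SAMPLE)
    print(fields["keyword"], fields["language"], len(fields["text"]))
```

In this sample the sharp 's' occupies two bytes, so the payload is one byte longer than the number of characters it carries.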

RE: Exiv2 & Unicode Paths on Windows - Added by Arnold Wiegert 18 days ago

Understood.

Because I am not sure exactly how to describe the issues, I have opened two feature requests and hope that you or someone else on the team can flesh out the nitty-gritty details.

RE: Exiv2 & Unicode Paths on Windows - Added by Arnold Wiegert 18 days ago

PS: any support for Unicode strings by exiv2 will be useful.
I am by no means tied to iTXt or zTXt chunks; they were just handy example files.
Support via XMP or any other 'standard' would be quite workable for me.

As an example, I have added some UTF-8 Unicode strings to a sample jpg image using XnViewMP.
XnViewMP reports the data as being placed in IPTC-IIM & XMP.

IPTC-IIM Keywords : Places, Germany, Baden-Württemberg, UmlautßüöäÜÖÄ

XMP
dc     subject[1] Places, 
       subject[2] Germany
       subject[3] Baden-Württemberg
       subject[4] UmlautßüöäÜÖÄ
lr
       hierarchical subject[1] Places
       hierarchical subject[2] Places|Germany
       hierarchical subject[3] Places|Germany|Baden-Württemberg
       hierarchical subject[4] Places|Germany|UmlautßüöäÜÖÄ

Exiftool reports the data as:
Subject : Places, Germany, Baden-Württemberg, UmlautßüöäÜÖÄ
Hierarchical Subject : Places, Places|Germany, Places|Germany|Baden-Württemberg, Places|Germany|UmlautßüöäÜÖÄ

metadata-test.jpg (70.6 KB)

RE: Exiv2 & Unicode Paths on Windows - Added by Robin Mills 18 days ago

Well, here's what I see on the Mac:

623 rmills@rmillsmbp:~/gnu/exiv2/0.26 $ exiv2 -px ~/Downloads/metadata-test.jpg
Xmp.dc.subject               XmpBag      4  Places, Germany, Baden-Württemberg, UmlautßüöäÜÖÄ
Xmp.lr.hierarchicalSubject   XmpBag      4  Places, Places|Germany, Places|Germany|Baden-Württemberg, Places|Germany|UmlautßüöäÜÖÄ
624 rmills@rmillsmbp:~/gnu/exiv2/0.26 $ 
You can get the "raw" XMP/xml from the file with:
625 rmills@rmillsmbp:~/gnu/exiv2/0.26 $ exiv2 -pX ~/Downloads/metadata-test.jpg
<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="XMP Core 5.5.0">
   <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <rdf:Description rdf:about="" 
            xmlns:dc="http://purl.org/dc/elements/1.1/" 
            xmlns:lr="http://ns.adobe.com/lightroom/1.0/">
         <dc:subject>
            <rdf:Bag>
               <rdf:li>Places</rdf:li>
               <rdf:li>Germany</rdf:li>
               <rdf:li>Baden-Württemberg</rdf:li>
               <rdf:li>UmlautßüöäÜÖÄ</rdf:li>
            </rdf:Bag>
         </dc:subject>
         <lr:hierarchicalSubject>
            <rdf:Bag>
               <rdf:li>Places</rdf:li>
               <rdf:li>Places|Germany</rdf:li>
               <rdf:li>Places|Germany|Baden-Württemberg</rdf:li>
               <rdf:li>Places|Germany|UmlautßüöäÜÖÄ</rdf:li>
            </rdf:Bag>
         </lr:hierarchicalSubject>
      </rdf:Description>
   </rdf:RDF>
</x:xmpmeta>
To be honest, I don't know what I'm looking for. The "raw" XMP/xml has nothing that says how it's encoded.
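
One rough check, assuming the raw packet bytes are at hand, is to attempt a strict UTF-8 decode. The XML 1.0 specification treats a document with no encoding declaration and no byte order mark as UTF-8, so a packet like this one would be expected to pass. The helper name is hypothetical, and a successful decode only shows the bytes are consistent with UTF-8; it does not prove what the writing application intended.

```python
def is_valid_utf8(data: bytes) -> bool:
    """True if data decodes as UTF-8 without errors (strict mode is the default)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The same list item encoded two ways; the text is taken from the packet above.
ITEM = "<rdf:li>Baden-W\u00fcrttemberg</rdf:li>"
AS_UTF8 = ITEM.encode("utf-8")      # u-umlaut becomes the two bytes C3 BC
AS_LATIN1 = ITEM.encode("latin-1")  # u-umlaut becomes the single byte FC

if __name__ == "__main__":
    print(is_valid_utf8(AS_UTF8))
    print(is_valid_utf8(AS_LATIN1))
```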

The metadata can also be viewed with samples/exiv2json:

630 rmills@rmillsmbp:~/gnu/exiv2/0.26 $ bin/exiv2json  ~/Downloads/metadata-test.jpg 
{
    "Iptc": {
        "Envelope": {
            "CharacterSet": "G" 
        },
        "Application2": {
            "Keywords": "Places",
            "Keywords": "Germany",
            "Keywords": "Baden-Württemberg",
            "Keywords": "UmlautßüöäÜÖÄ" 
        }
    },
    "Xmp": {
        "dc": {
            "subject": "Places, Germany, Baden-Württemberg, UmlautßüöäÜÖÄ" 
        },
        "lr": {
            "hierarchicalSubject": "Places, Places|Germany, Places|Germany|Baden-Württemberg, Places|Germany|UmlautßüöäÜÖÄ" 
        },
        "xmlns": {
            "dc": "http:\/\/purl.org\/dc\/elements\/1.1\/",
            "lr": "http:\/\/ns.adobe.com\/lightroom\/1.0\/" 
        }
    }
}
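
The "CharacterSet": "G" entry is presumably the printable tail of the ISO 2022 escape sequence ESC % G, which the IPTC Envelope record uses to announce UTF-8. That reading is an assumption about how exiv2json renders the value and is not confirmed in this thread. Under that assumption, a reader could pick the codec for the Application2 strings roughly like this; the function names and the latin-1 fallback are illustrative choices, not Exiv2 behaviour.

```python
from typing import Optional

ESC_UTF8 = b"\x1b%G"  # ISO 2022 escape sequence announcing UTF-8


def iptc_codec(character_set: Optional[bytes], fallback: str = "latin-1") -> str:
    """Pick a Python codec name from the raw Envelope CharacterSet bytes."""
    if character_set == ESC_UTF8:
        return "utf-8"
    # Assumption: anything else is left to a caller-chosen legacy codec.
    return fallback


def decode_keyword(raw: bytes, character_set: Optional[bytes]) -> str:
    """Decode one keyword value using the codec implied by CharacterSet."""
    return raw.decode(iptc_codec(character_set))


if __name__ == "__main__":
    raw = "Baden-W\u00fcrttemberg".encode("utf-8")
    print(iptc_codec(ESC_UTF8))
    print(decode_keyword(raw, ESC_UTF8) == "Baden-W\u00fcrttemberg")
```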

It's been a long day (12 hours of solid work on Exiv2). Tomorrow is another day. I beg you to stop talking about UNICODE metadata and other types of png/metadata. I'm working hard on build matters for Exiv2 v0.26.1.
