Exiv2 & Unicode Paths on Windows

Added by Arnold Wiegert 10 months ago

When compiling exiv2lib for debugging, the output library is already 'decorated' with a trailing 'd'.
Once Unicode versions become available, it would be equally desirable to also identify those specific libraries with a trailing 'u'.


Replies (17)

RE: Exiv2 & Unicode - Added by Robin Mills 10 months ago

I have code to submit for '', 'u', 'd', 'ud' and it seems to be working. Hope to submit it later this week.

Another approach would be to only support UNICODE on Windows because it provides a superset of the API. It takes nothing away, it adds wstring path functions to the api. http://dev.exiv2.org/boards/3/topics/2913?r=2921#message-2921
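
As a rough illustration of the four decorations mentioned above ('', 'u', 'd', 'ud'), the naming rule could be sketched as below. This is not the submitted build code; the helper names, the "exiv2" base name and the ".lib" extension are assumptions made only for the example.

```cpp
#include <string>

// Hypothetical helper (not part of Exiv2): returns the decoration appended
// to the library base name for a given build configuration.
std::string librarySuffix(bool unicodePath, bool debug)
{
    std::string suffix;
    if (unicodePath) suffix += 'u';  // Unicode-path build
    if (debug)       suffix += 'd';  // debug build
    return suffix;                   // "", "u", "d" or "ud"
}

// Hypothetical example of a decorated file name, e.g. base + "ud" + ".lib".
std::string libraryFileName(const std::string& base, bool unicodePath, bool debug)
{
    return base + librarySuffix(unicodePath, debug) + ".lib";
}
```

The 'u' is placed before the 'd' so that the combined decoration reads 'ud', matching the list above.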

RE: Exiv2 & Unicode Paths on Windows - Added by Arnold Wiegert 10 months ago

'Windows only' would work for me; it is all I really work with and the proposed decorations are what I am used to.

After reading parts of the message you referenced, I would like to add that it is not just file names & paths that need to be handled as UNICODE.
The metadata within the images can also contain UNICODE strings.

RE: Exiv2 & Unicode Paths on Windows - Added by Robin Mills 10 months ago

The aim here is to fix/guarantee that the UNICODE path code in Windows builds correctly with all Windows build systems (and Windows environments such as Cygwin and msys/2.0).

Exiv2 v0.26.1 is a 'dot' release and should be a 'drop in replacement' for v0.26. Changing the default libexiv2.dll to provide UNICODE functions is probably harmless; however, we would be changing the ABI and should not do that for a 'dot' release. Let's get more experience/feedback with the UNICODE path code. We can consider making UNICODE the default build in Exiv2 v0.27.

The subject of handling UNICODE in the metadata is another matter. Being a native English speaker, I am rather challenged by localisation. I believe Exif tags (with string values) are binary which we can treat as UTF-8. I believe IPTC is similar. It's a byte count and some bytes. You've already mentioned that you have an UTF-8 issue with XMPsdk. I'm willing to investigate UTF-8 issues with Exif, IPTC and XMP metadata if you can provide test cases.
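
If Exif and IPTC string values are simply a byte count and some bytes, one way to build test cases is to check whether those bytes are well-formed UTF-8. The following is a minimal sketch under that assumption; isValidUtf8 is a hypothetical helper written for this example and is not part of the Exiv2 API.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if the bytes are well-formed UTF-8
// (no overlong forms, no surrogates, nothing above U+10FFFF).
bool isValidUtf8(const std::string& bytes)
{
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;   // continuation bytes expected after the lead byte
        unsigned long cp = 0;    // code point being decoded
        unsigned long min = 0;   // smallest code point allowed for this length
        if (c < 0x80)                { extra = 0; cp = c;        min = 0; }
        else if ((c & 0xE0) == 0xC0) { extra = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; min = 0x10000; }
        else return false;       // stray continuation byte or invalid lead byte

        if (extra > n - i - 1) return false;             // truncated sequence
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char cc = static_cast<unsigned char>(bytes[i + k]);
            if ((cc & 0xC0) != 0x80) return false;       // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < min) return false;                      // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate
        if (cp > 0x10FFFF) return false;                 // beyond Unicode range
        i += extra + 1;
    }
    return true;
}
```

A value such as "Baden-Württemberg" stored as UTF-8 passes this check, while the same text stored in Latin-1 (a lone 0xFC byte for 'ü') does not.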

RE: Exiv2 & Unicode Paths on Windows - Added by Arnold Wiegert 10 months ago

Handling Unicode paths is certainly essential to allow use of exiv2 under Windows, irrespective of the compiler etc.

How it is handled on the code and build end is entirely up to you. I have no backwards or other compatibility constraints.
As for localization, until I got involved in this aspect of metadata, it was only a word in the dictionary and even now I am very much a newb myself.

But, since the image metadata standards all support Unicode strings in one way or another, in the end exiv2 would need to be able to handle those as well; path compatibility is nice, but not everything ;-)

I'll try to attach a couple of rather simplistic test files.
The first one, itxt2.png, is an unmodified copy of the PNG test file itxt2.png.
It contains some French text with one accented character.

The second one, itxt-german.png, is a slightly modified version of one of the PNG test images.
IIRC, it was modified using pngcheck and contains a string in German with a couple of German characters: one 'Umlaut' and one sharp 's'.

http://www.libpng.org/pub/png/apps/pngcheck.html

itxt2.png (5.35 KB)

itxt-german.png (5.4 KB)

RE: Exiv2 & Unicode Paths on Windows - Added by Robin Mills 10 months ago

Arnold

We're off topic. I'm working flat-out on the Exiv2 v0.26.1 release, including your UNICODE/static/Debug/CMake code, and there are other matters on the TODO list that have been promised for v0.26.1. I don't have the bandwidth at present to start on a totally different subject such as UNICODE in metadata.

However, I've looked briefly at the files you sent.

599 rmills@rmillsmbp:~/Downloads $ exiv2 -pa ~/Downloads/itxt2.png 
600 rmills@rmillsmbp:~/Downloads $ exiv2 -pR ~/Downloads/itxt2.png 
STRUCTURE OF PNG FILE: /Users/rmills/Downloads/itxt2.png
 address | chunk |  length | data                           | checksum
       8 | IHDR  |      13 | ...[...E.....                  | 0x52edaae4
      33 | gAMA  |       4 | ....                           | 0x0bfc6105
      49 | sBIT  |       4 | ....                           | 0x4da52df6
      65 | bKGD  |       6 | ......                         | 0x95cd2f20
      83 | pCAL  |      44 | bogus units...........foo/bar  | 0x57407b1c
     139 | pHYs  |       9 | .........                      | 0xd2dd7efc
     160 | tIME  |       7 | .....:.                        | 0x8eff267a
     179 | tEXt  |       9 | Title.PNG                      | 0xdc017935
     200 | iTXt  |      39 | Author...fr.Auteur.La plume de | 0x4fdb72e1
     251 | zTXt  |      26 | test.....N.)-Qx.P.........     | 0xa869e99d
     289 | IDAT  |    4828 | x..\+........EFFb...X$.......D | 0x353e27e9
    5129 | zTXt  |     202 | Description....M.Mj.@...>.[... | 0xa9a15024
    5343 | iTXt  |     111 | Warning...de.WARNING........0. | 0xb3adddee
    5466 | IEND  |       0 |                                | 0xae426082
601 rmills@rmillsmbp:~/Downloads $ exiv2 -pa ~/Downloads/itxt2.png 
602 rmills@rmillsmbp:~/Downloads $ which exiv2
/usr/local/bin/exiv2
603 rmills@rmillsmbp:~/Downloads $ exiv2 -pR ~/Downloads/itxt-german.png 
STRUCTURE OF PNG FILE: /Users/rmills/Downloads/itxt-german.png
 address | chunk |  length | data                           | checksum
       8 | IHDR  |      13 | ...[...E.....                  | 0x52edaae4
      33 | gAMA  |       4 | ....                           | 0x0bfc6105
      49 | sBIT  |       4 | ....                           | 0x4da52df6
      65 | pCAL  |      44 | bogus units...........foo/bar  | 0x57407b1c
     121 | tIME  |       7 | .....:.                        | 0x8eff267a
     140 | bKGD  |       6 | ......                         | 0x95cd2f20
     158 | pHYs  |       9 | .........                      | 0xd2dd7efc
     179 | tEXt  |       9 | Title.PNG                      | 0xdc017935
     200 | iTXt  |      39 | Author...fr.Auteur.La plume de | 0x4fdb72e1
     251 | IDAT  |    4000 | x..\+........EFFb...X$.......D | 0x199b9fd7
    4263 | IDAT  |     831 | .......#T..6.....`....G...(<.. | 0xa028a770
    5106 | zTXt  |     202 | Description..x.M.Mj.@...>.[... | 0x692a52f9
    5320 | iTXt  |     111 | Warning...de.WARNING.x......0. | 0xb5cd3e90
    5443 | iTXt  |      65 | Deutsch...de..Steinstra..e 10, | 0x23616b74
    5520 | IEND  |       0 |                                | 0xae426082
604 rmills@rmillsmbp:~/Downloads $

As you can see, exiv2 does not list any metadata in the files. However the debugging code exiv2 -pR foo.png spots iTXt and zTXt blocks with Title, Author, Description. Exiv2 deals with Exif, IPTC, XMP and ICC metadata. This is a totally new species of metadata in PNG files. And that's another subject that could be investigated.

However, my focus is Exiv2 v0.26.1. These new subjects:

  1. UNICODE in metadata
  2. Metadata iTXt/zTXt blocks in PNG files

have never been supported by Exiv2. I'd like to retain focus and stack those other subjects for a future project. I recommend that you open new Feature requests and they will be investigated at a future time.
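
For anyone picking up the PNG feature request later, the address/chunk/length columns printed by exiv2 -pR above follow directly from the PNG layout: an 8-byte signature, then chunks made of a 4-byte big-endian length, a 4-byte type, the data and a 4-byte CRC. The following is a minimal sketch of such a walk over an in-memory file. It is not Exiv2's implementation; PngChunk and listPngChunks are hypothetical names, and the CRC is skipped rather than verified.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record for one chunk header.
struct PngChunk {
    std::size_t   address;  // offset of the chunk's length field in the file
    std::string   type;     // four-character chunk type, e.g. "iTXt"
    std::uint32_t length;   // size of the chunk data in bytes
};

// Reads a big-endian 32-bit unsigned integer starting at pos.
static std::uint32_t readBE32(const std::vector<unsigned char>& b, std::size_t pos)
{
    return (std::uint32_t(b[pos]) << 24) | (std::uint32_t(b[pos + 1]) << 16) |
           (std::uint32_t(b[pos + 2]) << 8) | std::uint32_t(b[pos + 3]);
}

// Lists the chunks of a PNG held in memory. Returns an empty list if the
// signature is missing and stops early if a chunk is truncated.
std::vector<PngChunk> listPngChunks(const std::vector<unsigned char>& file)
{
    static const unsigned char signature[8] =
        {0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'};
    std::vector<PngChunk> chunks;
    if (file.size() < 8) return chunks;
    for (std::size_t i = 0; i < 8; ++i)
        if (file[i] != signature[i]) return chunks;

    std::size_t pos = 8;
    while (file.size() - pos >= 12) {                 // length + type + CRC
        const std::uint32_t length = readBE32(file, pos);
        if (length > file.size() - pos - 12) break;   // truncated chunk
        PngChunk chunk;
        chunk.address = pos;
        chunk.type.assign(file.begin() + pos + 4, file.begin() + pos + 8);
        chunk.length = length;
        chunks.push_back(chunk);
        pos += 12 + static_cast<std::size_t>(length); // skip data and CRC
        if (chunk.type == "IEND") break;              // last chunk in a PNG
    }
    return chunks;
}
```

With this layout the first chunk always sits at address 8, and a 13-byte IHDR puts the next chunk at 8 + 12 + 13 = 33, which matches the listings above.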

RE: Exiv2 & Unicode Paths on Windows - Added by Arnold Wiegert 10 months ago

Understood.

Because I am not sure how to exactly describe the issues, I have opened 2 feature requests and hope that you or someone else on the team can flesh out the nitty gritty details.

RE: Exiv2 & Unicode Paths on Windows - Added by Arnold Wiegert 10 months ago

PS: any support for Unicode strings by exiv2 will be useful.
I am by no means tied to iTXt or zTXt chunks; they were just handy example files.
Support via XMP or any other 'standard' would be quite workable for me.

As an example, I have added some UTF-8-Unicode strings to a sample jpg image using XnViewMP
XnViewMP reports the data as being placed in IPTC-IIM & XMP

IPTC-IIM Keywords : Places, Germany, Baden-Württemberg, UmlautßüöäÜÖÄ

XMP
dc     subject[1] Places, 
       subject[2] Germany
       subject[3] Baden-Württemberg
       subject[4] UmlautßüöäÜÖÄ
lr
       hierarchical subject[1] Places
       hierarchical subject[2] Places|Germany
       hierarchical subject[3] Places|Germany|Baden-Württemberg
       hierarchical subject[4] Places|Germany|UmlautßüöäÜÖÄ

Exiftool reports the data as:
Subject : Places, Germany, Baden-Württemberg, UmlautßüöäÜÖÄ
Hierarchical Subject : Places, Places|Germany, Places|Germany|Baden-Württemberg, Places|Germany|UmlautßüöäÜÖÄ

metadata-test.jpg (70.6 KB)

RE: Exiv2 & Unicode Paths on Windows - Added by Robin Mills 10 months ago

Well, here's what I see on the Mac:

623 rmills@rmillsmbp:~/gnu/exiv2/0.26 $ exiv2 -px ~/Downloads/metadata-test.jpg
Xmp.dc.subject               XmpBag      4  Places, Germany, Baden-Württemberg, UmlautßüöäÜÖÄ
Xmp.lr.hierarchicalSubject   XmpBag      4  Places, Places|Germany, Places|Germany|Baden-Württemberg, Places|Germany|UmlautßüöäÜÖÄ
624 rmills@rmillsmbp:~/gnu/exiv2/0.26 $

You can get the "raw" XMP/xml from the file with:

625 rmills@rmillsmbp:~/gnu/exiv2/0.26 $ exiv2 -pX ~/Downloads/metadata-test.jpg
<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="XMP Core 5.5.0">
   <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <rdf:Description rdf:about="" 
            xmlns:dc="http://purl.org/dc/elements/1.1/" 
            xmlns:lr="http://ns.adobe.com/lightroom/1.0/">
         <dc:subject>
            <rdf:Bag>
               <rdf:li>Places</rdf:li>
               <rdf:li>Germany</rdf:li>
               <rdf:li>Baden-Württemberg</rdf:li>
               <rdf:li>UmlautßüöäÜÖÄ</rdf:li>
            </rdf:Bag>
         </dc:subject>
         <lr:hierarchicalSubject>
            <rdf:Bag>
               <rdf:li>Places</rdf:li>
               <rdf:li>Places|Germany</rdf:li>
               <rdf:li>Places|Germany|Baden-Württemberg</rdf:li>
               <rdf:li>Places|Germany|UmlautßüöäÜÖÄ</rdf:li>
            </rdf:Bag>
         </lr:hierarchicalSubject>
      </rdf:Description>
   </rdf:RDF>
</x:xmpmeta>

To be honest, I don't know what I'm looking for. The "raw" XMP/xml has nothing that says how it's encoded.

This can be viewed with samples/exiv2json:

630 rmills@rmillsmbp:~/gnu/exiv2/0.26 $ bin/exiv2json  ~/Downloads/metadata-test.jpg 
{
    "Iptc": {
        "Envelope": {
            "CharacterSet": "G" 
        },
        "Application2": {
            "Keywords": "Places",
            "Keywords": "Germany",
            "Keywords": "Baden-Württemberg",
            "Keywords": "UmlautßüöäÜÖÄ" 
        }
    },
    "Xmp": {
        "dc": {
            "subject": "Places, Germany, Baden-Württemberg, UmlautßüöäÜÖÄ" 
        },
        "lr": {
            "hierarchicalSubject": "Places, Places|Germany, Places|Germany|Baden-Württemberg, Places|Germany|UmlautßüöäÜÖÄ" 
        },
        "xmlns": {
            "dc": "http:\/\/purl.org\/dc\/elements\/1.1\/",
            "lr": "http:\/\/ns.adobe.com\/lightroom\/1.0\/" 
        }
    }
}

It's been a long day (12 hours of solid work on Exiv2). Tomorrow is another day. I beg you to stop talking about UNICODE metadata and other types of png/metadata. I'm working hard on build matters for Exiv2 v0.26.1.

RE: Exiv2 & Unicode Paths on Windows - Added by Zoltan Hubai 3 months ago

Anything on an Exiv2 0.26 build that supports Unicode Paths on Windows?

RE: Exiv2 & Unicode Paths on Windows - Added by Robin Mills 3 months ago

Nothing has changed concerning exiv2 and unicode since this was discussed 7 months ago. The situation remains:

1) You can build Exiv2 to support wstring/unicode paths on Windows.
2) Exif metadata strings are encoded as binary characters. I believe wstrings can be stored in the metadata. If you do encode the data in this way, you'll almost certainly face interoperability issues with applications which expect UTF-8 encoding.
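
To avoid the interoperability problem described in 2), wide strings could be converted to UTF-8 before being stored in the metadata. The following is a minimal, portable sketch that assumes the wide string is UTF-16 (as wchar_t strings are on Windows) and uses std::u16string to stand in for it. utf16ToUtf8 and appendUtf8 are hypothetical helpers, not Exiv2 functions.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: appends one Unicode code point to out as UTF-8.
static void appendUtf8(std::string& out, char32_t cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: converts UTF-16 to UTF-8.
// Unpaired surrogates are replaced by U+FFFD (the replacement character).
std::string utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {               // high surrogate
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;                                      // consume the low surrogate
            } else {
                cp = 0xFFFD;                              // unpaired high surrogate
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;                                  // unpaired low surrogate
        }
        appendUtf8(out, cp);
    }
    return out;
}
```

On Windows itself the Win32 function WideCharToMultiByte with CP_UTF8 performs the same conversion; the hand-written version is shown only because it compiles on every platform.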

RE: Exiv2 & Unicode Paths on Windows - Added by Zoltan Hubai 3 months ago

Thanks
I can build the Exiv2 0.26 win32 libraries using the VS 2015 community edition with Unicode Path enabled.
When I try to build the x64 libraries I get 11 errors, all related to size_t conversion to uint32_t:
4>..\..\src\tiffimage.cpp(196): error C2220: warning treated as error - no 'object' file generated
4>..\..\src\tiffimage.cpp(196): warning C4267: 'argument': conversion from 'size_t' to 'uint32_t', possible loss of data
4>..\..\src\tiffimage.cpp(222): warning C4267: '=': conversion from 'size_t' to 'long', possible loss of data

Any solution for this?

RE: Exiv2 & Unicode Paths on Windows - Added by Robin Mills 3 months ago

There are several ways to fix this:

1) Best Fix: Change the code by adding a cast (uint32_t) on line 196:

   192            ByteOrder bo = TiffParser::decode(exifData_,
   193                                              iptcData_,
   194                                              xmpData_,
   195                                              io_->mmap(),
   196                                              (uint32_t) io_->size());

And a cast (long) on line 222:
   222                    size = (long) io_->size();

2) Quick Fix: Disable the setting in Visual Studio "Treat warnings as errors"

I think it's in the compiler settings. I try to avoid changing that setting. Warnings should be silenced by thoughtful action, not killing the messenger!

3) Work for me: I provide a patch.

This kind of thing is usually quite tedious for me because it produces different warnings/errors on different platforms.
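
A plain cast silences C4267 but silently truncates if a file is ever larger than the target type can hold. For anyone who wants the narrowing to be checked, a small helper along these lines is one option. This is a sketch only; toUint32 and toLong are hypothetical names and are not in the v0.26 source.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical helper: narrows size_t to uint32_t, throwing instead of
// silently truncating. The extra parentheses around max avoid a clash
// with the max() macro that <windows.h> may define.
std::uint32_t toUint32(std::size_t value)
{
    if (value > (std::numeric_limits<std::uint32_t>::max)())
        throw std::overflow_error("size does not fit in uint32_t");
    return static_cast<std::uint32_t>(value);
}

// Hypothetical helper: the same idea for the (long) cast. long is 32 bits
// with Visual Studio, even in x64 builds.
long toLong(std::size_t value)
{
    if (value > static_cast<unsigned long>((std::numeric_limits<long>::max)()))
        throw std::overflow_error("size does not fit in long");
    return static_cast<long>(value);
}
```

With such helpers, the argument on line 196 would become toUint32(io_->size()) and the assignment on line 222 would become size = toLong(io_->size());.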

RE: Exiv2 & Unicode Paths on Windows - Added by Robin Mills 3 months ago

I've checked the current version of the code on 'master'. Both casts I recommended in Best Fix are in the current code on https://github.com/exiv2/exiv2/src/tiffimage.cpp. That is indeed the correct fix and would be the patch were I to provide one.

Do NOT copy any code from master into your v0.26 source code. The project has evolved in the 11 months since v0.26. src/tiffimage.cpp (and many other files) on master have changed and are not compatible with v0.26.

RE: Exiv2 & Unicode Paths on Windows - Added by Zoltan Hubai 3 months ago

Thanks
I used 1) and fixed all the files (basicio.cpp, cr2image.cpp, crwimage.cpp, exif.cpp, jp2image.cpp, orfimage.cpp, pgfimage.cpp, pngimage.cpp, preview.cpp, rw2image.cpp, tiffimage.cpp).

I did try 2) before I wrote, but for some reason the setting is ignored by the compiler (Visual Studio 2015 Community V 14.0.25431.01 Update 3).
I also tried 2) with Visual Studio 2017 Community and there it worked (however, I need it for VS2015).

RE: Exiv2 & Unicode Paths on Windows - Added by Robin Mills 3 months ago

Right. I'm pleased that this is working for you.

The EXV_UNICODE_PATH setting has always been a "poor relation" in Exiv2. By that I mean, it is given less respect than it deserves. I made a change to samples/exifprint.cpp for v0.26 to provide a minimal test harness which validates that it works (and is not totally broken). I ought to update the buildserver to build and test this configuration. http://dev.exiv2.org/issues/1174 and http://dev.exiv2.org/issues/1169

There's a never ending stream of tasks to be undertaken when working on an open-source project. If you'd like to contribute, I'd be delighted to accept your help.

We're having an "Exiv2 Developer's Meeting" at my home in England on May 5. This, and other projects, will be discussed and we'll prioritise features for implementation within our resources. https://github.com/Exiv2/exiv2/issues/225

RE: Exiv2 & Unicode Paths on Windows - Added by Zoltan Hubai 3 months ago

Tested with Hungarian and Serbian path names and works as it should.

RE: Exiv2 & Unicode Paths on Windows - Added by Robin Mills 3 months ago

Thanks for the update.
