Exiv2 & Unicode Paths on Windows
Added by Arnold Wiegert about 4 years ago
When compiling exiv2lib for debugging, the output library is already 'decorated' with a trailing 'd'.
Once Unicode versions become available, it would be equally desirable to also identify those specific libraries with a trailing 'u'.
Replies (17)
RE: Exiv2 & Unicode - Added by Robin Mills about 4 years ago
I have code to submit for '', 'u', 'd', 'ud' and it seems to be working. Hope to submit it later this week.
Another approach would be to only support UNICODE on Windows because it provides a superset of the API. It takes nothing away, it adds wstring path functions to the api. http://dev.exiv2.org/boards/3/topics/2913?r=2921#message-2921
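The four decorations mentioned above ('', 'u', 'd', 'ud') can be sketched as a small naming rule. This is an illustration only: the helper names librarySuffix and decoratedName are hypothetical and are not part of the Exiv2 build scripts; only the letters 'u' and 'd' and their order come from this thread.

```cpp
#include <string>

// Hypothetical helper: returns the decoration appended to the library base
// name for a given build flavour. The letters and their order ('ud') are
// taken from the thread; the function itself is illustrative only.
std::string librarySuffix(bool unicodePath, bool debug) {
    std::string suffix;
    if (unicodePath) suffix += 'u';  // Unicode path build
    if (debug)       suffix += 'd';  // debug build
    return suffix;
}

// Hypothetical helper: decorates a base name such as "exiv2".
std::string decoratedName(const std::string& base, bool unicodePath, bool debug) {
    return base + librarySuffix(unicodePath, debug);
}
```

With this rule a release build without Unicode paths keeps the undecorated name, so existing projects that link against it would not need to change.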
RE: Exiv2 & Unicode Paths on Windows - Added by Arnold Wiegert about 4 years ago
'Windows only' would work for me; it is all I really work with and the proposed decorations are what I am used to.
After reading parts of the message you referenced, I would like to add that it is not just file names & paths that need to be handled as UNICODE.
The metadata within the images can also contain UNICODE strings.
RE: Exiv2 & Unicode Paths on Windows - Added by Robin Mills about 4 years ago
The aim here is to fix/guarantee that the UNICODE path code in Windows builds correctly with all Windows build systems (and Windows environments such as Cygwin and msys/2.0).
Exiv2 v0.26.1 is a 'dot' release and should be a 'drop in replacement' for v0.26. Changing the default libexiv2.dll to provide UNICODE functions is probably harmless; however, it would change the ABI, and we should not do that in a 'dot' release. Let's get more experience/feedback with the UNICODE path code. We can consider making UNICODE the default build in Exiv2 v0.27.
The subject of handling UNICODE in the metadata is another matter. Being a native English speaker, I am rather challenged by localisation. I believe Exif tags (with string values) are binary which we can treat as UTF-8. I believe IPTC is similar. It's a byte count and some bytes. You've already mentioned that you have a UTF-8 issue with XMPsdk. I'm willing to investigate UTF-8 issues with Exif, IPTC and XMP metadata if you can provide test cases.
RE: Exiv2 & Unicode Paths on Windows - Added by Arnold Wiegert about 4 years ago
Handling Unicode paths is certainly essential to allow use of exiv2 under Windows, irrespective of the compiler etc.
How it is handled on the code and build end is entirely up to you. I have no backwards or other compatibility constraints.
As for localization, until I got involved in this aspect of metadata, it was only a word in the dictionary and even now I am very much a newb myself.
But since the image metadata standards all support Unicode strings in one way or another, exiv2 would in the end need to be able to handle those as well; path compatibility is nice, but not everything ;-)
I'll try to attach a couple of rather simplistic test files.
The first one, itxt2.png, is an unmodified copy of the PNG test file of the same name.
It contains some French text with one accented character.
The second one, itxt-german.png is a slightly modified version of one of the PNG test images.
IIRC, it was modified using pngcheck and contains a string in German with a couple of German characters, one 'Umlaut' & one sharp 's'
Attachments:
itxt2.png (5.35 KB)
itxt-german.png (5.4 KB)
RE: Exiv2 & Unicode Paths on Windows - Added by Robin Mills about 4 years ago
Arnold
We're off topic. I'm working flat-out on the Exiv2 v0.26.1 release - including your UNICODE/static/Debug/CMake code. And there are other matters on the TODO list that have been promised in v0.26.1. I don't have the bandwidth at present to start totally different subjects such as UNICODE in metadata.
However, I've looked briefly at the files you sent.
    rmills@rmillsmbp:~/Downloads $ exiv2 -pa ~/Downloads/itxt2.png
    rmills@rmillsmbp:~/Downloads $ exiv2 -pR ~/Downloads/itxt2.png
    STRUCTURE OF PNG FILE: /Users/rmills/Downloads/itxt2.png
    address | chunk | length | data | checksum
          8 | IHDR  |     13 | ...[...E..... | 0x52edaae4
         33 | gAMA  |      4 | .... | 0x0bfc6105
         49 | sBIT  |      4 | .... | 0x4da52df6
         65 | bKGD  |      6 | ...... | 0x95cd2f20
         83 | pCAL  |     44 | bogus units...........foo/bar | 0x57407b1c
        139 | pHYs  |      9 | ......... | 0xd2dd7efc
        160 | tIME  |      7 | .....:. | 0x8eff267a
        179 | tEXt  |      9 | Title.PNG | 0xdc017935
        200 | iTXt  |     39 | Author...fr.Auteur.La plume de | 0x4fdb72e1
        251 | zTXt  |     26 | test.....N.)-Qx.P......... | 0xa869e99d
        289 | IDAT  |   4828 | x..\+........EFFb...X$.......D | 0x353e27e9
       5129 | zTXt  |    202 | Description....M.Mj.@...>.[... | 0xa9a15024
       5343 | iTXt  |    111 | Warning...de.WARNING........0. | 0xb3adddee
       5466 | IEND  |      0 |  | 0xae426082
    rmills@rmillsmbp:~/Downloads $ exiv2 -pa ~/Downloads/itxt2.png
    rmills@rmillsmbp:~/Downloads $ which exiv2
    /usr/local/bin/exiv2
    rmills@rmillsmbp:~/Downloads $ exiv2 -pR ~/Downloads/itxt-german.png
    STRUCTURE OF PNG FILE: /Users/rmills/Downloads/itxt-german.png
    address | chunk | length | data | checksum
          8 | IHDR  |     13 | ...[...E..... | 0x52edaae4
         33 | gAMA  |      4 | .... | 0x0bfc6105
         49 | sBIT  |      4 | .... | 0x4da52df6
         65 | pCAL  |     44 | bogus units...........foo/bar | 0x57407b1c
        121 | tIME  |      7 | .....:. | 0x8eff267a
        140 | bKGD  |      6 | ...... | 0x95cd2f20
        158 | pHYs  |      9 | ......... | 0xd2dd7efc
        179 | tEXt  |      9 | Title.PNG | 0xdc017935
        200 | iTXt  |     39 | Author...fr.Auteur.La plume de | 0x4fdb72e1
        251 | IDAT  |   4000 | x..\+........EFFb...X$.......D | 0x199b9fd7
       4263 | IDAT  |    831 | .......#T..6.....`....G...(<.. | 0xa028a770
       5106 | zTXt  |    202 | Description..x.M.Mj.@...>.[... | 0x692a52f9
       5320 | iTXt  |    111 | Warning...de.WARNING.x......0. | 0xb5cd3e90
       5443 | iTXt  |     65 | Deutsch...de..Steinstra..e 10, | 0x23616b74
       5520 | IEND  |      0 |  | 0xae426082
    rmills@rmillsmbp:~/Downloads $
As you can see, exiv2 does not list any metadata in the files. However the debugging code exiv2 -pR foo.png spots iTXt and zTXt blocks with Title, Author, Description. Exiv2 deals with Exif, IPTC, XMP and ICC metadata. This is a totally new species of metadata in PNG files. And that's another subject that could be investigated.
However, my focus is Exiv2 v0.26.1. These new subjects:
- UNICODE in metadata
- Metadata iTXt/zTXt blocks in PNG files
have never been supported by Exiv2. I'd like to retain focus and stack those other subjects for a future project. I recommend that you open new Feature requests and they will be investigated at a future time.
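For readers unfamiliar with the exiv2 -pR listings above, the address/chunk/length columns follow from the PNG container layout: an 8-byte signature, then chunks made of a 4-byte big-endian length, a 4-byte type, the data, and a 4-byte CRC. The following is a minimal sketch of that walk, not Exiv2's implementation; PngChunk, readBE32 and listPngChunks are hypothetical names, and the CRC is skipped rather than verified.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// One row of the listing: where the chunk starts, its type, its data size.
struct PngChunk {
    std::size_t   address;  // offset of the chunk's length field in the file
    std::string   type;     // four-character chunk type, e.g. "iTXt"
    std::uint32_t length;   // size of the chunk data in bytes
};

// Read a 32-bit big-endian integer from four bytes.
static std::uint32_t readBE32(const unsigned char* p) {
    return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
           (std::uint32_t(p[2]) << 8) | std::uint32_t(p[3]);
}

// Hypothetical helper: list the chunks of a PNG held in memory.
// Returns an empty list if the signature is missing; stops at IEND or at
// the first truncated chunk. CRCs are not checked in this sketch.
std::vector<PngChunk> listPngChunks(const std::vector<unsigned char>& buf) {
    static const unsigned char signature[8] = {0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'};
    std::vector<PngChunk> chunks;
    if (buf.size() < 8 || std::memcmp(buf.data(), signature, 8) != 0) return chunks;

    std::size_t pos = 8;
    while (buf.size() - pos >= 12) {  // length + type + CRC need 12 bytes
        const std::uint32_t length = readBE32(&buf[pos]);
        if (length > buf.size() - pos - 12) break;  // truncated chunk
        const std::string type(reinterpret_cast<const char*>(&buf[pos + 4]), 4);
        chunks.push_back({pos, type, length});
        pos += 12 + static_cast<std::size_t>(length);
        if (type == "IEND") break;
    }
    return chunks;
}
```

The arithmetic matches the listings: IHDR starts at address 8 with 13 bytes of data, so the next chunk starts at 8 + 4 + 4 + 13 + 4 = 33.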
RE: Exiv2 & Unicode Paths on Windows - Added by Arnold Wiegert about 4 years ago
Understood.
Because I am not sure exactly how to describe the issues, I have opened two feature requests and hope that you or someone else on the team can flesh out the nitty-gritty details.
RE: Exiv2 & Unicode Paths on Windows - Added by Arnold Wiegert about 4 years ago
PS: any support for Unicode strings by exiv2 will be useful.
I am by no means tied to iTXt or zTXt chunks; they were just handy example files.
Support via XMP or any other 'standard' would be quite workable for me.
As an example, I have added some UTF-8-Unicode strings to a sample jpg image using XnViewMP
XnViewMP reports the data as being placed in IPTC-IIM & XMP
IPTC-IIM Keywords : Places, Germany, Baden-Württemberg, UmlautßüöäÜÖÄ
XMP
  dc
    subject[1] Places
    subject[2] Germany
    subject[3] Baden-Württemberg
    subject[4] UmlautßüöäÜÖÄ
  lr
    hierarchical subject[1] Places
    hierarchical subject[2] Places|Germany
    hierarchical subject[3] Places|Germany|Baden-Württemberg
    hierarchical subject[4] Places|Germany|UmlautßüöäÜÖÄ
Exiftool reports the data as:
Subject : Places, Germany, Baden-Württemberg, UmlautßüöäÜÖÄ
Hierarchical Subject : Places, Places|Germany, Places|Germany|Baden-Württemberg, Places|Germany|UmlautßüöäÜÖÄ
Attachment: metadata-test.jpg (70.6 KB)
RE: Exiv2 & Unicode Paths on Windows - Added by Robin Mills about 4 years ago
Well, here's what I see on the Mac:
    rmills@rmillsmbp:~/gnu/exiv2/0.26 $ exiv2 -px ~/Downloads/metadata-test.jpg
    Xmp.dc.subject              XmpBag  4  Places, Germany, Baden-Württemberg, UmlautßüöäÜÖÄ
    Xmp.lr.hierarchicalSubject  XmpBag  4  Places, Places|Germany, Places|Germany|Baden-Württemberg, Places|Germany|UmlautßüöäÜÖÄ
    rmills@rmillsmbp:~/gnu/exiv2/0.26 $
You can get the "raw" XMP/xml from the file with:
    rmills@rmillsmbp:~/gnu/exiv2/0.26 $ exiv2 -pX ~/Downloads/metadata-test.jpg
    <?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
    <x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="XMP Core 5.5.0">
      <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
        <rdf:Description rdf:about=""
            xmlns:dc="http://purl.org/dc/elements/1.1/"
            xmlns:lr="http://ns.adobe.com/lightroom/1.0/">
          <dc:subject>
            <rdf:Bag>
              <rdf:li>Places</rdf:li>
              <rdf:li>Germany</rdf:li>
              <rdf:li>Baden-Württemberg</rdf:li>
              <rdf:li>UmlautßüöäÜÖÄ</rdf:li>
            </rdf:Bag>
          </dc:subject>
          <lr:hierarchicalSubject>
            <rdf:Bag>
              <rdf:li>Places</rdf:li>
              <rdf:li>Places|Germany</rdf:li>
              <rdf:li>Places|Germany|Baden-Württemberg</rdf:li>
              <rdf:li>Places|Germany|UmlautßüöäÜÖÄ</rdf:li>
            </rdf:Bag>
          </lr:hierarchicalSubject>
        </rdf:Description>
      </rdf:RDF>
    </x:xmpmeta>
To be honest, I don't know what I'm looking for. The "raw" XMP/xml has nothing that says how it's encoded.
This can be viewed with samples/exiv2json:
    rmills@rmillsmbp:~/gnu/exiv2/0.26 $ bin/exiv2json ~/Downloads/metadata-test.jpg
    {
      "Iptc": {
        "Envelope": {
          "CharacterSet": "G"
        },
        "Application2": {
          "Keywords": "Places",
          "Keywords": "Germany",
          "Keywords": "Baden-Württemberg",
          "Keywords": "UmlautßüöäÜÖÄ"
        }
      },
      "Xmp": {
        "dc": {
          "subject": "Places, Germany, Baden-Württemberg, UmlautßüöäÜÖÄ"
        },
        "lr": {
          "hierarchicalSubject": "Places, Places|Germany, Places|Germany|Baden-Württemberg, Places|Germany|UmlautßüöäÜÖÄ"
        },
        "xmlns": {
          "dc": "http:\/\/purl.org\/dc\/elements\/1.1\/",
          "lr": "http:\/\/ns.adobe.com\/lightroom\/1.0\/"
        }
      }
    }
It's been a long day (12 hours of solid work on Exiv2). Tomorrow is another day. I beg you to stop talking about UNICODE metadata and other types of png/metadata. I'm working hard on build matters for Exiv2 v0.26.1.
RE: Exiv2 & Unicode Paths on Windows - Added by Zoltan Hubai over 3 years ago
Anything on an Exiv2 0.26 build that supports Unicode Paths on Windows?
RE: Exiv2 & Unicode Paths on Windows - Added by Robin Mills over 3 years ago
Nothing has changed concerning exiv2 and unicode since this was discussed 7 months ago. The situation remains:
1) You can build Exiv2 to support wstring/unicode paths on Windows.
2) Exif metadata strings are encoded as binary characters. I believe wstrings can be stored in the metadata. If you do encode the data in this way, you'll almost certainly face interoperability issues with applications which expect UTF-8 encoding.
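To make the interoperability point in 2) concrete: Windows wide strings are UTF-16, so text held in a wstring would normally be converted to UTF-8 before being written into metadata that other applications read as UTF-8. The following is a sketch of such a conversion in portable C++, using std::u16string so it behaves the same on every platform. The function name utf16ToUtf8 is hypothetical and is not an Exiv2 API.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: convert UTF-16 text to UTF-8.
// Unpaired surrogates are replaced with U+FFFD (the replacement character).
std::string utf16ToUtf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {  // high surrogate
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                // Combine the surrogate pair into one code point.
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {  // stray low surrogate
            cp = 0xFFFD;
        }

        if (cp < 0x80) {  // 1 byte
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {  // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {  // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {  // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For example, the 'ü' in Baden-Württemberg is the single UTF-16 unit 0x00FC and becomes the two UTF-8 bytes 0xC3 0xBC. A reader expecting UTF-8 that is handed the raw UTF-16 bytes instead would see embedded zero bytes and mangled characters.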
RE: Exiv2 & Unicode Paths on Windows - Added by Zoltan Hubai over 3 years ago
Thanks
I can build the Exiv2 0.26 win32 libraries using the VS 2015 Community edition with Unicode Path enabled.
When I try to build the x64 libraries I get 11 errors, all related to size_t conversion to uint32_t:
4>..\..\src\tiffimage.cpp(196): error C2220: warning treated as error - no 'object' file generated
4>..\..\src\tiffimage.cpp(196): warning C4267: 'argument': conversion from 'size_t' to 'uint32_t', possible loss of data
4>..\..\src\tiffimage.cpp(222): warning C4267: '=': conversion from 'size_t' to 'long', possible loss of data
Any solution for this?
RE: Exiv2 & Unicode Paths on Windows - Added by Robin Mills over 3 years ago
There are several ways to fix this:
1) Best Fix: Change the code by adding a cast (uint32_t) on line 196:
    192     ByteOrder bo = TiffParser::decode(exifData_,
    193                                       iptcData_,
    194                                       xmpData_,
    195                                       io_->mmap(),
    196                                       (uint32_t) io_->size());
And a cast (long) on line 222:
    222     size = (long) io_->size();
2) Quick Fix: Disable the setting in Visual Studio "Treat warnings as errors"
I think it's in the compiler settings. I try to avoid changing that setting. Warnings should be silenced by thoughtful action, not killing the messenger!
3) Work for me: I provide a patch.
This kind of thing is usually quite tedious for me because it produces different warnings/errors on different platforms.
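The plain casts in the Best Fix silence warning C4267 but would still truncate silently if a size ever exceeded 32 bits. For comparison, here is a sketch of a checked narrowing helper. This is an alternative illustration, not what Exiv2 does: the name toUint32 is hypothetical, and throwing std::out_of_range is an assumed error policy.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical helper (not part of Exiv2): narrows a size_t to uint32_t,
// throwing instead of silently truncating when the value does not fit.
// On 32-bit builds, where size_t is already 32 bits wide, the check can
// never fail and the function is effectively a plain cast.
inline std::uint32_t toUint32(std::size_t value) {
    if (value > std::numeric_limits<std::uint32_t>::max()) {
        throw std::out_of_range("size does not fit in uint32_t");
    }
    return static_cast<std::uint32_t>(value);
}
```

A call site such as the one on line 196 could then pass toUint32(io_->size()) instead of a C-style cast, which keeps the x64 build warning-free while making an oversized file an explicit error.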
RE: Exiv2 & Unicode Paths on Windows - Added by Robin Mills over 3 years ago
I've checked the current version of the code on 'master'. Both casts I recommended in Best Fix are in the current code on https://github.com/exiv2/exiv2/src/tiffimage.cpp. That is indeed the correct fix and would be the patch were I to provide one.
Do NOT copy any code from master into your v0.26 source code. The project has evolved in the 11 months since v0.26. src/tiffimage.cpp (and many other files) on master have changed and are not compatible with v0.26.
RE: Exiv2 & Unicode Paths on Windows - Added by Zoltan Hubai over 3 years ago
Thanks
I used 1) and fixed all the files (basicio.cpp, cr2image.cpp, crwimage.cpp, exif.cpp, jp2image.cpp, orfimage.cpp, pgfimage.cpp, pngimage.cpp, preview.cpp, rw2image.cpp, tiffimage.cpp)
I did try 2) before I wrote, but for some reason the setting is ignored by the compiler (Visual Studio 2015 Community V 14.0.25431.01 Update 3).
I also tried 2) with Visual Studio 2017 Community and there it worked (however, I need it for VS2015).
RE: Exiv2 & Unicode Paths on Windows - Added by Robin Mills over 3 years ago
Right. I'm pleased that this is working for you.
The EXV_UNICODE_PATH setting has always been a "poor relation" in Exiv2. By that I mean it is given less respect than it deserves. I made a change to samples/exifprint.cpp for v0.26 to provide a minimal test harness which validates that it works (and is not totally broken). I ought to update the buildserver to build and test this configuration. http://dev.exiv2.org/issues/1174 and http://dev.exiv2.org/issues/1169
There's a never ending stream of tasks to be undertaken when working on an open-source project. If you'd like to contribute, I'd be delighted to accept your help.
We're having an "Exiv2 Developer's Meeting" at my home in England on May 5. This, and other projects, will be discussed and we'll prioritise features for implementation within our resources. https://github.com/Exiv2/exiv2/issues/225
RE: Exiv2 & Unicode Paths on Windows - Added by Zoltan Hubai over 3 years ago
Tested with Hungarian and Serbian path names and works as it should.