Bug #848

Comments and copyright are output as ASCII, but they are always written in UTF-8 format

Added by Shawn Jean about 9 years ago. Updated over 8 years ago.

Start date: 20 Sep 2012

Although it works fine with English, it's truly a problem with other languages.



Updated by Robin Mills about 9 years ago


I'm Scottish and a native English speaker. Working with UTF-8 and other character sets is a mystery to me. I tried this on Ubuntu 12.04 (bash 4.2.24):

$ echo $'>\xE2\x98\xA0<'
$ exiv2 -M$'set Exif.Photo.UserComment >\xE2\x98\xA0<' ~/R.jpg
$ exiv2 -pa ~/R.jpg | grep Comment
Exif.Photo.UserComment                       Undefined  13  >☠<

Doing the same and filtering the output through od -h (to avoid browser UTF issues):

$ echo -n $'>\xE2\x98\xA0<'  | od -h
0000000 e23e a098 003c
$ exiv2 -M$'set Exif.Photo.UserComment >\xE2\x98\xA0<' ~/R.jpg ; exiv2 -pa -g Comm ~/R.jpg | od -h
0000000 7845 6669 502e 6f68 6f74 552e 6573 4372
0000020 6d6f 656d 746e 2020 2020 2020 2020 2020
0000040 2020 2020 2020 2020 2020 2020 4120 6373
0000060 6969 2020 2020 2020 3620 2020 e23e a098
0000100 0a3c

I'm using the trunk version of exiv2, in which the -g option matches any substring of a tag name.
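Here is a small Python sketch of what the od -h dumps above show. The skull is the three UTF-8 bytes e2 98 a0. od -h prints 16-bit words in host byte order, so on a little-endian machine such as x86 the bytes appear swapped (e23e rather than 3e e2). The helper `od_h_words` is hypothetical and assumes a little-endian host.

```python
def od_h_words(data: bytes) -> list:
    """Group bytes into 16-bit words as `od -h` prints them on a little-endian host."""
    if len(data) % 2:
        data += b"\x00"  # an odd trailing byte is shown with a zero high byte
    words = []
    for i in range(0, len(data), 2):
        low, high = data[i], data[i + 1]  # little-endian: first byte is the low byte
        words.append(f"{high:02x}{low:02x}")
    return words


payload = ">\u2620<".encode("utf-8")  # U+2620 SKULL AND CROSSBONES between > and <
print(payload.hex(" "))               # 3e e2 98 a0 3c
print(" ".join(od_h_words(payload)))  # e23e a098 003c
# exiv2 -pa ends its line with a newline, hence the final 0a3c in its dump
print(" ".join(od_h_words(payload + b"\n")))  # e23e a098 0a3c
```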

It certainly looks as though the UserComment is simply binary and will store almost anything you give it.

Looking in the man page, I see the following example:

       exiv2 -M"set Exif.Photo.UserComment charset=Ascii New Exif comment" image.jpg
              Sets the Exif comment to an ASCII string.

It appears that you can use alternative character sets if you wish, although I can't explain how to use this feature. I know you're a very good engineer and perhaps you can read the code and tell us all!
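One reading of the charset= option, assuming exiv2 follows the Exif rule for UserComment: the value is binary (Exif type UNDEFINED) and begins with an 8-byte character-code field. That field is ASCII, JIS, UNICODE, or eight zero bytes for "undefined", and the text bytes follow it. This reading fits the count of 13 above: 8 header bytes plus the 5 bytes of >☠<. The sketch below assumes that exiv2 writes the all-zero code when no charset= is given. The helpers `build_user_comment` and `split_user_comment` are hypothetical and are not exiv2 API.

```python
# Exif character-code prefixes for UserComment (8 bytes each); the keys mirror
# the charset= spelling used in the exiv2 man page (an assumption for this sketch).
CHARSET_PREFIX = {
    "Ascii": b"ASCII\x00\x00\x00",
    "Jis": b"JIS\x00\x00\x00\x00\x00",
    "Unicode": b"UNICODE\x00",
    "Undefined": b"\x00" * 8,
}


def build_user_comment(text_bytes: bytes, charset: str = "Undefined") -> bytes:
    """Hypothetical helper: prepend the 8-byte character code to encoded text."""
    return CHARSET_PREFIX[charset] + text_bytes


def split_user_comment(raw: bytes) -> tuple:
    """Hypothetical helper: return (charset, text bytes) from a raw UserComment."""
    header, body = raw[:8], raw[8:]
    for name, prefix in CHARSET_PREFIX.items():
        if header == prefix:
            return name, body
    return "Unknown", body


raw = build_user_comment(">\u2620<".encode("utf-8"))
print(len(raw))                 # 13: 8 header bytes + 5 text bytes
print(split_user_comment(raw))  # ('Undefined', b'>\xe2\x98\xa0<')
print(build_user_comment(b"New Exif comment", "Ascii")[:8])  # b'ASCII\x00\x00\x00'
```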



Updated by Andreas Huggel about 9 years ago

I don't understand the problem description. Shawn, can you please elaborate, preferably with a small program or a sample use of the exiv2 command-line tool, showing in detail what you're doing and what is going wrong?


Updated by Shawn Jean about 9 years ago

I should have given more details; I've been a bit busy these days, sorry about that.

Indeed, it should not be called a bug. Like Robin, I tested this several times: writing and reading with exiv2 works correctly.

X.Jing@XJing-PC ~/exiv2/msvc64/bin/Win32/Debug
$ exiv2 -M 'set Exif.Photo.UserComment This is 中文测试' test.jpg && exiv2 -pa test.jpg | grep Comment
Exif.Photo.UserComment                       Undefined  24  This is 中文测试

$ echo 'this is 中文测试' | od -h
0000000 6874 7369 6920 2073 d0d6 c4ce e2b2 d4ca
0000020 000a

It's all right.
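Shawn's od dump, regrouped into bytes, reads d6 d0 ce c4 b2 e2 ca d4 for 中文测试. Those are the GBK bytes, not UTF-8. This suggests the shell was running in the Windows Chinese code page (936). It also fits the count of 24, assuming the same 8-byte character-code header: 8 bytes of header plus 16 bytes of GBK text. UTF-8 input would have given 28. A quick check:

```python
comment = "This is 中文测试"

# Shawn's od dump, regrouped into bytes
print("中文测试".encode("gbk").hex(" "))  # d6 d0 ce c4 b2 e2 ca d4

for codec in ("gbk", "utf-8"):
    body = comment.encode(codec)
    # stored size = assumed 8-byte character-code header + encoded text
    print(codec, len(body), len(body) + 8)
# gbk 16 24
# utf-8 20 28
```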

But when I use other software, such as Windows Explorer, to write the string, it turns out to be:

$ exiv2 -pa test.jpg | grep Comment
Exif.Image.XPComment                         Byte       26  this is 涓枃娴嬭瘯

This is because Windows Explorer writes the string in Unicode (UTF-16, little-endian):

74 00 68 00 69 00 73 00 20 00 2D 4E 87 65 4B 6D D5 8B
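The listed bytes match UTF-16 little-endian. For example, 中 is U+4E2D and is stored as 2D 4E. Assuming Explorer also writes a two-byte NUL terminator, the full string accounts for the Byte count of 26 reported for Exif.Image.XPComment. A quick check in Python:

```python
text = "this is 中文测试"
utf16 = text.encode("utf-16-le")  # explicit little-endian codec, so no BOM

print(utf16[:10].hex(" "))                      # 74 00 68 00 69 00 73 00 20 00
print("中文测试".encode("utf-16-le").hex(" "))  # 2d 4e 87 65 4b 6d d5 8b
# 24 bytes of text, plus 2 for the assumed NUL terminator
print(len(utf16), len(utf16) + 2)               # 24 26
```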

;-) Weird, right, guys? I don't know whether I made it clear; I'm glad to answer any questions.


Updated by Robin Mills about 9 years ago

  • Category set to metadata
  • Status changed from New to Resolved
  • Assignee set to Robin Mills


I think this has something to do with Windows Explorer converting the user-entered string to UCS-2/UTF-16 (or something similar). You may be able to use iconv to print the string correctly:

$ exiv2 -pa test.jpg | grep Comment | iconv -f UTF-16LE -t UTF-8

However, my old/Scottish eyes (and brain) have never understood bamboo characters!
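The iconv pipeline only helps if the bytes reaching iconv are still UTF-16. If exiv2 has already converted the value for display, they are not. As a sketch, here is the same UTF-16LE to UTF-8 conversion applied to raw XPComment bytes in Python. `decode_xp_comment` is a hypothetical helper, and it assumes the value is NUL-terminated UTF-16LE.

```python
def decode_xp_comment(raw: bytes) -> str:
    """Hypothetical helper: decode a NUL-terminated UTF-16LE XP* tag value."""
    return raw.decode("utf-16-le").rstrip("\x00")  # drop the terminator


raw = "this is 中文测试".encode("utf-16-le") + b"\x00\x00"  # 26 bytes, as reported
text = decode_xp_comment(raw)
print(text == "this is 中文测试")  # True
# the UTF-8 bytes a UTF-8 terminal expects for the Chinese part
print(text.encode("utf-8")[8:].hex(" "))  # e4 b8 ad e6 96 87 e6 b5 8b e8 af 95
```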

I'm going to update this issue to "Resolved". You may reopen and/or assign it to me if you have additional information.



Updated by Robin Mills over 8 years ago

  • Status changed from Resolved to Closed

Fixed in 0.24.
