Extracting UTF8 metadata

Added by gary cohen about 12 years ago

I'm having trouble extracting Exif.Photo.UserComment from an image that stores it in Unicode, and I'm wondering what I'm doing wrong. I've successfully called readMetadata() and I get a valid iterator via exifData.findKey( Exiv2::ExifKey("Exif.Photo.UserComment") ). But when I go to get the value, I'm not getting what I expect...

const std::string value = i->getValue()->toString() => "charset=\"Unicode\" "

I thought maybe since there's a null character I can't use the std::string version, but then how do I get the comment? I've tried using:

uint8_t value[5000];
i->getValue()->copy( value, Exiv2::littleEndian ); // I see the littleEndian flag isn't used, but something has to go there, right?

but that seems to give me the raw data back.

In short, I need to get the UserComment back as UTF8 and store it in a std::string for output. Any help would be much appreciated. Attached is a test image.

exif_comment.jpg (60.2 KB): Unicode-encoded user comment

Replies (5)

RE: Extracting UTF8 metadata - Added by Andreas Huggel about 12 years ago

Gary,

You're doing just fine. Both of these methods should work. The actual comment is after the "charset=\"Unicode\" " string in the first method and after the first 8 bytes in the second. And in both cases you'll access the same 'raw' data as Exiv2 doesn't convert the comment in any way.
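For the second method, the layout described above (an 8-byte character code followed by the still-encoded comment) can be taken apart by hand. The following is only a sketch: `RawUserComment` and `splitUserComment` are hypothetical names, not part of Exiv2, and the buffer is assumed to be what `copy()` filled in.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper type, not part of Exiv2: a raw UserComment value
// split into its 8-byte character code and the remaining payload.
struct RawUserComment {
    std::string charsetCode;            // e.g. "UNICODE", "ASCII", "JIS", or "" if undefined
    std::vector<std::uint8_t> payload;  // comment bytes, still in their original encoding
};

// Splits a raw UserComment buffer. The first 8 bytes hold the character
// code, padded with 0 bytes; everything after them is the comment itself.
RawUserComment splitUserComment(const std::uint8_t* data, std::size_t size) {
    const std::size_t kCodeSize = 8;
    RawUserComment result;
    if (data == nullptr || size < kCodeSize) {
        return result;  // too short to contain a character code
    }
    // Copy the code up to (not including) its 0 padding.
    for (std::size_t i = 0; i < kCodeSize && data[i] != 0; ++i) {
        result.charsetCode.push_back(static_cast<char>(data[i]));
    }
    result.payload.assign(data + kCodeSize, data + size);
    return result;
}
```

The payload is returned untouched, so for a "UNICODE" comment it still has to be converted before it can be printed as UTF-8.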

Besides, having null characters in an std::string is no issue (just avoid c_str() and use data() instead if you need a char pointer), so that's ok too.
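A small illustration of that point, using made-up helper names (`fromBytes` and `cLength` are not from Exiv2). The std::string keeps every byte, while anything that treats the pointer as a C string stops at the first 0 byte. Since C++11 `data()` and `c_str()` return the same pointer, so what matters in practice is pairing the pointer with `size()`.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Builds a std::string from a byte range that may contain 0 bytes.
// The (pointer, length) constructor copies all bytes, 0 bytes included.
std::string fromBytes(const char* bytes, std::size_t size) {
    return std::string(bytes, size);
}

// Length as seen by C string functions: counting stops at the first 0 byte.
std::size_t cLength(const std::string& s) {
    return std::strlen(s.c_str());
}
```

For the UTF-16 bytes of "Hi" in little-endian order (`'H', 0, 'i', 0`), `size()` reports all four bytes, whereas `cLength()` sees only the first one.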

Of course, instead of using getValue(), in the first method, you could just say i->toString() which would be a bit more efficient. In the second method, i->value().copy(...) would similarly avoid the overhead of getValue().
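If you stay with the toString() route, the charset prefix has to be removed by hand. A rough sketch, assuming the string has the form `charset="Unicode" ` followed by the comment, as in the output quoted earlier in this thread (`stripCharsetPrefix` is a hypothetical helper, not an Exiv2 function):

```cpp
#include <string>

// Hypothetical helper: removes a leading charset=... specification and
// the single space that separates it from the comment. Strings without
// that prefix are returned unchanged.
std::string stripCharsetPrefix(const std::string& value) {
    const std::string marker = "charset=";
    if (value.compare(0, marker.size(), marker) != 0) {
        return value;  // no prefix present
    }
    // The first space after the marker ends the charset specification.
    const std::string::size_type space = value.find(' ', marker.size());
    if (space == std::string::npos) {
        return std::string();  // prefix only, no comment follows
    }
    return value.substr(space + 1);
}
```

The result may still contain 0 bytes and is still in the comment's original encoding; only the prefix is gone.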

And in addition to these two methods there are several others:

  • i->print(&exifData) returns an std::string without the leading "charset=\"Unicode\" " and without trailing 0 bytes.
  • Exiv2::CommentValue::comment() returns an std::string without the leading "charset=\"Unicode\" ". You'll need a dynamic_cast to access that method though.

Andreas

RE: Extracting UTF8 metadata - Added by gary cohen about 12 years ago

Hi Andreas,

I liked your idea to use Exiv2::CommentValue::comment, but when I do, I get an empty string.

(gdb) p csid
$1 = Exiv2::CommentValue::unicode
(gdb) n
(gdb) p value
$2 = {static npos = 4294967295, _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, _M_p = 0x84a8c2c ""}}

But, I know the UserComment has data there. Any ideas?

RE: Extracting UTF8 metadata - Added by gary cohen about 12 years ago

Oh, and the code:

... part of a loop ... {
    if ( attr == "Exif.Photo.UserComment" ) {
        const Exiv2::CommentValue &commentValue = dynamic_cast< const Exiv2::CommentValue & >( iter->value() );
        Exiv2::CommentValue::CharsetId csid = commentValue.charsetId();
        std::string value = commentValue.comment();
    }
}

RE: Extracting UTF8 metadata - Added by Andreas Huggel about 12 years ago

Hi Gary,

The code you posted looks ok, I can't see what's wrong. And it works fine here, I can access the comment in the sample image you provided using the various methods discussed and convert it to UTF-8. A complete simple test program is attached.

Andreas

try2.cpp (3.01 KB): Access user comment in various ways and convert it to UTF-8
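The attachment itself is not reproduced in this thread. As a rough, library-free illustration of the conversion step only (not the code from try2.cpp), the sketch below turns a UTF-16 byte payload into UTF-8. The helper names are made up, the caller is assumed to know the byte order, and treating a 0 code unit as the end of the comment is an assumption of this sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Appends one Unicode code point to out, encoded as UTF-8.
static void appendUtf8(std::string& out, std::uint32_t cp) {
    if (cp < 0x80) {
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
}

// Reads one 16-bit code unit from two bytes in the given byte order.
static std::uint32_t readUnit(const std::uint8_t* p, bool littleEndian) {
    return littleEndian ? static_cast<std::uint32_t>(p[0] | (p[1] << 8))
                        : static_cast<std::uint32_t>((p[0] << 8) | p[1]);
}

// Hypothetical helper: converts a UTF-16 byte payload to UTF-8.
// Stops at the first 0 code unit (assumed to be padding) and replaces
// unpaired surrogates with U+FFFD.
std::string utf16ToUtf8(const std::uint8_t* data, std::size_t size, bool littleEndian) {
    std::string out;
    std::size_t i = 0;
    while (i + 1 < size) {
        std::uint32_t cp = readUnit(data + i, littleEndian);
        i += 2;
        if (cp == 0) {
            break;  // trailing 0 code unit: treat as end of comment
        }
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: needs a following low surrogate.
            std::uint32_t lo = (i + 1 < size) ? readUnit(data + i, littleEndian) : 0;
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                i += 2;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // low surrogate without a preceding high surrogate
        }
        appendUtf8(out, cp);
    }
    return out;
}
```

The input would be the comment bytes after the 8-byte character code, for example `utf16ToUtf8(bytes, length, true)` for a little-endian payload. Passing the wrong byte order is an easy way to end up with garbage, so it is worth checking both if the output looks off.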

RE: Extracting UTF8 metadata - Added by gary cohen about 12 years ago

Thank you. You helped me find my problem: I was doing something wrong with the UTF-8 conversion.
