Charset conversion

Added by Vladimir Nadvornik almost 12 years ago

Hi Andreas,

Are there any plans for fixing the charset problems? The current situation is not ideal:

- XMP is always in utf-8
- IPTC uses various charsets (this is related: http://www.cpanforum.com/threads/2114 )
- Interpreted Exif values use the locale charset

IMHO Exiv2 should try to convert everything to utf-8 internally.

Vladimir


Replies (7)

RE: Charset conversion - Added by Andreas Huggel almost 12 years ago

Hi Vladimir,

Thanks for the link.

IMHO Exiv2 should try to convert everything to utf-8 internally.

Yes, that sounds like a sensible thing to do, and I agree that what we have is not ideal.
Here is an old summary of the status: http://dev.exiv2.org/boards/3/topics/show/62#message-66
It also says the Metadata Working Group's document has some advice on the topic.

Feel free to look into this if you have the time... this is not on my radar and I won't have the time for now.

Andreas

RE: Charset conversion - Added by Vladimir Nadvornik almost 12 years ago

This patch adds the basic IPTC charset support:
- charset autodetection
- fix for conversion from IPTC to XMP

Autodetection can't be fully reliable, so it is up to the application to provide a correct fallback charset (for example, by asking the user).
The patch should not break applications that already do some conversion themselves.
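
For illustration only, one common autodetection heuristic is to check whether the bytes are well-formed UTF-8 and otherwise return the fallback supplied by the application. The sketch below is not taken from the patch; `isValidUtf8` and `guessCharset` are hypothetical names, and the returned charset names are plain labels chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns true if `s` is well-formed UTF-8. Overlong encodings, surrogates
// and code points above U+10FFFF are rejected.
bool isValidUtf8(const std::string& s)
{
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (c < 0x80) { ++i; continue; }          // plain ASCII byte

        std::size_t len = 0;                      // continuation bytes expected
        unsigned long cp = 0;                     // decoded code point
        unsigned long min = 0;                    // smallest non-overlong value
        if ((c & 0xE0) == 0xC0)      { len = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { len = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { len = 3; cp = c & 0x07; min = 0x10000; }
        else return false;                        // stray continuation or invalid lead byte

        if (i + len >= n) return false;           // sequence is truncated
        for (std::size_t k = 1; k <= len; ++k) {
            const unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return false;
        i += len + 1;
    }
    return true;
}

// Hypothetical helper: guess the charset of an IPTC string value.
// `fallback` is whatever the application decides (e.g. asked from the user).
std::string guessCharset(const std::string& data, const std::string& fallback)
{
    bool ascii = true;
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (static_cast<unsigned char>(data[i]) >= 0x80) { ascii = false; break; }
    }
    if (ascii) return "ASCII";                    // 7-bit data needs no conversion
    return isValidUtf8(data) ? "UTF-8" : fallback;
}
```

A call such as `guessCharset(value, "ISO-8859-1")` would then return the fallback only when the data contains bytes that cannot be UTF-8. Short strings in a legacy 8-bit charset can still pass the UTF-8 check by accident, which is why the result stays a guess.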

I have also changed the type of "Iptc.Envelope.CharacterSet" from undefined to string, IMHO it is more correct.

RE: Charset conversion - Added by Andreas Huggel almost 12 years ago

Thanks Vladimir, I will look into this as soon as I'm done with the Windows Unicode path patch.

Andreas

RE: Charset conversion - Added by Andreas Huggel over 11 years ago

I have also changed the type of "Iptc.Envelope.CharacterSet" from undefined to string, IMHO it is more correct.

Vladimir,

That part still causes test/bugfixes-test.sh to fail. I checked the IPTC specs but couldn't determine what the correct type should be. Now I wonder if I committed this a bit hastily, and I am considering backing this change out, since the tag tables are part of the API to some extent and such changes may break applications. Did you find a reference that says this dataset is a "string" and that "undefined" is wrong?

Andreas

RE: Charset conversion - Added by Andreas Huggel over 11 years ago

Also, test/conversions.sh fails due to what seem to be different issues exposed by these test cases.
I would appreciate advice on what to fix and how.

RE: Charset conversion - Added by Vladimir Nadvornik over 11 years ago

The specification says that it consists of one or more control functions; a control function consists of the escape control character and one or more graphic characters. IMHO that means a string.
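
As a concrete case of such a control function: the ISO 2022 sequence ESC % G (bytes 0x1B 0x25 0x47) announces UTF-8. A minimal sketch of reading the dataset value as a string could look like this; `charsetFromEscapeSequence` is a hypothetical helper, not an Exiv2 function, and it deliberately recognises only this one sequence.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: map the raw value of Iptc.Envelope.CharacterSet to a
// charset name. Only ESC % G (UTF-8) is handled in this sketch; any other
// value yields an empty string, meaning "unknown, use detection or fallback".
std::string charsetFromEscapeSequence(const std::string& value)
{
    static const std::string utf8Seq("\x1b%G");   // ESC, '%', 'G'
    if (value == utf8Seq) return "UTF-8";
    return std::string();
}
```

Because the escape byte is followed by ordinary graphic characters, the value can be compared and printed like any other short string.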
