exiv2 fails to delete bulk of metadata in jpeg
I have this file, which I've made available here for you to test:
It's a 3-megabyte JPEG with a lot of metadata from some obscure Photoshop session.
If I delete the metadata with exiftool
exiftool -all="" overweightGeorgeMichael.jpg
it gives out a jpeg of 564K
But if I try to do it with exiv2:
exiv2 rm overweightGeorgeMichael.jpg
it gives out a jpeg of 2.8Megs
That's a lot of undeleted metadata.
Updated by Robin Mills over 3 years ago
I know what's the matter; however, I don't have a fix. The obvious work-around is to use exiftool.
Your explanation, "It's a 3 Megabytes jpeg with a lot of metadata from some obscure photoshop session", seems to be correct. PhotoShop is using "Extended XMP", which isn't currently supported by Exiv2.
If we examine the structure of your file, I see:
592 rmills@rmillsmbp:~/Downloads $ curl -O http://www.deniscarl.com/overweightGeorgeMichael.jpg ; ls -alt over*
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 2986k  100 2986k    0     0  1838k      0  0:00:01  0:00:01 --:--:-- 1842k
-rw-r--r--+ 1 rmills staff 3058367 19 Dec 17:08 overweightGeorgeMichael.jpg
593 rmills@rmillsmbp:~/Downloads $ exiv2 -pS overweightGeorgeMichael.jpg
STRUCTURE OF JPEG FILE: overweightGeorgeMichael.jpg
 address | marker       | length | data
       0 | 0xffd8 SOI
       2 | 0xffe0 APP0  |     16 | JFIF.....H.H....
      20 | 0xffed APP13 |   8292 | Photoshop 3.0.8BIM.......'..Z...
    8314 | 0xffe1 APP1  |   2657 | Exif..MM.*...............V......
   10973 | 0xffe2 APP2  |   3160 | ICC_PROFILE......HLino....mntrRG chunk 1/1
   14135 | 0xffe1 APP1  |  65535 | http://ns.adobe.com/xap/1.0/.<?x
   79672 | 0xffe1 APP1  |  65535 | CC7DFB7A8ED5E120CA9</rdf:li> <rd
...
 2439004 | 0xffe1 APP1  |  42980 | 19F46DDB1CA0F98ED</rdf:li> <rdf:
 2481986 | 0xffdb DQT   |     67
 2482055 | 0xffdb DQT   |     67
 2482124 | 0xffc0 SOF0  |     17
 2482143 | 0xffc4 DHT   |     31
 2482176 | 0xffc4 DHT   |     79
 2482257 | 0xffc4 DHT   |     20
 2482279 | 0xffc4 DHT   |     20
 2482301 | 0xffda SOS
594 rmills@rmillsmbp:~/Downloads $

A couple of years ago, I added command options -pX (print XMP) and -dX (delete XMP) which understand Extended XMP. Neither option is effective on your file. So, I'm wondering if there's something strange about how the XMP has been added by PhotoShop. I have stepped the code and the "XMP Extended Metadata" flag isn't set. I think PhotoShop is using a method to extend the XMP which isn't supported by Exiv2.
There are about 40 APP1 segments of 64 KB each. That's about 2.6 MB. That's the data that is not being deleted.
I think the next step is to see if building with the latest XMPsdk fixes this. Updating Exiv2 to build with XMPsdk (2016, 2014 or 2013) is work-in-progress at the moment.
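The segment tally above can be checked independently of exiv2. The following is a minimal Python sketch, not Exiv2 code, that walks the marker segments of a JPEG up to SOS and adds up the bytes held in APP1 segments. The function names (walk_segments, app1_total, make_segment) and the synthetic sample are made up for illustration, and the walker assumes a simple file in which every marker before SOS carries a two-byte length and there are no fill bytes between segments.

```python
import struct

SOI, SOS, APP1 = 0xFFD8, 0xFFDA, 0xFFE1

def walk_segments(data):
    """Yield (offset, marker, length) for each segment up to and including SOS.

    'length' is the value stored in the segment header; it counts the two
    length bytes but not the two marker bytes, matching the exiv2 -pS listing.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    yield 0, SOI, 0
    pos = 2
    while pos + 4 <= len(data):
        marker, length = struct.unpack(">HH", data[pos:pos + 4])
        if marker >> 8 != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        yield pos, marker, length
        if marker == SOS:  # entropy-coded image data follows; stop here
            break
        pos += 2 + length

def app1_total(data):
    """Total bytes on disk occupied by APP1 segments (marker + length + payload)."""
    return sum(2 + length for _, marker, length in walk_segments(data)
               if marker == APP1)

def make_segment(marker, payload):
    """Hypothetical helper: build one segment for the synthetic sample below."""
    return struct.pack(">HH", marker, len(payload) + 2) + payload

# Tiny synthetic file, only to exercise the walker (not the file from this issue).
sample = (b"\xff\xd8"
          + make_segment(0xFFE0, b"JFIF\x00" + bytes(9))
          + make_segment(APP1, b"Exif\x00\x00MM")
          + make_segment(APP1, b"<rdf:li>")
          + make_segment(SOS, bytes(10)))

for offset, marker, length in walk_segments(sample):
    print(f"{offset:8d} | 0x{marker:04x} | {length:6d}")
print("APP1 bytes:", app1_total(sample))
```

To run it against a real file, read the file in binary mode and pass the bytes to app1_total.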
Updated by Robin Mills over 3 years ago
- Category changed from metadata to not-a-bug
- % Done changed from 10 to 30
- Estimated time changed from 2.00 h to 6.00 h
I've made another discovery about this. I thought: "Adobe distributes DumpFile as part of the XMPsdk. Let's try that." Interesting:
628 rmills@rmillsmbp:~/gnu/xmpsdk/XMP-Toolkit-SDK-CC201607 $ samples/target/macintosh/intel_64/Release/DumpFile ~/Downloads/over*.jpg
Abort trap: 6
629 rmills@rmillsmbp:~/gnu/xmpsdk/XMP-Toolkit-SDK-CC201607 $
Here's the traceback:
0x104301000 - 0x10433cfff +DumpFile (0) ... /Users/USER/*/DumpFile
0x10435c000 - 0x104497fff +com.xmp.XMPCore (XMP Core 5.6.0 - 0) ... /Users/USER/*/XMPCore.framework/Versions/A/XMPCore
0x1045dc000 - 0x104711ff7 +com.xmp.XMPFiles (XMP Files 5.7.0 - 0) ... /Users/USER/*/XMPFiles.framework/Versions/A/XMPFiles
...

Adobe is crashing in XMPCore when called by XMPFiles. Conclusion: Your file has illegal XMP.
It's not impossible to do something about this. About 3 years ago I added option -dI (delete everything that appears to be IPTC). There was an existing option -di to delete IPTC metadata. According to the spec, there should only be one IPTC segment. I added -dI to help a user with files containing multiple IPTC segments.
I'm willing to consider extending the scope of -dX to mean "delete everything that appears to be XMP". I'm rather overloaded at the moment and don't want to be side-tracked into this issue. I will consider it next year when time permits.
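For anyone who wants to experiment in the meantime, here is a rough Python sketch of what "delete everything that appears to be XMP" might look like. It is not how Exiv2 implements -dX, and the heuristic is purely an assumption: an APP1 segment is dropped if it starts with the standard XMP or Extended XMP namespace header, or if it is a headerless APP1 segment that directly follows a dropped one (the pattern seen in the listing earlier in this thread). Exif APP1 segments and all other segments are kept. The names strip_apparent_xmp and make_segment are hypothetical.

```python
import struct

XMP_MAIN = b"http://ns.adobe.com/xap/1.0/\x00"
XMP_EXT = b"http://ns.adobe.com/xmp/extension/\x00"
EXIF = b"Exif\x00\x00"
SOS, APP1 = 0xFFDA, 0xFFE1

def strip_apparent_xmp(data):
    """Return a copy of the JPEG bytes without APP1 segments that look like XMP."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    out = bytearray(data[:2])
    pos = 2
    in_xmp_run = False  # True while the previous segment was dropped as XMP
    while pos + 4 <= len(data):
        marker, length = struct.unpack(">HH", data[pos:pos + 4])
        if marker >> 8 != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        if marker == SOS:
            break  # image data starts here; copy the rest untouched
        end = pos + 2 + length
        payload = data[pos + 4:end]
        if marker != APP1 or payload.startswith(EXIF):
            looks_like_xmp = False
        elif payload.startswith((XMP_MAIN, XMP_EXT)):
            looks_like_xmp = True
        else:
            # Assumption: a headerless APP1 right after XMP is a continuation.
            looks_like_xmp = in_xmp_run
        in_xmp_run = looks_like_xmp
        if not looks_like_xmp:
            out += data[pos:end]
        pos = end
    out += data[pos:]
    return bytes(out)

def make_segment(marker, payload):
    """Hypothetical helper: build one segment for the synthetic sample below."""
    return struct.pack(">HH", marker, len(payload) + 2) + payload

# Synthetic sample mimicking the layout in the listing (not the real file).
head = b"\xff\xd8" + make_segment(0xFFE0, b"JFIF\x00" + bytes(9))
exif = make_segment(APP1, EXIF + b"MM\x00*")
xmp = make_segment(APP1, XMP_MAIN + b"<?xpacket begin")
cont = make_segment(APP1, b"CC7D</rdf:li> <rdf:li>")
tail = make_segment(0xFFDB, bytes(65)) + make_segment(SOS, bytes(10)) + b"\x12\x34\xff\xd9"

sample = head + exif + xmp + cont + tail
cleaned = strip_apparent_xmp(sample)
print(len(sample), "->", len(cleaned))
```

Try this only on a copy of the file. The continuation rule is deliberately greedy, so it could also discard an unrelated APP1 segment that happens to follow the XMP.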