Exiv2 & wmNonIntrusive Flag

Added by David Vongrad 19 days ago

In my MFC C++ application (VS 2017) using Exiv2 0.26, I would like to change some time stamps, such as Exif.Photo.DateTimeOriginal, to a new value. However, I am noticing that on 24 MB files from a Nikon D810 this can be quite slow, since it appears that the entire EXIF data is being rewritten to the file, making it 10 K smaller in one JPG file I'm using. I have been made aware of a "wmNonIntrusive" flag which, as I understand it, means not to rewrite the EXIF data unless necessary, but I'm not sure how to use it properly. I am only replacing 19 bytes in the file with a call to writeMetadata, and the API docs for it say that "If no values have been assigned to a given metadata type, any exists section for that metadata type will be removed from the image", so I am unsure why Exiv2 thinks rewriting all the EXIF data is required.

Are there API calls that will let me change EXIF data non-intrusively to improve the performance of Exiv2, or is my best bet to find the date/time strings within the file and change them with standard file I/O? Thanks.


Replies (7)

RE: Exiv2 & wmNonIntrusive Flag - Added by Robin Mills 18 days ago

David

Somebody raised an issue last week concerning wmIntrusive. I will investigate this in the next few days. http://dev.exiv2.org/issues/1324

I've just returned this morning from attending a wedding of a member of Team Exiv2 in Vietnam. So I'll be jet lagged for a few days and hope to work on this next week.

I'll see if something can be done to "pump" the performance of Exiv2 for your use case. However, my hunch is that the Exiv2 data convertors cause your file to increase in size with the consequential I/O.

A totally different approach is to determine the offset of the DateTimeOriginal metadata in your file, memory map the file, and modify the bytes. This will be "almost instantaneous" as the file will not be rewritten. This approach requires your files to be parsed. If you are only using JPEGs, it might be quite easy to write code to do this for a single format. However, if you are using several formats, this approach doesn't scale. I can't remember if you've provided me with samples of your files. Can you do that please? Which formats are you using: DNG, TIFF, JPEG?

I'm sorry this issue is giving you angst. You'll appreciate that I am an unpaid volunteer. The engineering effort to optimise this use case could be considerable. How often do you modify the time stamps and how much time will this save?

RE: Exiv2 & wmNonIntrusive Flag - Added by Robin Mills 18 days ago

Here's something that looks rather promising: https://docs.python.org/2/library/mmap.html

It's a memory-mapping module for Python, and it includes a method to search the memory-mapped file. So you can hunt for the metadata and determine the location in memory of the date strings. I'm doing the search in bash (however, it's just as easy in Python):

1 Find the metadata using exiv2

$ exiv2 -pa --grep DateTime ~/Stonehenge.jpg
Exif.Image.DateTime                          Ascii      20  2015:07:16 20:25:28
Exif.Photo.DateTimeOriginal                  Ascii      20  2015:07:16 15:38:54
Exif.Photo.DateTimeDigitized                 Ascii      20  2015:07:16 15:38:54

2 Find the offset in the file of the DateTime strings

$ strings -a -t d ~/Stonehenge.jpg | grep '2015:07:16 20:25:28'
    214 2015:07:16 20:25:28
$ strings -a -t d ~/Stonehenge.jpg | grep '2015:07:16 15:38:54'
    760 2015:07:16 15:38:54
    780 2015:07:16 15:38:54

Now we know that DateTime is at offset 214. DateTimeOriginal and DateTimeDigitized are at 760 and 780.

When you know the offset, the documentation says that you can write the new substring data with a Python "slice":

mm[780:799] = '2015:07:16 10:11:12'
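The find-then-slice idea above could be put together roughly as follows. This is only a minimal sketch in Python 3 (where the mapped data is bytes, so the strings carry a `b` prefix); the function names `find_offsets` and `patch_timestamp` are made up for illustration and are not part of Exiv2 or the mmap module. It assumes the old and new values have the same length, so the file size never changes.

```python
import mmap


def find_offsets(mm, needle):
    """Return every offset at which `needle` occurs in the mapped file."""
    offsets = []
    pos = mm.find(needle)
    while pos != -1:
        offsets.append(pos)
        pos = mm.find(needle, pos + 1)
    return offsets


def patch_timestamp(path, old, new):
    """Overwrite each occurrence of `old` with `new`, in place.

    Hypothetical helper: `old` and `new` are bytes of equal length,
    e.g. b'2015:07:16 15:38:54' and b'2015:07:16 10:11:12'.
    """
    if len(old) != len(new):
        raise ValueError("old and new values must have the same length")
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as mm:
        offsets = find_offsets(mm, old)
        for off in offsets:
            # Same-length slice assignment: the file is not resized.
            mm[off:off + len(new)] = new
        mm.flush()
    return offsets
```

Note that this patches every occurrence of the old value. In the Stonehenge.jpg example that would change both DateTimeOriginal and DateTimeDigitized, because they hold the same string; to change only one of them, write to a single offset from the returned list instead.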

RE: Exiv2 & wmNonIntrusive Flag - Added by David Vongrad 18 days ago

Hi Robin,

Thanks for your reply. I had wanted to title this thread “Exiv2 and Image->writeMetadata”, but I didn’t see any way to correct my blunder after it was submitted. Your title is just as good so thanks for changing it.

The goal is to set EXIF data like Exif.Photo.DateTimeOriginal and other date fields, along with matching file system times, only once (perhaps twice if I forget about DST changes). So far, I have been working with a small sample of JPG files from my Nikon D810 (~24 MB) and JPGs from my daughter’s Sony a6000 from her spring trip to Austria (~7.3 MB), where a DST change for Austria did occur while she was there. Simply reading date stamps from around 380 Nikon JPGs or 840 Sony JPGs and renaming the files based on those dates is quite acceptable at only a few seconds, but modifying the date fields for all those files is quite time consuming, at about 17 minutes for Nikon and 6 minutes for Sony.

The JPG images are decreasing in size because, I suspect, writeMetadata is removing “empty” EXIF data, which is how I understood the documentation. It’s doubtful I can use fixed offsets to find the date fields in any files, as I suspect these offsets vary by camera manufacturer or even file type (I’m using JPG from a few cameras, Nikon’s NEF, Sony’s ARW, and Canon’s CR2 so far). For example, the first date field is at 0xFA0A in a Sony a6000 JPG and at 0xF404 in a Nikon D810 JPG. I know that if I insert “Exif.Photo.UserComment” into the image, it is written between one date and the next, so dates farther into the file obviously depend on how long the comment is. I haven’t yet tried to see if I can use the comment length plus a known number of bytes to find the next date field, but I have my doubts this would be consistent from one camera to the next.

I’m not averse to opening a file in write binary mode, finding an ASCII string, changing a few bytes and closing the file. As you said, this would be quite fast, but I wanted to explore the possibility of using existing features of Exiv2 before doing it “by hand”. I would prefer to put my trust in Exiv2 rather than a brute force method, but it’s something I think I’ll try just for “fun”.

I don’t want you to spend too much time on this, especially now if you are recovering from jet lag. I certainly know what being an unpaid volunteer is all about, but I will certainly appreciate any efforts you or your team can offer. Whether or not the engineering effort is worth it to handle the request of me and a few others is a decision only the Exiv2 team can make. My app is a personal endeavour for myself and a few friends and extended family members, so it’s not like it needs to be done today. I could send you a 24 MB Nikon JPG, but there is a 20 MB limit in this forum, and time delays in rewriting files might be considered bearable unless you were processing hundreds of files at once.

In the meantime, I’ll look at the links you provided. If I haven’t heard from you by the time I’ve coded my brute force method and tested it enough to be comfortable with it, I’ll let you know whether it turns out to be a workable solution.

Thanks again.

RE: Exiv2 & wmNonIntrusive Flag - Added by Robin Mills 18 days ago

Well, let's see what turns up. I've promised to investigate #1324 next week. So, I'll look at your use case when I'm working in that part of the code. I'd prefer not to involve other members of the team in this investigation as they are focused on code for v0.27. We hope to reach v0.27 RC1 in December.

RE: Exiv2 & wmNonIntrusive Flag - Added by David Vongrad 17 days ago

With 3 file seeks to consistent hard-coded offsets in the 840 Sony JPGs, I got the updates down to about 30 seconds from 6 minutes. I'm thinking searches for the date strings might increase that time to about a minute at most. Of course, I'm still quite open to any Exiv2 offerings. Good luck with #1324 and the release of 0.27!

RE: Exiv2 & wmNonIntrusive Flag - Added by David Vongrad 8 days ago

Well, I read the 3 date tag values and saved them in a std::map keyed on those values. Each mapped value is a vector: I search the image for each key and save the offsets from the beginning of the file at which it occurs. The vector covers the case where the same date value is shared by more than one of the date tags. Using those offsets, I can then seek to each one and write 19 new bytes of whatever date I want. The process for 840 Sony JPGs is only a couple of seconds longer than the hard-coded test I mentioned above, thus cutting the Exiv2 EXIF rewrite time from about six minutes to just over 30 seconds. I put an option in my app to write the dates either by Exiv2 methods, which make the files smaller if the user wants that at the expense of waiting longer to process them, or by offsets, which keep the file sizes the same. I tested my approach on files from several cameras and all seems to be good. It may sound like a sloppy hack, but it works for what I need it for! :)
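For readers who want to try the same idea, here is a rough Python sketch of the approach described above. It is not the actual C++ code from the app; a dict of lists stands in for the std::map of vectors, and the names `locate_dates` and `rewrite_dates` are hypothetical. It assumes the current date values have already been read (for example with Exiv2) and are passed in as bytes.

```python
from collections import defaultdict

DATE_LEN = 19  # "YYYY:MM:DD HH:MM:SS", not counting the trailing NUL


def locate_dates(data, date_values):
    """Map each distinct date value to every offset where it occurs in `data`."""
    found = defaultdict(list)
    for value in set(date_values):
        pos = data.find(value)
        while pos != -1:
            found[value].append(pos)
            pos = data.find(value, pos + 1)
    return dict(found)


def rewrite_dates(path, replacements):
    """Seek to each located date and overwrite it in place.

    `replacements` maps a current date value (bytes) to its new value (bytes).
    Both must be DATE_LEN bytes long so the file size stays the same.
    """
    for old, new in replacements.items():
        if len(old) != DATE_LEN or len(new) != DATE_LEN:
            raise ValueError("date values must be exactly 19 bytes")
    with open(path, "r+b") as f:
        data = f.read()  # simple, but reads the whole file into memory
        found = locate_dates(data, replacements.keys())
        for old, offsets in found.items():
            for off in offsets:
                f.seek(off)
                f.write(replacements[old])
    return found
```

One caveat with a plain byte search: it will also match the same string anywhere else in the file (for instance in a maker note or an XMP packet), so it is worth checking the returned offsets on a few sample files from each camera before trusting it.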

RE: Exiv2 & wmNonIntrusive Flag - Added by Robin Mills 8 days ago

This is good news. You’re doing the right thing here. Sometimes general purpose code simply does not know the short cut to dealing with something. You can probably make this even faster when you know that in JPG files, the maximum delta offset between two items of EXIF metadata is 64k. So once you find one date string, you have an upper bound for how far to search.
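The bounded search could look something like the Python sketch below. It takes the 64k figure from this post at face value (a single JPEG APPn segment cannot exceed roughly 64 KiB because its length field is 16 bits), and the name `find_dates_bounded` is invented for the example. It finds the earliest hit for any of the date values, then looks for all of them only within the following 64 KiB.

```python
import mmap

WINDOW = 64 * 1024  # upper bound suggested in the thread for JPEG files


def find_dates_bounded(path, date_values):
    """Locate date strings, searching only 64 KiB past the first hit.

    Hypothetical helper: `date_values` is an iterable of bytes objects.
    Returns a dict mapping each value found to a list of file offsets.
    """
    values = set(date_values)
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # find() stops at the first match, so this is cheap when the
        # Exif block sits near the start of the file.
        hits = [h for h in (mm.find(v) for v in values) if h != -1]
        if not hits:
            return {}
        first = min(hits)
        end = min(len(mm), first + WINDOW)
        found = {}
        for v in values:
            offsets = []
            pos = mm.find(v, first, end)
            while pos != -1:
                offsets.append(pos)
                pos = mm.find(v, pos + 1, end)
            if offsets:
                found[v] = offsets
        return found
```

A side effect of the window is that identical strings much later in the file (say, inside the image data by coincidence) are ignored, which may or may not be what you want for formats other than JPEG.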

I’m glad you’ve found this fix because I’ve been sick this week and haven’t been able to work on this or #1324.

Very happy that you’ve “dug in” and found a solution.
