Bug #1175
Exiv2 corrupts files larger than 2GB with Exif IFD at the end of the file
Description
Consider the following snippet which just reads meta data from file, modifies XMP meta data and writes everything back
```cpp
Exiv2::Image::AutoPtr image(Exiv2::ImageFactory::open(filePath));
image->readMetadata();
Exiv2::ExifData exiv2ExivData = image->exifData();
Exiv2::XmpData xmpData = image->xmpData();
Exiv2::IptcData exiv2IptcData = image->iptcData();
image->writeMetadata();
image->setExifData(exiv2ExivData);
image->setIptcData(exiv2IptcData);
for (std::map<std::string, std::string>::const_iterator iter = tags.begin(); iter != tags.end(); ++iter) {
    xmpData[iter->first] = iter->second;
}
image->setXmpData(xmpData);
image->writeMetadata();
```
If this is run on a file that is larger than 2GB and has its Exif IFD at the end of the file, the saved file is corrupt and is not loadable in image viewers.
Associated revisions
#1175. Thanks to LaserSoft for reporting this and providing a patch.
#1175 I'm going to accept the recommendation to change BasicIo::size() to return size_t. This passes the test suite. The only "ripple" outside of basicio is to iotest.cpp. This change enables several casts to be removed.
#1175 more cast ripples on GCC5/Linux
History
Updated by Robin Mills over 5 years ago
- Target version set to 1.0
I'm not surprised to hear about this. Although we support a 64-bit build, I have my doubts about our ability to handle HUGE files (which require 64-bit offsets). Regrettably, nothing can be done about this at the moment as I have lots to finish for v0.26.
Updated by LaserSoft Imaging over 5 years ago
The fix is proposed in the following forum topic: http://dev.exiv2.org/boards/3/topics/2528
Updated by Robin Mills over 5 years ago
- Status changed from New to Resolved
- Assignee set to Robin Mills
- Target version changed from 1.0 to 0.26
- % Done changed from 0 to 80
- Estimated time set to 1.00 h
Thanks, George, for reporting this and providing the patch. Submitted as r4269.
I didn't notice the forum topic with the patch. Can I ask you to build the trunk and verify that this has been fixed?
I have opened an Issue for HUGE file support #1176. In this case, we've dodged the 31/32 bit bullet. We won't be so lucky when we work on files > 4GB. I'm sure that will take a considerable effort to acquire all the test files, then test and fix all the code. It'll be tedious, however it will be increasingly important, especially for video support.
Updated by Robin Mills over 5 years ago
- Status changed from Resolved to Closed
- % Done changed from 80 to 100
Updated by LaserSoft Imaging over 4 years ago
Our software is still suffering from this issue. Now we might have a (partial) solution for issue #1175.
By debugging readMetadata we found that the size() method returns -1 for large files (>2GB). When this value is assigned to mappedLength it is transformed into something very large, since mappedLength is of an unsigned type. So we implemented two changes in the Exiv2 sources:
1. Within the method FileIo::Impl::stat(StructStat& buf) we make use of 64-bit WinAPI functions like this:
```cpp
#ifdef EXV_UNICODE_PATH
    if (wpMode_ == wpUnicode) {
#ifdef _WIN64
        struct _stat64 st;
        ret = ::_wstati64(wpath_.c_str(), &st);
        if (0 == ret) {
            buf.st_size  = st.st_size;
            buf.st_mode  = st.st_mode;
            buf.st_nlink = st.st_nlink;
        }
#else
        struct _stat st;
        ret = ::_wstat(wpath_.c_str(), &st);
        if (0 == ret) {
            buf.st_size  = st.st_size;
            buf.st_mode  = st.st_mode;
            buf.st_nlink = st.st_nlink;
        }
#endif
```
It turned out that for file sizes >2GB, _stat and _wstat are not capable of returning correct file information. Thus, for 64-bit compilations we switched to the *64 methods.
2. For all the size() methods we changed the return type from long to size_t, since this type is capable of storing correct sizes.
With these two changes applied, the Exiv2 library is able to store IPTC data in files larger than 2GB. We will deploy our software with this fix. Please consider checking the solution and including it in your next version.
Updated by Robin Mills over 4 years ago
- Estimated time changed from 1.00 h to 2.00 h
Thank you for the patch which I have submitted as r4755. It will be included in Exiv2 v0.26 which I hope will be released in the next few days.
The top-ranked item for v0.27 in http://dev.exiv2.org/projectns/exiv2/news is #992 - Better raw file support and test. Although the Exiv2 code compiles and executes in 64 bit, the handling of HUGE files (>2GB) hasn't had the testing and stressing it deserves. I'm sure there are numerous places where 64-bit values have been converted to 32 bit and back to 64 bit. This will go unnoticed for many files; however, it must be fixed for HUGE files.
I'm not sure if/when this matter will get attention. Open source relies on volunteers. Regrettably, nobody has volunteered to work in this area.
Updated by LaserSoft Imaging over 4 years ago
Robin Mills wrote:
Thank you for the patch which I have submitted as r4755. It will be included in Exiv2 v0.26 which I hope will be released in the next few days.
Thanks for the quick reaction. I just reviewed the mentioned revision r4755: please also consider the second part of the suggested fix (changing the size() return values to size_t). Since long is signed, file sizes might exceed it and wrap to negative values.
Updated by Robin Mills over 4 years ago
Christoph
Thanks for drawing my attention to your second point. I'm happy to change size() to return size_t provided it doesn't cause a ripple of change in the code. This is another example of the work that is required to certify our handling of HUGE files.
Robin
Updated by Robin Mills over 4 years ago
- Estimated time changed from 2.00 h to 3.00 h
r4756 changes BasicIo::size() to return size_t. The test suite runs successfully without change.
This change contains the "code ripple" to basicio.{cpp|hpp} and iotest.cpp. It enables several casts to be removed from the code. It adds one cast in a comparison of size() and tell(). Because tell() returns a long which can be -1, this new/additional cast is unavoidable.
I am rather nervous about this change as it could be the final change for Exiv2 v0.26. The evidence of lack of ripple and removal of casts gives me the courage to commit this.
Christoph:
Please run your own test suite with r4755 and r4756 compiled into your product. If anything arises, I will revert r4755 and r4756 and reconsider/defer this matter for v0.27.
Updated by LaserSoft Imaging over 4 years ago
I'm afraid this issue is not yet solved... :(
I have to say that we first implemented the fix on an older state of the Exiv2 sources. In order to also test the other changes from your revision, we updated the sources via Subversion and applied the fix for this ticket.
Now we run into another error when handling large files, when attempting to open them with Exiv2::ImageFactory::open:
Exiv2 Error 11: The file contains data of an unknown image type
We also tested the new sources without the fix, and the error still appears. So at least the error was not introduced by revision r4755 or r4756.
Updated by Robin Mills over 4 years ago
I'm not sure what to say about this. I think you are saying that HUGE file support in Exiv2 is broken. There may be times when you are lucky and succeed when operating on a HUGE file. However, this is really a collection of different issues:
1) basicio.cpp and http.cpp require thorough inspection and remedial work to be certain they work on HUGE files
2) the exiv2 code that reads files (such as tiffimage.cpp) should be thoroughly tested on HUGE files
3) we have an outstanding request to support BigTIFF (the 64 bit variant of Tiff) #1186
I'm wondering how/if I will have time to work on this. I am a 66-year-old volunteer. Here's the "TODO" list for v0.27: http://dev.exiv2.org/news/3. It is perhaps 2 or 3 years of work.
I attended a conference in Rio on Saturday to appeal to the Linux Community for help. https://www.youtube.com/watch?v=3Fv57Lbhmqg
If you'd like to be involved with fixing our HUGE file support, I will mentor and help you.
#1175. Thank You, LaserSoft, for reporting this and providing the patch.