Bug #1175

Exiv2 corrupts files larger than 2GB with Exif IFD at the end of the file

Added by LaserSoft Imaging over 1 year ago. Updated 3 months ago.

Status:ClosedStart date:31 Mar 2016
Priority:NormalDue date:
Assignee:Robin Mills% Done:

100%

Category:tiff parserEstimated time:3.00 hours
Target version:0.26

Description

Consider the following snippet which just reads meta data from file, modifies XMP meta data and writes everything back

Exiv2::Image::AutoPtr image(Exiv2::ImageFactory::open(filePath));

image->readMetadata();
Exiv2::ExifData exiv2ExivData = image->exifData();
Exiv2::XmpData xmpData = image->xmpData();
Exiv2::IptcData exiv2IptcData = image->iptcData();

image->writeMetadata();
image->setExifData(exiv2ExivData);
image->setIptcData(exiv2IptcData);
for(std::map<std::string, std::string>::const_iterator iter(tags.constBegin()); iter != tags.constEnd(); ++iter)
{
       xmpData[iter.key()] = iter.value();
}
image->setXmpData(xmpData);

image->writeMetadata();

If this is run on a file that is larger than 2GB and has its Exif IFD at the end of the file, the saved file is corrupt and is not loadable in image viewers.

Associated revisions

Revision 4269
Added by Robin Mills over 1 year ago

#1175. Thank You, LaserSoft, for reporting this and providing the patch.

Revision 4755
Added by Robin Mills 3 months ago

#1175. Thanks to LaserSoft for reporting this and providing a patch.

Revision 4756
Added by Robin Mills 3 months ago

#1175 I'm going to accept the recommendation to change BasicIo::size() to return size_t. This passes the test suite. The only "ripple" outside of basicio is to iotest.cpp. This change enables several casts to be removed.

Revision 4759
Added by Robin Mills 3 months ago

#1175 Correction to r4756 Another three casts required to build with Visual Studio (size_t code ripple)

Revision 4760
Added by Robin Mills 3 months ago

#1175 Correction to r4597 Additional file (which, in error, I didn't to commit)

Revision 4765
Added by Robin Mills 3 months ago

#1175 more cast ripples on GCC5/Linux

History

#1 Updated by LaserSoft Imaging over 1 year ago

Similar issue seems to be #995

#2 Updated by Robin Mills over 1 year ago

  • Target version set to 1.0

I'm not surprised to hear about this. Although we support a 64bit build, I have my doubts about our ability to handle HUGE files (which require 64 bit offsets). Regrettably, nothing can be done about this at the moment as I have lots to finish for v0.26.

#3 Updated by LaserSoft Imaging over 1 year ago

The fix is proposed in the following forum topic: http://dev.exiv2.org/boards/3/topics/2528

#4 Updated by Robin Mills over 1 year ago

  • Status changed from New to Resolved
  • Assignee set to Robin Mills
  • Target version changed from 1.0 to 0.26
  • % Done changed from 0 to 80
  • Estimated time set to 1.00

Thanks, George. Thanks for reporting this and providing the patch. Submitted. r4269.

I didn't notice the Forum topic with the patch. Can I ask you to build the trunk and verify that this has been fixed.

I have opened an Issue for HUGE file support #1176. In this case, we've dodged the 31/32 bit bullet. We won't be so lucky when we work on files > 4GB. I'm sure that will take a considerable effort to acquire all the test files, then test and fix all the code. It'll be tedious, however it will be increasingly important, especially for video support.

#5 Updated by Robin Mills about 1 year ago

  • Status changed from Resolved to Closed
  • % Done changed from 80 to 100

#6 Updated by LaserSoft Imaging 3 months ago

Our software is still suffering from this issue. Now we might have a (partial) solution for issue #1175.
By debugging readMetadata we found out that the size()-method return "-1" for large files (>2GB). Applied to mappedLength this value has been transformed into something very large since mappedLength is of unsigned type. So we implemented two changes in the Exiv2-sources:

1. Within Methode FileIo::Impl::stat(StructStat& buf) we make use of 64 bit WinAPI methods like this:

#ifdef EXV_UNICODE_PATH
        if (wpMode_ == wpUnicode) {
#ifdef _WIN64
            struct _stat64 st;
            ret = ::_wstati64(wpath_.c_str(), &st);

            if (0 == ret) {
                buf.st_size = st.st_size;
                buf.st_mode = st.st_mode;
                buf.st_nlink = st.st_nlink;
            }
#else
            struct _stat st;
            ret = ::_wstat(wpath_.c_str(), &st);

            if (0 == ret) {
                buf.st_size = st.st_size;
                buf.st_mode = st.st_mode;
                buf.st_nlink = st.st_nlink;
            }
#endif

It turned out, that for file sizes >2GB _stat and _wstat have are not capable of returning sufficient file information. Thus on 64 bit compilations we switched to *64-methods.

2. For all the size() methods we changed the return value from from long to size_t since this type is capable of storing correct sizes.

With these two changes applied the Exiv2-library is able to store IPTC data in files larger than 2GB. We will deploy our software with this fix. Please consider checking the solution and also include it into your next version.

#7 Updated by Robin Mills 3 months ago

  • Estimated time changed from 1.00 to 2.00

Thank you for the patch which I have submitted as r4755. It will be included in Exiv2 v0.26 which I hope will be released in the next few days.

The top-ranked item for v0.27 in http://dev.exiv2.org/projectns/exiv2/news is #992 - Better raw file support and test. Although the exiv2 code compiles and executes in 64 bit, the handling of HUGE files (>2GB) hasn't had the test and stressing it deserves. I'm sure there are numerous place where 64 bit values have been converted to 32 and back to 64 bit. This will go un-noticed for many files, however this must be fixed for HUGE files.

I'm not sure if/when this matter will get attention. Open source relies on volunteers. Regrettably, nobody has volunteered to work in this area.

#8 Updated by LaserSoft Imaging 3 months ago

Robin Mills wrote:

Thank you for the patch which I have submitted as r4755. It will be included in Exiv2 v0.26 which I hope will be released in the next few days.

Thanks for the quick reaction. Just reviewed the mentioned revision r4755: Please also consider the scond part of the suggested fix (changing size() return values to size_t). Since long is signed, file sizes might exceed it and change to negative values.

#9 Updated by Robin Mills 3 months ago

Christoph

Thanks for drawing my attention to your second point. I'm happy to change size() to return size_t provided it doesn't cause a ripple of change in the code. This is another example of the work that is required to certify our handling of HUGE files.

Robin

#10 Updated by Robin Mills 3 months ago

  • Estimated time changed from 2.00 to 3.00

r4756 changes BasicIo::size() to return size_t. The test suite runs successfully without change.

This change contains the "code ripple" to basicio.{cpp|hpp} and iotest.cpp. It enables several casts to be removed from the code. It adds one cast in a comparison of size() and tell(). Because tell() returns a long which can be -1, this new/additional cast is unavoidable.

I am rather nervous about this change as it could be the final change for Exiv2 v0.26. The evidence of lack of ripple and removal of casts gives me the courage to commit this.

Christoph:
Please run your own test suite with r4755 and r4756 compiled into your product. If anything arises, I will revert r4755 and r4756 and reconsider/defer this matter for v0.27.

#11 Updated by LaserSoft Imaging 3 months ago

I'm afraid this issue is not yet solved... :(

I have to say that we first implemented the fix on an older state of exiv2 sources. In order to also test the other changes from your revision we updated the sources via subversion and applied the fix for this ticket.

Now we run into another error when handling large files when attempting to open these with Exiv2::ImageFactory::open:

Exiv2 Error 11: The file contains data of an unknown image type

We also tested the new sources without the fix and still this error appears. So at least the error has not been introduced by revision r4755 or r4756

#12 Updated by Robin Mills 3 months ago

I'm not sure what to say about this. I think you are saying that HUGE file support in Exiv2 is broken. There may be times when you will be lucky and be successful when operating on a HUGE file. However, there are really a collection of different issues involved.

1) basicio.cpp and http.cpp require thorough inspection and remedial work to be certain they work on HUGE files
2) the exiv2 code that reads files (such as tiffimage.cpp) should be thoroughly tested on HUGE files
3) we have an outstanding request to support BigTIFF (the 64 bit variant of Tiff) #1186

I'm wondering how/if I will have time to work on this. I am a 66 year old volunteer. Here's the "TODO" list for v0.27 http://dev.exiv2.org/news/3 and is perhaps 2 or 3 years of work.

I attended a conference in Rio on Saturday to appeal to the Linux Community for help. https://www.youtube.com/watch?v=3Fv57Lbhmqg

If you'd like to be involved with fixing our HUGE file support, I will mentor and help you.

Also available in: Atom PDF

Redmine Appliance - Powered by TurnKey Linux