Project

General

Profile

Bug #1043

pyexiv2 fails on cifs shares on an Ubuntu client

Added by thoralf schulze over 6 years ago. Updated over 5 years ago.

Status:
Closed
Priority:
Normal
Category:
basicio
Target version:
Start date:
19 Mar 2015
Due date:
% Done:

100%

Estimated time:

Description

hi,

i got referred here by olivier tilloy, the maintainer of pyexiv2 - the python bindings for exiv2.¹

while playing around with pyexiv2, i ran into a strang bug when writing metadata with pyexiv's write.metadata() method. this method is just a really thin wrapper around exiv2's writeMetadata(), so the issue at hand might actually be caused by exiv2 itself.

okay, on to the bug: writing exif metadata to a jpeg file hangs if and only if:
- the file to be written resides on a cifs share mounted from a win2k8 server
- the jpeg file is larger than ~100kb

everythings works flawlessly if:
- $file is bigger than 100k and resides on a cifs share served by a samba server or a local file system
- $file is smaller than 100k and resides on either cifs share
- $file is bigger than 100k, resides on a cifs share served by the infamous win2k8 server and the python process executing the script is being strace'd
- $file is bigger than 100k, resides on a cifs share served by the infamous win2k8 server and the share is mounted with mount.cifs' "cache=none"-option

please let me know if there is anything i can do to help to resolve this issue

¹ - please see https://bugs.launchpad.net/pyexiv2/+bug/1433747 for details


Files

Anniversary.jpg (67.5 KB) Anniversary.jpg Robin Mills, 24 Mar 2015 12:05
T1043.patch (4.23 KB) T1043.patch Thomas Beutlich, 24 Mar 2015 22:14
diff_thoralf.diff (3.05 KB) diff_thoralf.diff thoralf schulze, 30 Mar 2015 18:16

Related issues

Related to Exiv2 - Bug #1042: Exiv2 nulls file on CIFS share when modifying Exif.Photo.UserCommentClosed15 Mar 2015

Actions

Associated revisions

Revision 3627 (diff)
Added by Robin Mills over 6 years ago

#1042 and #1043. Don't use a MemIo object for small temporary files.

Revision 3630 (diff)
Added by Robin Mills over 6 years ago

#1043 and #1042. Thanks to Thomas for showing that r3627 reintroduced #812. Thanks to Thoralf for suggesting msync MemIo fix.

Revision 3631 (diff)
Added by Robin Mills over 6 years ago

#1042 #1043 #812 Added test regression detectors

Revision 3635 (diff)
Added by Robin Mills over 6 years ago

#1043 #1042 #812. Thank You to Thomas for this "polishing" patch. Thank you to everybody who has worked on this issue. Adding all the comments on the three issues together comes to about 60 items by at least 6 contributors. And it involves platform issues, networking, Linux and Windows APIs. One of the most complex issues to arise in Exiv2. Well done everybody. And we've dealt with this quickly. Only 9 days since Calvin first reported #1042.

I use the term "complex" to mean many threads of technology. "complex" != "complicated". "complicated" = "difficult to understand". We try to avoid "complicated".

History

#1

Updated by Robin Mills over 6 years ago

  • Category set to metadata
  • Status changed from New to Assigned
  • Assignee set to Robin Mills
  • Target version set to 0.26

Thanks for reporting this. This seems to be a duplicate of 1042 which is currently being investigated. Your server is Windows. On what platform are you using pyexiv2?

You may be interested to learn that Vincent Vyvre has released a version of pyexiv2 for python3. http://dev.exiv2.org/boards/3/topics/1886

I use pyexiv2 a lot. Almost all 50,000 photographs on my web site are processed by a python script. I'm sympathetic that the underlying issue is in libexiv2, however you'll understand that I'm not going to get involved with supporting pyexiv2.

I strongly suspect that this is a Samba issue. We normally use memory mapping to perform I/O, however this can be disabled and then we use the Standard "C" fopen/fread/fseek/ftell/fclose library calls. I built the library without memory mapping, and this hang issue occurred on Linux.

Getting to the bottom of this isn't going to be easy if this is an issue in Samba. I had a google into Samba/Ubuntu 14.04. Gosh, there are a lot of folks complaining about Samba4.

#2

Updated by thoralf schulze over 6 years ago

hi Robin,

thank you for your time and your efforts … not sure if my email reply made it into redmine, so i'll post it here again:

Your server is Windows. On what platform are you using pyexiv2?

we are using pyexiv2 on a debian wheezy box. both pyexiv2 and libexiv2
are straight from the debian repositories, libexiv2 is 0.23-1. the same
applies to cifs-utils …
everything is fine when pyexiv2 is writing to a share provided by a
samba4 server - so yeah, the root cause could well be some subtle
differences in the cifs implementation between samba and windows.

I use pyexiv2 a lot. Almost all 50,000 photographs on my web site
are processed by a python script. I'm sympathetic that the
underlying issue is in libexiv2, however you'll understand that I'm
not going to get involved with supporting pyexiv2.

fair enough :-)
i can reproduce the issue here quite reliably, and it occurs with exiv2,
too:

KMT\adlibworker@stone:~$ mount | grep win2k
//shatner.server.kinemathek.de/images on
/home/adlibworker/win2k-share-with-cache type cifs
(rw,relatime,vers=1.0,cache=strict,username=adlibworker,domain=KMT,uid=10047,forceuid,gid=10000,forcegid,addr=172.20.1.61,file_mode=0755,dir_mode=0755,nounix,serverino,rsize=61440,wsize=65536,actimeo=1)
//shatner.server.kinemathek.de/images on
/home/adlibworker/win2k-share-without-cache type cifs
(rw,relatime,vers=1.0,cache=none,username=adlibworker,domain=KMT,uid=10047,forceuid,gid=10000,forcegid,addr=172.20.1.61,file_mode=0755,dir_mode=0755,nounix,serverino,rsize=61440,wsize=65536,actimeo=1)
KMT\adlibworker@stone:~$ ls al keksrezept*
-rw-r--r-
1 KMT\adlibworker KMT\domain users 489057 Mar 19 13:08
keksrezept.jpg
rw-r--r- 1 KMT\adlibworker KMT\domain users 30254 Mar 19 13:08
keksrezept-small.jpg
KMT\adlibworker@stone:~$ pwd
/home/adlibworker
KMT\adlibworker@stone:~$ cp keksrezept* win2k-share-with-cache/ && cp
keksrezept* win2k-share-without-cache/
KMT\adlibworker@stone:~$ exiv2 -M"set Exif.Photo.PixelXDimension 666"
win2k-share-without-cache/keksrezept.jpg
KMT\adlibworker@stone:~$ exiv2 -M"set Exif.Photo.PixelXDimension 666"
win2k-share-without-cache/keksrezept-small.jpg
KMT\adlibworker@stone:~$ exiv2 -M"set Exif.Photo.PixelXDimension 666"
win2k-share-with-cache/keksrezept-small.jpg
KMT\adlibworker@stone:~$ exiv2 -M"set Exif.Photo.PixelXDimension 666"
win2k-share-with-cache/keksrezept.jpg

the latter hangs until it is SIGKILLed, strace'ing it does not yield
anything - same symptoms as with pyexiv2. exiv2 ist from the wheezy
repositories.

Getting to the bottom of this isn't going to be easy if this is an
issue in Samba. I had a google into Samba/Ubuntu 14.04. Gosh, there
are a lot of folks complaining about Samba4.

i am more than willing to try the procedure above again, and i am more
or less comfortable with compiling exiv2 or other binaries myself …

thank you for your time & with kind regards,
t.

#3

Updated by Robin Mills over 6 years ago

Thanks for all this effort.

The build is straight forward and documented here: http://dev.exiv2.org/projects/exiv2/wiki/How_do_I_build_Exiv2_on_the_XYZ_platform

This on-liner should do the trick:

$ svn export svn://dev.exiv2.org/svn/trunk ; cd trunk ; make config ; ./configure ; make -j4 ; sudo make install ; make samples ; make tests
Should take less than 2 minutes (it's about 100k lines of code). At the moment, I haven't had an idea about what to test. I/O in Exiv2 is performed in basicio.cpp. We have an abstract I/O class from which we derive concrete classes for File/Mem/Http etc:
$ grep class ~/gnu/exiv2/trunk/src/basicio.cpp 
// class member definitions
    //! Internal Pimpl structure of class FileIo.
    class FileIo::Impl {
    }; // class FileIo::Impl
    //! Internal Pimpl structure of class MemIo.
    class MemIo::Impl {
    }; // class MemIo::Impl
            // call super class method
    //! Internal Pimpl abstract structure of class RemoteIo.
    class RemoteIo::Impl {
    }; // class RemoteIo::Impl
    //! Internal Pimpl structure of class HttpIo.
    class HttpIo::HttpImpl : public Impl  {
    }; // class HttpIo::HttpImpl
    //! Internal Pimpl structure of class RemoteIo.
    class CurlIo::CurlImpl : public Impl  {
    }; // class RemoteIo::Impl
    //! Internal Pimpl structure of class RemoteIo.
    class SshIo::SshImpl : public Impl  {
    }; // class RemoteIo::Impl
We have an I/O test program in our test suite (samples/iotest.cpp). I'm going play with that. I didn't write Exiv2 - my good buddy Andreas did that - and has done a great job. I'll have some fun investigating this one. I've already built without MemIO, so I don't really suspect him. Calvin's given me the command to mount a Windows drive from Linux. I'm going to investigate reading/writing from Linux on the Samba/Windows drive.

It's remarkable that you and Calvin have shown up on consecutive days talking about this. Most of the I/O code has been in service for years without anything like this being mentioned before. It'll fun to find this and say "Ah........ who would have thought ......... of course!"

#4

Updated by Robin Mills over 6 years ago

I've done some work on this. No result - however progress.

  1. I've persuaded smb4k to connect Linux to Windows
  2. I've reproduced the hang Ubuntu <-> Windows 7 Ultimate
  3. I've used exiv2/samples/iotest over Ubuntu <-> Windows 7 and Ubuntu <-> Ubuntu

samples/iotest makes two copies of a file, one with FileIO and one with MemIO. It's working fine.

1042 rmills@rmillsmm-kubuntu:~/smb4k/RMILLS-LAPTOP/C/temp $ ls -alt Fonts.zip
-rwxr-xr-x 1 rmills rmills 64143832 Mar 19 20:18 Fonts.zip
1043 rmills@rmillsmm-kubuntu:~/smb4k/RMILLS-LAPTOP/C/temp $ ~/gnu/exiv2/trunk/bin/iotest Fonts.zip P.zip Q.zip
1044 rmills@rmillsmm-kubuntu:~/smb4k/RMILLS-LAPTOP/C/temp $ ls -alt Fonts.zip P.zip Q.zip
-rwxr-xr-x 1 rmills rmills 64143832 Mar 19 20:26 Q.zip
-rwxr-xr-x 1 rmills rmills 64143832 Mar 19 20:26 P.zip
-rwxr-xr-x 1 rmills rmills 64143832 Mar 19 20:18 Fonts.zip
1045 rmills@rmillsmm-kubuntu:~/smb4k/RMILLS-LAPTOP/C/temp $ 
1045 rmills@rmillsmm-kubuntu:~/smb4k/RMILLS-LAPTOP/C/temp $ md5sum Fonts.zip P.zip Q.zip
46559156b6ffb2c4777a0eaa4b039110  Fonts.zip
46559156b6ffb2c4777a0eaa4b039110  P.zip
46559156b6ffb2c4777a0eaa4b039110  Q.zip
1046 rmills@rmillsmm-kubuntu:~/smb4k/RMILLS-LAPTOP/C/temp $ 
All good news. At least "low level" I/O seems to working correctly.

Calvin's original script is:

for i in `ls`; do exiv2 -v -M"set Exif.Photo.UserComment charset=Ascii New Exif comment3" $i; done
It's running in a loop, so every iteration invokes a new process. So even if we're being careless about closing files/memory - the system "should be" cleaning up on our behalf. I've tried adding a little sleep between the "runs":
for i in `ls`; do exiv2 -v -M"set Exif.Photo.UserComment charset=Ascii New Exif comment3" $i; sleep 1; done
Runs more slowly of course, however it hangs after about 50 files.

I've run this from Windows7 to the Ubuntu server:

P:\home\rmills\temp\exiv2test>for %i in (*.jpg) do exiv2 -v -M"set Exif.Photo.UserComment charset=Ascii New Exif comment3" %i

Tomorrow I will instrument and the trace all the I/O calls. The clear suspect is: exiv2+Linux-client. The Mac and Windows clients appear fine. All the servers appear fine. And we know the exiv2 file and memio classes are solid. We just keep digging for that "Ah" moment.

Incidentally, I discovered why I couldn't connect smb4k to Windows. It was smb4k. It wouldn't connect to Ubuntu (although that had been working). The /var/lib/samba/private directory was missing! Gone! Disappeared! I recreated it with smbpasswd and everything worked.

Guys, if you have any ideas - please share. I've added Calvin as a watcher for this issue and intend to discuss both 1042 and 1043 in this thread.

#5

Updated by Calvin Browne over 6 years ago

Ok, I've tested the following:

XP SMB client <-> Ubuntu 14.04.2 LTS on just over 4000 files with a similar windows batch file.
2 of the 3 binaries you made available (trunk and .23)
not reproducible.

On Ubuntu 14.04.2 LTS NFS Client to NFS Server on just over 4000 file
My compiled trunk and standard ubuntu version "exiv2 -V
exiv2 0.23 001700 (64 bit build)"
not reproducible.

So this definitely points to a CIFS client issue on Ubuntu.

Unfortunately, I'm off to Germany on Sunday for WHD and then some leave, so it's going to be 2 weeks before I can start getting into that.

In the meantime - maybe a readme in the release notes, with a recommendation on nfs over cifs? When I get back I can also try an earlier ubuntu version, assuming it has an older cifs client.

#6

Updated by Robin Mills over 6 years ago

  • Target version changed from 0.26 to 0.25

I've found and submitted a fix. r3725

I'm going to assign this to Andreas to review. I haven't studied the MemIo class enough to know why it's not working correctly on smb/Ubuntu.

Andreas: If you're pushed for time, please assign this back to me and I will continue to investigate.

#7

Updated by Robin Mills over 6 years ago

  • Subject changed from pyexiv2 fails on cifs shares served by win2k8-servers to pyexiv2 fails on cifs shares on an Ubuntu client

I'm going to modify the title of this issue. We've eliminated the Windows Server in our investigation. The clear candidate for investigation is exiv2 running on Ubuntu on a Samba share (cifs).

#8

Updated by Robin Mills over 6 years ago

  • Assignee changed from Robin Mills to Andreas Huggel
#9

Updated by thoralf schulze over 6 years ago

hi robin,

thank you for having a look! i can confirm that your patch resolves the issue:

KMT\adlibworker@stone:~$ which exiv2
/usr/local/bin/exiv2
KMT\adlibworker@stone:~$ /usr/local/bin/exiv2 -V | head -n5
exiv2 0.24 001800 (64 bit build)
Copyright (C) 2004-2013 Andreas Huggel.

This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License
KMT\adlibworker@stone:~$ cp keksrezept* win2k-share-without-cache/
KMT\adlibworker@stone:~$ exiv2 -M"set Exif.Photo.PixelXDimension 666" win2k-share-without-cache/keksrezept.jpg
KMT\adlibworker@stone:~$ exiv2 -M"set Exif.Photo.PixelXDimension 666" win2k-share-without-cache/keksrezept-small.jpg
KMT\adlibworker@stone:~$ cp keksrezept* win2k-share-with-cache/
KMT\adlibworker@stone:~$ exiv2 -M"set Exif.Photo.PixelXDimension 666" win2k-share-with-cache/keksrezept-small.jpg
KMT\adlibworker@stone:~$ exiv2 -M"set Exif.Photo.PixelXDimension 666" win2k-share-with-cache/keksrezept.jpg
KMT\adlibworker@stone:~$

so, not using mmap'ed temporary files does not trigger the bug … interesting. i also tried with this:

thoralf@tfbb:~/Baustelle/exiv2/trunk$ diff ub src/basicio.cpp{.orig,}
--
src/basicio.cpp.orig 2015-03-23 12:40:37.672506212 0100
++ src/basicio.cpp 2015-03-23 13:24:45.273634955 +0100
@ -584,7 +584,8 @
// buffer, which is a workaround to ensure that the links don't get broken.

// #1042 and #1043.  Always use a FileIo object to avoid Samba/Ubuntu/cifs error
- ret = 911;
+ // try with msync()
+ // ret = 911;
if (ret != 0 || (buf.st_size > 1048576 && nlink == 1)) {
pid_t pid = ::getpid();
std::auto_ptr&lt;FileIo&gt; fileIo;
@ -1177,6 +1178,7 @
MemIo::~MemIo() {
if (p_->isMalloced_) {
+ msync(p_, p_->size_, MS_ASYNC);
std::free(p_->data_);
}
delete p_;
thoralf@tfbb:~/Baustelle/exiv2/trunk$

… which also seems to work. however, i do not speak c(++) and wouldn't trust this code to not set my house on fire or eat my babies. fortunately, i have neither :-)
i'm also not sure if the unconditional msync doesn't defeat the purpose of using mmap'ed files in the first place.

Unfortunately, I'm off to Germany on Sunday for WHD and then some leave

say hello if you are in berlin - beers will be on me :-)

thank you very much & with kind regards,
t.

#10

Updated by Thomas Beutlich over 6 years ago

  • Priority changed from Normal to High

I consider the 911-change (r3627) as a temporary hack only because it breaks #812 on Win again.

Also, msync is Linux only and won't compile on MSVC. Hence, compiler switches are needed for later. Looks like msync and reverting the 911-hack could be the proper solution. However I am not Linux expert.

#11

Updated by Thomas Beutlich over 6 years ago

  • Category changed from metadata to basicio
#12

Updated by Robin Mills over 6 years ago

  • Assignee changed from Andreas Huggel to Robin Mills
  • Priority changed from High to Normal

Thomas

I thank you for bringing my attention to #812. I've assigned this issue (#1043) back to myself to consider your observation, although I don't know what #812 is about as you haven't bothered to say. You are right to have changed the category from metadata (where the issue surfaced) to I/O where the issue lies.

It is not up to you to assign priority to bugs. Your behaviour around the Exiv2 project is very demotivating. If you wish to help, please be constructive and cooperative. You have consumed a considerable amount of my time over the last few months and seldom bothered with the courtesy to say "Thank You".

Robin

#13

Updated by Robin Mills over 6 years ago

Thanks You Thoral for your suggestion about msync(). I doubt if it would eat your (non-existing) babies.

In view of Thomas observation that the 911 fix causes #812 to resurface on Windows, I'll have to read the code with more care to understand this more deeply. I did ask Andreas to review this fix as I believe he wrote the code. I see he was involved in #812 which involves erroneously "emptying" files which have multiple hard links.

My thoughts are that I should:

  1. Study #812 to understand what's going on
  2. Add to test/bugfixes-test.sh to detect #812
  3. Perhaps MemIo can never be used to rewrite a file nlinks > 1 on any platform

Specifically, about msync():

  1. munmap()/mmap() may be portable and have the same effect as msync()
  2. munmap()/mmap() may cause the MemIo encapsulated pointer to change (hopefully without side-effects)

Ah...... Its often tough to deal with other folks code. However getting to the bottom of these matters can be satisfying. For sure, we've made considerable progress in one week.

If you get to meet Calvin, enjoy your beer. If Thomas can also join you, so much the merrier. "Cheers" from Merry England.

#14

Updated by Thomas Beutlich over 6 years ago

#812 is about destroying hard links after writing metadata to file. There is a link to a 3 year old discussion inside #812 but you don't need to follow it all through. Just see the comment above your 911-commit in basicio.cpp, lines 583 and 584:

// use a memory buffer. I.e., files with hard links always use a memory
// buffer, which is a workaround to ensure that the links don't get broken.

This does no longer hold after r3627. Thus hot-fixing this ticket broke another one.

One more sentence about constructive input: If you read my previous comment again you'll see that I pointed to msync as a possible solution fixing this one and keeping #812 closed.

#15

Updated by Thomas Beutlich over 6 years ago

One more thought about #812. I guess it's not purely related to Windows and should also apply to Linux. And of course, you are right about adding the missing test case.

Thank you, Robin.

#16

Updated by Robin Mills over 6 years ago

Thomas

I think there has been a misunderstanding. You have been very helpful by pointing out that the 911 fix breaks the fix to #812. However increasing the priority of the bug amounts to you shouting at me. Please don't do that.

You are probably correct that protecting the msync() call with #if HAVE_MSYNC .. #endif is a good fix. I feel I must add an #812 detector to the test suite (in case it is erroneously reintroduced). And I will need to add an HAVE_MSYNC detector to both the autotools and CMake build environments. We know it's not available on MSVC. I don't know its status on Cygwin and MInGW.

I do appreciate your input and help. I will spend more time on this.

Robin

#17

Updated by Robin Mills over 6 years ago

Oh, I've thought of something else. I'll confess: "I didn't read the comment about the hard links!". Or, I glanced at it without realising the implication. I was so focused on the size condition that I thought "what's size go to do with it?" (as Tina Turner once sang).

So, I thought res = 911; makes it very clear that we're always gonna head down the FileIo route and never risk MemIo. And then I thought "I'll dump this on Andreas to review". However he's rather dormant on the project at the moment and we're going to have to live without him.

The good news is that the team of Thomas, Thoralf, Calvin and Robin seem to be pulling together to fix this. Well Done Everybody. Sounds like Thoralf is offering to buy everybody beer in Berlin one night this week. It's my Wedding Anniversary on Friday (41 happy years). I'll toast your health with a glass of English Bitter if we've nailed this issue by the end of the week.

#18

Updated by Thomas Beutlich over 6 years ago

Just checked: msync is in sys/mman.h of Cygwin. Mingw should be also positive. Can check tomorrow.

#19

Updated by Robin Mills over 6 years ago

Thanks, Thomas.

I expect them to be there. However, I'm updating configure.ac to test for itself and set HAVE_MSYNC. You never know, there might be an old Cygwin (or worse, a future Cygwin) that doesn't include it. Or when compiling with Clang. Or an unsupported platform such as Sun or AIX which may, or may not, have msync(). Better to leave autotools/CMake to test-link the the function to be sure it's on the build platform. And I'll update src/version.cpp which not reports those flags when your run $ exiv2 --verbose --version

I'm hoping to fix this before going to bed. Adding something to the test suite will be done later. I won't mark this issue "Resolved" until I have something in the test suite to detect #812.

#20

Updated by Thomas Beutlich over 6 years ago

There are already detectors for EXV_HAVE_SYS_MMAN_H, EXV_HAVE_MMAP and EXV_HAVE_MUNMAP.

Unfortunately I cannot answer Thoralfs question if unconditional msync doesn't defeat the purpose of using mmap'ed files.

If it proves to be the solution what about moving msync from dtor of MemIO to MemIo::munmap() or MemIo::close() which are a dummy function right now?

#21

Updated by Robin Mills over 6 years ago

I'm not sure either about Thoralf's question about msync defeating using mmap'ed file.

I'm focused on adding msync() and a detector. Exactly where we put the msync() call isn't too important - it should be protected by #if HAVE_MSYNC ... #endif. Maybe I will should add a method MemIo.msync() and hide the detector there.

However .... it's getting late in Germany ... you go to bed and I'll finish the autotools/CMake/version.cpp code and test it tonight. I'll manually test nlinks > 1 on the Mac. Adding to test/bugfixes-test.sh isn't getting done tonight.

#22

Updated by Robin Mills over 6 years ago

OK, guys. Fix submitted. r3630

Thanks to everybody for getting involved on this - especially Thomas. Well done, Thomas: you pushed me to realise my error. Thanks.

I hope we're really done on this. I've checked on Mac and Linux that we're not destroying files with multiple hard links.

$ exiv2 -v -M"set Exif.Photo.UserComment charset=Ascii New Exif comment3" R.jpg
File 1/1: R.jpg
Set Exif.Photo.UserComment "charset=Ascii New Exif comment3" (Comment)
$ ln R.jpg Q.jpg
$ exiv2 -v -M"set Exif.Photo.UserComment charset=Ascii changed the comment" R.jpg
File 1/1: R.jpg
Set Exif.Photo.UserComment "charset=Ascii changed the comment" (Comment)
$ exiv2 -pa -g UserComment R.jpg
Exif.Photo.UserComment                       Undefined  27  changed the comment
$ exiv2 -pa -g UserComment Q.jpg
Exif.Photo.UserComment                       Undefined  27  changed the comment
Thomas can check this on Windows. I've seldom used the obscure NTFS command MKLINK (much better name than Unix's horrid ln command). I wonder if hard links work on Cygwin or MinGW (probably will if NTFS).

And, we got lucky. The macro MS_SYNC is a companion of msync(), so I didn't need to update the build environment. I'll check the build server tomorrow to be sure that this has build OK on all platforms (perhaps MSVC has a different MS_SYNC macro!).

If nobody says "not fixed" then I'll add something to ./test/bugfixes-test.sh later this week and update this issue to resolved.

#23

Updated by Robin Mills over 6 years ago

Submitted regression detectors: r3631. Built and tested on all supported platforms: http://exiv2.dyndns.org:8080/job/Exiv2-trunk/1884/

I hope this is the last time I hear about #1043 and #812. Thanks Everybody: Thoralf, Calvin and Thomas for your contributions. I hope (in the nicest possible way) that this is the last time we talk about this. However, when you discover other issues, we'll meet again. Beers in Berlin on Friday Night - 2015-03-27 - to toast Exiv2 and my 41st Wedding Anniversary. I've got a date in England and can't attend.

(Whoops, the photo in Arizona was on our 40th Anniversary last year: 2014-03-27).

#24

Updated by Thomas Beutlich over 6 years ago

Thank you, Robin.

Just a quick comment for now: I guess there is once call of msync() missing in function MemIo::transfer in case memIo!=this.

#25

Updated by Robin Mills over 6 years ago

Do you mean:

    void MemIo::transfer(BasicIo& src)
    {
        MemIo *memIo = dynamic_cast<MemIo*>(&src);
        if (memIo) {
            // Optimization if src is another instance of MemIo
            if (true == p_->isMalloced_) {
                std::free(p_->data_);
            }
            p_->idx_ = 0;
            p_->data_ = memIo->p_->data_;
            p_->size_ = memIo->p_->size_;
            p_->isMalloced_ = memIo->p_->isMalloced_;
            memIo->p_->idx_ = 0;
            memIo->p_->data_ = 0;
            memIo->p_->size_ = 0;
            memIo->p_->isMalloced_ = false;
        }
        else {
            // Generic reopen to reset position to start
            if (src.open() != 0) {
                throw Error(9, src.path(), strError());
            }
            p_->idx_ = 0;
            write(src);
            src.close();
        }
        if (error() || src.error()) throw Error(19, strError());
    }

Should be:
...
        if (memIo) {
            // Optimization if src is another instance of MemIo
            if (true == p_->isMalloced_) {
                msync();
                std::free(p_->data_);
            }
...

#26

Updated by Thomas Beutlich over 6 years ago

Some more ideas:
  • MemIo::munmap() could return the return value of MemIo::msycn().
  • No need to have MemIo::msycn() virtual. Could be even private.
#27

Updated by Thomas Beutlich over 6 years ago

Robin Mills wrote:

Do you mean:[...]
Should be:[...]

Yes, exactly.

#28

Updated by Thomas Beutlich over 6 years ago

Important question is also if msync() must be called before FileIO::munmap is called. According to msync documentation ist should be called. Unfortunately I am not a Linux API expert.

#29

Updated by Thomas Beutlich over 6 years ago

It was confusing to me that msync doc refers to mmap/munmap which is not used in MemIO but FileIO.

This one is interesting: http://stackoverflow.com/questions/5902629/mmap-msync-and-linux-process-termination

msync is used case of mmap/munmap = FileIO. For MemIO class we could try fsync.

#30

Updated by Robin Mills over 6 years ago

I'm a little lost here, Thomas. Where is this "if msync() must be called before FileIO::munmap is called" ?

I call the msync() method in MemIo:munmap() in r3630. And MemIo::msync() could be declared private and does not need to be virtual. However I don't see anything incorrect in what's been submitted.

And Thomas. I have another life. I can't endlessly explore and test every possibility. It appears to be building and working fine on all supported platforms. I have Daniel moaning at me about CMake/MSVC. Gilles discussing the webp GSoC 2015 project. Alan wants something about Lightroom Categories. And I have 14 other oustanding issues to deal with for v0.25. Oh, and more jenkins/build-server stuff to finish. And you how many volunteers do I have for all those tasks.

And I'm rather puzzled that the Windows Memory Mapping appears to be used in the FileIo class. I didn't write it and don't have time to deal with all of this.

#31

Updated by Robin Mills over 6 years ago

Thomas. Please provide a patch with all your suggested changes. I'll review and submit. However I can't endlessly work on this. I have other things to get on with - including my life!

#32

Updated by Thomas Beutlich over 6 years ago

I'm a little lost here, Thomas. Where is this "if msync() must be called before FileIO::munmap is called" ?

This ought to be a question. It is taken from here: http://man7.org/linux/man-pages/man2/msync.2.html

msync() flushes changes made to the in-core copy of a file that was
mapped into memory using mmap(2) back to the filesystem. Without use
of this call there is no guarantee that changes are written back
before munmap(2) is called.

Currently we don't do it in FileIO. That's why I asked the Linux API experts.

And I'm rather puzzled that the Windows Memory Mapping appears to be used in the FileIo class. I didn't write it and don't have time to deal with all of this.

Yes, for sure it is there. This is the usual Win-API programming compared to Linux API mmap/munmap. No need to bother with this.

Please provide a patch with all your suggested changes. I'll review and submit. However I can't endlessly work on this. I have other things to get on with - including my life!

I can try. But I cannot test if it fixes this ticket.

#33

Updated by thoralf schulze over 6 years ago

hi,

Thomas Beutlich wrote:

[…] But I cannot test if it fixes this ticket.

i have a working build environment for linux amd64 here, and i am more than willing to keep it around for a little while. i didn't use the test suite yet, but am fairly confident to be able to figure it out … i won't be able to check robins latest checkin today, but will do so tomorrow.

Robin Mills wrote:

Beers in Berlin on Friday Night - 2015-03-27 - to toast Exiv2 and my 41st Wedding Anniversary.

sounds good to me, my offer "free beer to anyone involved" still holds :-) for the berliners, i'd suggest the bierkombinat kreuzberg, ~8 pm? robin, we'll have a toast on you and you anniversary.

with kind regards,
t.

#34

Updated by Robin Mills over 6 years ago

Thomas.

The build server builds and tests the code when it's submitted. http://exiv2.dyndns.org:8080

When you give me your patch, I'll review and submit it. If it builds and passes the test suite on the build server, we're done! However I can't spend any more time on this.

Thoralf

Thanks for your input. If Thomas comes for beer on Friday, don't let him discuss this issue (or his other contributions) all night. Drown him in beer!

Robin

#35

Updated by Thomas Beutlich over 6 years ago

Find attached the trivial patch (based on r3630) I can offer. It is nothing special about it at all.

#36

Updated by Robin Mills over 6 years ago

Patch submitted. r3633. Thank You Thomas for providing the patch.

I will keep myself as the "Assignee" on this issue and check the build server tomorrow. If everything's fine (which I expect it to be), I'll mark this "Resolved" and change the Assignee to Thomas to recognise his important contribution. However this has been a real team effort to fix this. Thank You everybody.

#37

Updated by Thomas Beutlich over 6 years ago

Yes, builds should likely be fine. But tests on this issue are more interesting. Thoralf can help here and should be the one credited for adding msync.

#38

Updated by thoralf schulze over 6 years ago

hi,

i checked r3633. unfortunately, it exhibits the same symptoms as versions before r3627 - it hangs on large files, does work fine while being strace'd etc.
i also double-checked with a fresh checkout of r3627, which still works fine.

sorry for the bad news, please don't kill the messenger …

with kind regards,
t.

#39

Updated by Robin Mills over 6 years ago

Thanks for testing and reporting this. We have to know about this sooner or later. Much better to know sooner.

#40

Updated by Thomas Beutlich over 6 years ago

Thoralf, can you please repeat the test where you replace MS_SYNC by MS_ASYNC in basicio.cpp. This is the only difference I can think about.

#41

Updated by thoralf schulze over 6 years ago

hi,

Thomas Beutlich wrote:

Thoralf, can you please repeat the test where you replace MS_SYNC by MS_ASYNC in basicio.cpp. This is the only difference I can think about.

i did - same issue, hangs as well. i also did another test with r3627 and the msync()-patch applied, which failed in the same way this time :-(

so msync seems to have been a red herring, and i feel rather stupid now.

sorry,
t.

#42

Updated by Thomas Beutlich over 6 years ago

Thanks for testing. Very strange observations.

I will try to debug it once more to see how FileIO and MemIO work. I won't be able to do it today.

#43

Updated by Robin Mills over 6 years ago

Gentlemen:

Are you sure this is a problem. It seems to be working OK for me. Here's the setup:

  • 7mb JPG - with/without links.
  • client is Ubuntu (machine = rmillsmbp-kubuntu)
  • directory is mounted with cifs on rmillsmm-kubuntu (running 14.10)
  • exiv2 built on head of trunk. basicio.cpp is r3635
Tests:
  1. Run a loop of 100 write/reads on foo.jpg. Result OK
  2. hard-link P.jpg and Q.jpg to foo.jpg
  3. Run a loop of 10 write/reads on foo.jpg. Result OK

Here's the evidence:

$ uname -a
Linux rmillsmbp-kubuntu 3.13.0-46-generic #79-Ubuntu SMP Tue Mar 10 20:06:50 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

$ svn info ~/gnu/exiv2/trunk/src/basicio.cpp | grep Rev
Revision: 3644
Last Changed Rev: 3635

$ exiv2 -vV -g svn -g date -g time
exiv2 0.24 001800 (64 bit build)
date=Mar 25 2015
time=22:39:42
svn=3644
have_gmtime_r=1
have_timegm=1

$ df --print-type .
Filesystem                Type 1K-blocks     Used Available Use% Mounted on
//RMILLSMM-KUBUNTU/rmills cifs  63859580 12738888  47853776  22% /home/rmills/smb4k/RMILLSMM-KUBUNTU/rmills

$ ls -alt foo.jpg
-rw-r--r-- 1 rmills rmills 7169528 Mar 25 22:42 foo.jpg

$ for i in {1..100}; do exiv2 -M"set Exif.Photo.UserComment hello $i" foo.jpg ; exiv2 -pa -g UserComment foo.jpg; done
Exif.Photo.UserComment                       Undefined  15  hello 1
Exif.Photo.UserComment                       Undefined  15  hello 2
...
Exif.Photo.UserComment                       Undefined  16  hello 99
Exif.Photo.UserComment                       Undefined  17  hello 100

$ ln foo.jpg P.jpg
$ ln foo.jpg Q.jpg

$ for i in {200..210}; do exiv2 -M"set Exif.Photo.UserComment hello $i" foo.jpg ; exiv2 -pa -g UserComment *.jpg; done     
foo.jpg               Exif.Photo.UserComment                       Undefined  17  hello 200
P.jpg                 Exif.Photo.UserComment                       Undefined  17  hello 200
Q.jpg                 Exif.Photo.UserComment                       Undefined  17  hello 200
foo.jpg               Exif.Photo.UserComment                       Undefined  17  hello 201
P.jpg                 Exif.Photo.UserComment                       Undefined  17  hello 201
Q.jpg                 Exif.Photo.UserComment                       Undefined  17  hello 201
...
foo.jpg               Exif.Photo.UserComment                       Undefined  17  hello 210
P.jpg                 Exif.Photo.UserComment                       Undefined  17  hello 210
Q.jpg                 Exif.Photo.UserComment                       Undefined  17  hello 210
$

#44

Updated by Robin Mills over 6 years ago

  • Status changed from Assigned to Resolved

As nobody has disputed my successful test, I'm going to mark this "Resolved". If you can reproduce this, please provide the evidence and I will investigate. This morning, I did consider going to Berlin for Beer this morning and decided "not this time". Our son us send £41 this morning and that'll pay for a curry in the local Indian Restaurant this evening.

#45

Updated by thoralf schulze over 6 years ago

hi,

i am able to use exiv2 / libexiv2 with mount.cifs "cache=none" option, which is a more or less workable situation here …

i also realise that mounting smb shares form a win2k8 server might not really be a commonplace endeavour. and finally, i am willing to grant shell access to the box where this issue persists to anyone who wants to investigate this issue further.

short of other ideas (and still not being the c++ guy), i sprinkled debug statements all over the place (i've been using r3637):

thoralf@tfbb:~/Baustelle/exiv2/trunk$ diff -u1 -b src/actions.cpp{.orig,} 
--- src/actions.cpp.orig        2015-03-27 17:19:02.597381466 +0100
+++ src/actions.cpp     2015-03-27 16:29:12.098552387 +0100
@@ -1215,2 +1215,3 @@
     try {
+        std::cout << "modify_run\n";
         if (!Exiv2::fileExists(path, true)) {
@@ -1248,2 +1249,3 @@
     {
+        std::cout << "modify_apply\n";
         if (!Params::instance().jpegComment_.empty()) {
@@ -1290,2 +1292,3 @@
     {
+        std::cout << "modify_addmeta\n";
         if (Params::instance().verbose_) {
@@ -1326,2 +1329,3 @@
     {
+        std::cout << "modify_setmeta\n";
         if (Params::instance().verbose_) {
@@ -1398,2 +1402,3 @@
     {
+        std::cout << "modify_delmeta\n";
         if (Params::instance().verbose_) {
@@ -1430,2 +1435,3 @@
     {
+        std::cout << "modify_regname\n";
         if (Params::instance().verbose_) {
@@ -1453,2 +1459,3 @@
     try {
+        std::cout << "adjust_run\n";
         adjustment_      = Params::instance().adjustment_;
thoralf@tfbb:~/Baustelle/exiv2/trunk$
thoralf@tfbb:~/Baustelle/exiv2/trunk$ diff -u2 -b src/basicio.cpp{.orig,} 
--- src/basicio.cpp.orig        2015-03-27 10:47:20.112838974 +0100
+++ src/basicio.cpp     2015-03-27 16:58:35.687297553 +0100
@@ -182,4 +182,5 @@
     int FileIo::Impl::switchMode(OpMode opMode)
     {
+        // std::cout << "FileIo::Impl::switchMode here\n";
         assert(fp_ != 0);
         if (opMode_ == opMode) return 0;
@@ -369,4 +370,5 @@
     int FileIo::munmap()
     {
+        std::cout << "fileio::munmap here\n";
         int rc = 0;
         if (p_->pMappedArea_ != 0) {
@@ -403,4 +405,5 @@
     byte* FileIo::mmap(bool isWriteable)
     {
+        std::cout << "fileio::mmap here\n";
         assert(p_->fp_ != 0);
         if (munmap() != 0) {
@@ -572,4 +575,5 @@
         BasicIo::AutoPtr basicIo;

+        std::cout << "hi there, basicio here\n";
         Impl::StructStat buf;
         int ret = p_->stat(buf);
@@ -587,4 +591,5 @@
         // ret = 911;
         if (ret != 0 || (buf.st_size > 1048576 && nlink == 1)) {
+            std::cout << "doing file-io\n";
             pid_t pid = ::getpid();
             std::auto_ptr<FileIo> fileIo;
@@ -615,4 +620,5 @@
         }
         else {
+            std::cout << "doing mem-io\n";
             basicIo.reset(new MemIo);
         }
@@ -653,4 +659,5 @@
     void FileIo::transfer(BasicIo& src)
     {
+        std::cout << "FileIo::transfer here\n";
         const bool wasOpen = (p_->fp_ != 0);
         const std::string lastMode(p_->openMode_);
thoralf@tfbb:~/Baustelle/exiv2/trunk$ 

… which yields the following:
KMT\adlibworker@stone:~$ /usr/local/bin/exiv2 -v -Q d -M"set Exif.Photo.PixelYDimension 66666" win2k-share-with-cache/keksrezept-small.jpg 
File 1/1: win2k-share-with-cache/keksrezept-small.jpg
modify_run
fileio::munmap here
fileio::munmap here
fileio::munmap here
fileio::munmap here
fileio::munmap here
modify_apply
modify_setmeta
Set Exif.Photo.PixelYDimension "66666" (Long)
fileio::munmap here
hi there, basicio here
doing mem-io
Info: Write strategy: Intrusive
fileio::munmap here
FileIo::transfer here
fileio::munmap here
fileio::munmap here
fileio::munmap here
KMT\adlibworker@stone:~$ /usr/local/bin/exiv2 -v -Q d -M"set Exif.Photo.PixelYDimension 66666" win2k-share-with-cache/keksrezept.jpg 
File 1/1: win2k-share-with-cache/keksrezept.jpg
modify_run
fileio::munmap here
fileio::munmap here
fileio::munmap here
fileio::munmap here
fileio::munmap here
modify_apply
modify_setmeta
Set Exif.Photo.PixelYDimension "66666" (Long)
fileio::munmap here
hi there, basicio here
doing mem-io
Info: Write strategy: Intrusive
fileio::munmap here
FileIo::transfer here
fileio::munmap here
[ … hang occurs right here ]
^C
KMT\adlibworker@stone:~$ 
a few observations:
  • the hanging process can be terminated by pressing ^C now
  • fileio::mmap doesn't seem to be called at all
  • commenting in the std::cout << "FileIo::Impl::switchMode here\n";-statement makes the code work for both my testfiles (as does strace'ing the original code, which might be related)

with kind regards,
t.

#46

Updated by thoralf schulze over 6 years ago

hi,

s/3637/3627 - time for beer, and bon appetit to the curry connoisseurs as well :-)

t.

#47

Updated by Robin Mills over 6 years ago

  • Status changed from Resolved to Assigned

Your work-around with 'cache=none' might have to be the final resolution here.

You'd reported it failing on r3633. basicio.cpp was changed on r3635. So I though "oh, he doesn't have the latest code".

$ svn log src/basicio.cpp
------------------------------------------------------------------------
r3635 | robinwmills | 2015-03-24 22:25:34 +0000 (Tue, 24 Mar 2015) | 3 lines
#1043 #1042 #812.  Thank You to Thomas for this "polishing" patch.  Bla Bla
------------------------------------------------------------------------
r3630 | robinwmills | 2015-03-24 00:27:59 +0000 (Tue, 24 Mar 2015) | 1 line
#1043 and #1042.  Thanks to Thomas for showing that r3627 reintroduced #812.
Thanks to Thoralf for suggesting msync MemIo fix.
------------------------------------------------------------------------
r3627 | robinwmills | 2015-03-21 16:35:06 +0000 (Sat, 21 Mar 2015) | 1 line
#1042 and #1043.  Don't use a MemIo object for small temporary files.

I can test Linux <-> W7 immediately. I have a legal copy of W2008 Server on a DVD which I can load on a VM if necessary.

Banging printf's into the code is called "Instrumentation" in the land of C++ (a posh title for hacking really).

We have friends coming from the States on Tuesday for Easter. If I don't deal with this by Tuesday, it'll have to wait for a while.

Enjoy the beer and we'll enjoy the curry.

#48

Updated by Thomas Beutlich over 6 years ago

Very strange that FileIo::mmap() is never called.

Can you please instrument further in FileIo::munmap(), especially before and after ::munmap()? I guess that it does not return in case it hangs.

See also my comment:28 again: Important question is also if ::msync() must be called before FileIO::munmap() is called? Can you add it and test once more?

#49

Updated by Robin Mills over 6 years ago

  • Status changed from Assigned to Resolved

I've repeated the tests that I performed in #43 on Linux <-> Windows7. Same perfect result.

I use smb4k to mount the drives and I am not sure what options it uses. Although I can mount the drive using the command line, I get "permission denied" when I attempt to write on the drive.

Gentlemen: I have to move on. This matter has consumed a great deal of time during the last 2 weeks. Thoralf has a work-around by using the mount option cache=none. If anybody wishes to work on this, I will accept a patch IF it passes our test suite on all platforms (Mac/ Linux/ Windows/ Cygwin/ MinGW). The test suite has been extended to provide a regression detector for both #812 and #1043.

Working on Exiv2 is a full time unpaid job. You are both very interested in this CIFS/MemIo topic, however this is not the only matter competing for my time. We are evaluating proposals for Google Summer of Code 2015. Daniel wants our CMake/MSVC support improved. Mikayel got 8 hours of my time on Wednesday. There are outstanding matters to be dealt with for v0.25 and Jenkins. George wants a bug investigated where occasionally (but not consistently) crashes while processing every file in a directory of 40000 images on Windows. And there will be some other matters that I have forgotten at the moment. Oh, and I have a life as well.

#50

Updated by thoralf schulze over 6 years ago

hi,

i consider 1042 and 1043 to be non-trivial bugs in exiv2 that should be squashed now, and i am willing to investigate further. if time permits :-/

instrumenting the hell out of fileinfo::munmap() (see the attached patch [against the current trunk]), i can conclude that ::munmap never gets called either:

KMT\adlibworker@stone:~$ /usr/local/bin/exiv2 -v -Q d -M"set Exif.Photo.PixelYDimension 66666" win2k-share-with-cache/keksrezept.jpg 
File 1/1: win2k-share-with-cache/keksrezept.jpg
fileio::munmap
 … common block
   … iswriteable: 0
 … return value: 0
fileio::munmap
 … common block
   … iswriteable: 0
 … return value: 0
fileio::munmap
 … common block
   … iswriteable: 0
 … return value: 0
fileio::munmap
 … common block
   … iswriteable: 0
 … return value: 0
fileio::munmap
 … common block
   … iswriteable: 0
 … return value: 0
Set Exif.Photo.PixelYDimension "66666" (Long)
fileio::munmap
 … common block
   … iswriteable: 0
 … return value: 0
Info: Write strategy: Intrusive
fileio::munmap
 … common block
   … iswriteable: 0
 … return value: 0
fileio::munmap
 … common block
   … iswriteable: 0
 … return value: 0

does if (p_->pMappedArea_ != 0) (line 372) check for a null pointer?

for reference, the same test with the (working) small file:

KMT\adlibworker@stone:~$ /usr/local/bin/exiv2 -v -Q d -M"set Exif.Photo.PixelYDimension 66666" win2k-share-with-cache/keksrezept-sma
ll.jpg
File 1/1: win2k-share-with-cache/keksrezept-small.jpg
fileio::munmap
 … common block
   … iswriteable: 0 
 … return value: 0  
fileio::munmap
 … common block
   … iswriteable: 0 
 … return value: 0  
fileio::munmap
 … common block
   … iswriteable: 0 
 … return value: 0  
fileio::munmap
 … common block
   … iswriteable: 0 
 … return value: 0  
fileio::munmap
 … common block
   … iswriteable: 0 
 … return value: 0  
Set Exif.Photo.PixelYDimension "66666" (Long)
fileio::munmap
 … common block
   … iswriteable: 0 
 … return value: 0  
Info: Write strategy: Intrusive
fileio::munmap
 … common block
   … iswriteable: 0 
 … return value: 0  
fileio::munmap
 … common block
   … iswriteable: 0 
 … return value: 0  
fileio::munmap
 … common block
   … iswriteable: 0
 … return value: 0
fileio::munmap
 … common block
   … iswriteable: 0
 … return value: 0  
KMT\adlibworker@stone:~$

i guess that p_->isWriteable_ should evaluate to true at some point, shouldn't it?

thank you & with kind regards,
t.

#51

Updated by thoralf schulze over 6 years ago

hi thomas,

Thomas Beutlich wrote:

See also my comment:28 again: Important question is also if ::msync() must be called before FileIO::munmap() is called? Can you add it and test once more?

where exactly do you want me to put it?

with kind regards,
t.

#52

Updated by Robin Mills over 6 years ago

Gentlemen

I am now on vacation. We have friends here from California. I will be checking email while on the road, however my time for working is limited. They leave on April 10.

I have requested write access to the repository for Thomas and I wish you well in getting to the bottom of this matter. To avoid debugging this on the trunk, I will set up a new branch "memio-investigation" and monitor in with Jenkins. http://exiv2.dyndns.org:8080/

Robin

#53

Updated by Thomas Beutlich over 6 years ago

I just started to debug (on Win though).

Yes, I also disgagree on moving on too early. Behaviour shall be identical for small and large files!

#54

Updated by Robin Mills over 6 years ago

Thank you both Thoralf and Thomas for getting involved with this. I'm exhausted and need a break.

#55

Updated by Robin Mills over 6 years ago

  • Status changed from Resolved to Assigned
  • Assignee changed from Robin Mills to Thomas Beutlich
#56

Updated by Thomas Beutlich over 6 years ago

OK, FileIo::munmap() actually does nothing since for jpg files there never is FileIo::mmap() called. Thus we do not need to further concentrate on this. Telling from Thoralf logs I propose to further concentrate on function FileIo::transfer().

Can you please,
  • in FileIo::munmap() change the first line to

std::cout << "fileio::munmap " << p_ << " " << p_->path_ << "\n";
  • FileIo::close() replace by

@
int FileIo::close() {
int rc = 0;
if (munmap() != 0) rc = 2;
if (p_->fp_ != 0) {
std::cout << "before close\n";
if (std::fclose(p_->fp_) != 0) rc |= 1;
std::cout << "after close\n";
p_->fp_= 0;
}
return rc;
}
@

  • and FileIo::open() replace by
    @
    int FileIo::open(const std::string& mode) {
    close();
    p_->openMode_ = mode;
    p_->opMode_ = Impl::opSeek;
    #ifdef EXV_UNICODE_PATH
    if (p_->wpMode_ == Impl::wpUnicode) {
    std::cout << "before open\n";
    p_->fp_ = ::_wfopen(wpath().c_str(), s2ws(mode).c_str());
    std::cout << "after open\n";
    }
    else
    #endif {
    std::cout << "before open\n";
    p_->fp_ = ::fopen(path().c_str(), mode.c_str());
    std::cout << "after open\n";
    }
    if (!p_->fp_) return 1;
    return 0;
    }
    @
#57

Updated by thoralf schulze over 6 years ago

hi,

robin: thank you for your cooperation :-)
thomas … thank you for having a look. small file:

KMT\adlibworker@stone:~$ /usr/local/bin/exiv2 -v -Q d -M"set Exif.Photo.PixelYDimension 66666" win2k-share-with-cache/keksrezept-small.jpg 
File 1/1: win2k-share-with-cache/keksrezept-small.jpg
fileio::munmap 0x14d3a50 win2k-share-with-cache/keksrezept-small.jpg
before open
after open
fileio::munmap 0x14d3a50 win2k-share-with-cache/keksrezept-small.jpg
before close
after close
before open
after open
fileio::munmap 0x14d3a50 win2k-share-with-cache/keksrezept-small.jpg
before close
after close
fileio::munmap 0x14d3a50 win2k-share-with-cache/keksrezept-small.jpg
before open
after open
fileio::munmap 0x14d3a50 win2k-share-with-cache/keksrezept-small.jpg
before close
after close
Set Exif.Photo.PixelYDimension "66666" (Long)
fileio::munmap 0x14d3a50 win2k-share-with-cache/keksrezept-small.jpg
before open
after open
Info: Write strategy: Intrusive
fileio::munmap 0x14d3a50 win2k-share-with-cache/keksrezept-small.jpg
before close
after close
fileio::munmap 0x14d3a50 win2k-share-with-cache/keksrezept-small.jpg
before open
after open
fileio::munmap 0x14d3a50 win2k-share-with-cache/keksrezept-small.jpg
before close
after close
fileio::munmap 0x14d3a50 win2k-share-with-cache/keksrezept-small.jpg
KMT\adlibworker@stone:~$ 

big file:
KMT\adlibworker@stone:~$ /usr/local/bin/exiv2 -v -Q d -M"set Exif.Photo.PixelYDimension 66666" win2k-share-with-cache/keksrezept.jpg 
File 1/1: win2k-share-with-cache/keksrezept.jpg
fileio::munmap 0x2004a50 win2k-share-with-cache/keksrezept.jpg
before open
after open
fileio::munmap 0x2004a50 win2k-share-with-cache/keksrezept.jpg
before close
after close
before open
after open
fileio::munmap 0x2004a50 win2k-share-with-cache/keksrezept.jpg
before close
after close
fileio::munmap 0x2004a50 win2k-share-with-cache/keksrezept.jpg
before open
after open
fileio::munmap 0x2004a50 win2k-share-with-cache/keksrezept.jpg
before close
after close
Set Exif.Photo.PixelYDimension "66666" (Long)
fileio::munmap 0x2004a50 win2k-share-with-cache/keksrezept.jpg
before open
after open
Info: Write strategy: Intrusive
fileio::munmap 0x2004a50 win2k-share-with-cache/keksrezept.jpg
before close
after close
fileio::munmap 0x2004a50 win2k-share-with-cache/keksrezept.jpg
before open
after open
fileio::munmap 0x2004a50 win2k-share-with-cache/keksrezept.jpg
before close
after close
fileio::munmap 0x2004a50 win2k-share-with-cache/keksrezept.jpg
KMT\adlibworker@stone:~$ 

… no hang on this call. next try:
KMT\adlibworker@stone:~$ /usr/local/bin/exiv2 -v -Q d -M"set Exif.Photo.PixelYDimension 66666" win2k-share-with-cache/keksrezept.jpg 
File 1/1: win2k-share-with-cache/keksrezept.jpg
fileio::munmap 0x1705a50 win2k-share-with-cache/keksrezept.jpg
before open
after open
fileio::munmap 0x1705a50 win2k-share-with-cache/keksrezept.jpg
before close
after close
before open
after open
fileio::munmap 0x1705a50 win2k-share-with-cache/keksrezept.jpg
before close
after close
fileio::munmap 0x1705a50 win2k-share-with-cache/keksrezept.jpg
before open
after open
fileio::munmap 0x1705a50 win2k-share-with-cache/keksrezept.jpg
before close
after close
Set Exif.Photo.PixelYDimension "66666" (Long)
fileio::munmap 0x1705a50 win2k-share-with-cache/keksrezept.jpg
before open
after open
Info: Write strategy: Non-intrusive
fileio::munmap 0x1705a50 win2k-share-with-cache/keksrezept.jpg
before close
after close
fileio::munmap 0x1705a50 win2k-share-with-cache/keksrezept.jpg
before open
after open

[ … hang ]

thank you & with kind regards,
t.

btw: please send me an openvpn certificate request if you need access to the machine …

#58

Updated by Thomas Beutlich over 6 years ago

Can you try ro reproduce the hang with intrusive write strategy. This is different in the last two logs of the big file. Maybe you need to set other value than 66666.

On Win it looks completely different where a temp file is opened.

Can you please debug where it hangs with gdb?
  • Compile exiv2 with -g
  • In shell type: gdb exiv2
  • Then type in gdb shell: run -v -Q d -M"set Exif.Photo.PixelYDimension 66666" win2k-share-with-cache/keksrezept.jpg
  • When it hangs type: where
#59

Updated by thoralf schulze over 6 years ago

hi thomas,

Thomas Beutlich wrote:

Can you try ro reproduce the hang with intrusive write strategy. This is different in the last two logs of the big file. Maybe you need to set other value than 66666.

seems to be hit and miss … sometimes exiv2 does not hang when intrusive is being used, most of the time it does - the successful execution in #57 seems to be a rare exception. when it succeeds to write the big file, it resorts to non-intrusive on the next run, which then also hangs most of the time.

Can you please debug where it hangs with gdb?
  • Compile exiv2 with -g
  • In shell type: gdb exiv2
  • Then type in gdb shell: run -v -Q d -M"set Exif.Photo.PixelYDimension 66666" win2k-share-with-cache/keksrezept.jpg
  • When it hangs type: where

i am not able to ^c back to gdb once exiv2 hangs:

KMT\adlibworker@stone:~$ gdb /usr/local/bin/exiv2
GNU gdb (GDB) 7.4.1-debian
Copyright (C) 2012 Free Software Foundation, Inc.
[ bla … ]
Reading symbols from /usr/local/bin/exiv2...done.
(gdb) run -v -Q d -M"set Exif.Photo.PixelYDimension 66666" win2k-share-with-cache/keksrezept.jpg
Starting program: /usr/local/bin/exiv2 -v -Q d -M"set Exif.Photo.PixelYDimension 66666" win2k-share-with-cache/keksrezept.jpg
File 1/1: win2k-share-with-cache/keksrezept.jpg
fileio::munmap 0x62aa50 win2k-share-with-cache/keksrezept.jpg
before open
[ … like above]
fileio::munmap 0x62aa50 win2k-share-with-cache/keksrezept.jpg
before open
after open
[ … hang occurs here ]
^C
where
^C
where
^C^C^C^C

where
bt

… i'll try to play around with breakpoints.

with kind regards,
t.

#60

Updated by Thomas Beutlich over 6 years ago

Thanks for testing. FileIO::transfer() should be a good candicate to play with.

#61

Updated by thoralf schulze over 6 years ago

hi thomas,

the hang occurs in FileIo::transfer's write(src)-line - this is #896 in the non-instrumented source.

KMT\adlibworker@stone:~$ gdb /usr/local/bin/exiv2 
[…]
(gdb) # this is the beginning of void FileIo::transfer(BasicIo& src)
(gdb) break basicio.cpp:690
(gdb) run -v -Q d -M"set Exif.Photo.PixelYDimension 66666" win2k-share-with-cache/keksrezept.jpg
Starting program: /usr/local/bin/exiv2 -v -Q d -M"set Exif.Photo.PixelYDimension 66666" win2k-share-with-cache/keksrezept.jpg
File 1/1: win2k-share-with-cache/keksrezept.jpg
fileio::munmap 0x62aa50 win2k-share-with-cache/keksrezept.jpg
before open
after open
fileio::munmap 0x62aa50 win2k-share-with-cache/keksrezept.jpg
[…]
Set Exif.Photo.PixelYDimension "66666" (Long)
fileio::munmap 0x62aa50 win2k-share-with-cache/keksrezept.jpg
before open
after open
Info: Write strategy: Intrusive
fileio::munmap 0x62aa50 win2k-share-with-cache/keksrezept.jpg
before close
after close
FileIo::transfer here

Breakpoint 2, Exiv2::FileIo::transfer (this=0x62aa30, src=...) at basicio.cpp:691
warning: Source file is more recent than executable.
691             const bool wasOpen = (p_->fp_ != 0);
(gdb) n
692             const std::string lastMode(p_->openMode_);
(gdb) n
691             const bool wasOpen = (p_->fp_ != 0);
(gdb) n
692             const std::string lastMode(p_->openMode_);
(gdb) n
691             const bool wasOpen = (p_->fp_ != 0);
(gdb) n
692             const std::string lastMode(p_->openMode_);
(gdb) n
694             FileIo *fileIo = dynamic_cast<FileIo*>(&src);
(gdb) n
695             if (fileIo) {
(gdb) n
694             FileIo *fileIo = dynamic_cast<FileIo*>(&src);
(gdb) n
695             if (fileIo) {
(gdb) n
913                 if (open("w+b") != 0) {
(gdb) n
fileio::munmap 0x62aa50 win2k-share-with-cache/keksrezept.jpg
before open
after open
924                 if (src.open() != 0) {
(gdb) n
935                 write(src);
(gdb) n
[ … hang]

with kind regards,
t.

#62

Updated by thoralf schulze over 6 years ago

hi,

a bit more detail:

677                 writeTotal += writeCount = (long)std::fwrite(buf, 1, readCount, p_->fp_);
(gdb) n
678                 if (writeCount != readCount) {
(gdb) n
676             while ((readCount = src.read(buf, sizeof(buf)))) {
(gdb) n
677                 writeTotal += writeCount = (long)std::fwrite(buf, 1, readCount, p_->fp_);
(gdb) n
678                 if (writeCount != readCount) {
(gdb) n
676             while ((readCount = src.read(buf, sizeof(buf)))) {
(gdb) n
677                 writeTotal += writeCount = (long)std::fwrite(buf, 1, readCount, p_->fp_);
(gdb) n
n
[… hang]

… this is long FileIo::write(BasicIo& src)

with kind regards,
t.

#63

Updated by thoralf schulze over 6 years ago

hi,

amending the write-loop in long FileIo::write(BasicIo& src) like this:

        while ((readCount = src.read(buf, sizeof(buf)))) {
            std::cout << "write loop … rc: " << readCount << " wc: " << writeCount << " wt: " << writeTotal << "\n";
            writeTotal += writeCount = (long)std::fwrite(buf, 1, readCount, p_->fp_);
            if (writeCount != readCount) {
                // try to reset back to where write stopped
                src.seek(writeCount-readCount, BasicIo::cur);
                break;
            }
        }

yields this for the large file:
FileIo::transfer here
fileio::munmap 0xaf0a50 win2k-share-with-cache/keksrezept.jpg
before open
after open
in fileio::write …
write loop … rc: 4096 wc: 0 wt: 0
write loop … rc: 4096 wc: 4096 wt: 4096
write loop … rc: 4096 wc: 4096 wt: 8192
write loop … rc: 4096 wc: 4096 wt: 12288
write loop … rc: 4096 wc: 4096 wt: 16384
[ … hang]

and this:
FileIo::transfer here
fileio::munmap 0x1ecea50 win2k-share-with-cache/keksrezept-small.jpg
before open
after open
in fileio::write …
write loop … rc: 4096 wc: 0 wt: 0
write loop … rc: 4096 wc: 4096 wt: 4096
write loop … rc: 4096 wc: 4096 wt: 8192
write loop … rc: 4096 wc: 4096 wt: 12288
write loop … rc: 4096 wc: 4096 wt: 16384
write loop … rc: 4096 wc: 4096 wt: 20480
write loop … rc: 4096 wc: 4096 wt: 24576
write loop … rc: 1648 wc: 4096 wt: 28672
fileio::munmap 0x1ecea50 win2k-share-with-cache/keksrezept-small.jpg
before close
after close
fileio::munmap 0x1ecea50 win2k-share-with-cache/keksrezept-small.jpg
KMT\adlibworker@stone:~$

for the small file. i'm stumped.

with kind regards,
t.

#64

Updated by Robin Mills over 6 years ago

Guys

Can I encourage you to look at samples/iotest.cpp. It's effectively a file copy program which uses both FileIO and MemIO in basicio.cpp. It's used by our test suite - however it's worth playing with this a little on large and small files. It's easy to understand it.

Thanks for doing this work.

Robin

#65

Updated by Thomas Beutlich over 6 years ago

To give an expression how it looks on Win:

C:\Projekte\wdx_jpg_comment_src\exiv2\msvc\exiv2\Debug>exiv2 -v -Q d -M"set Exif.Photo.PixelYDimension 66666" keksrezept.jpg
File 1/1: keksrezept.jpg
fileio::munmap 003A7370 keksrezept.jpg
 . common block
   . iswriteable: 0
 . return value: 0
before open
after open
fileio::munmap 003A7370 keksrezept.jpg
 . common block
   . iswriteable: 0
 . return value: 0
before close
after close
before open
after open
fileio::munmap 003A7370 keksrezept.jpg
 . common block
   . iswriteable: 0
 . return value: 0
before close
after close
fileio::munmap 003A7370 keksrezept.jpg
 . common block
   . iswriteable: 0
 . return value: 0
before open
after open
fileio::munmap 003A7370 keksrezept.jpg
 . common block
   . iswriteable: 0
 . return value: 0
before close
after close
Set Exif.Photo.PixelYDimension "66666" (Long)
fileio::munmap 003A7370 keksrezept.jpg
 . common block
   . iswriteable: 0
 . return value: 0
before open
after open
fileio::munmap 007BB9B8 keksrezept.jpg2056
 . common block
   . iswriteable: 0
 . return value: 0
before open
after open
Info: Write strategy: Non-intrusive
fileio::munmap 003A7370 keksrezept.jpg
 . common block
   . iswriteable: 0
 . return value: 0
before close
after close
fileio::munmap 007BB9B8 keksrezept.jpg2056
 . common block
   . iswriteable: 0
 . return value: 0
before close
after close
fileio::munmap 003A7370 keksrezept.jpg
 . common block
   . iswriteable: 0
 . return value: 0
before open
after open
fileio::munmap 003A7370 keksrezept.jpg
 . common block
   . iswriteable: 0
 . return value: 0
before close
after close
fileio::munmap 003A7370 keksrezept.jpg
 . common block
   . iswriteable: 0
 . return value: 0
fileio::munmap 007BB9B8 keksrezept.jpg2056
 . common block
   . iswriteable: 0
 . return value: 0
fileio::munmap 003A7370 keksrezept.jpg
 . common block
   . iswriteable: 0
 . return value: 0

You see that there is a temporary file with trailing PID created by JpegBase::writeMetadata() -> FileIo::temporary(). Why is there no such file in the Linux case?

#66

Updated by Thomas Beutlich over 6 years ago

Sorry, forget about my last comment. We are doing MemIo here which is case for large files having hard links.

#67

Updated by Thomas Beutlich over 6 years ago

Some (alternative) ideas for long FileIo::write(BasicIo& src) to test.
  • Increase buffer size to some greater value. For test you could also try to set the exact file size (or some greater value) as buffer size such that there is only one call to std::fwrite().
  • Call setvbuf(p_->fp_, NULL, _IONBF, 0); anywhere before the while loop.
  • Call setvbuf(p_->fp_, (char*)buf, _IOFBF, 4096); anywhere before the while loop.
  • Call setvbuf(p_->fp_, NULL, _IOFBF, 4096); anywhere before the while loop.
  • Call fflush(p_->fp_); after each call to std::fwrite().
  • Swap second and third element of std::fwrite()@to @std::fwrite(buf, readCount, 1, p_->fp_). Should not matter.
#68

Updated by thoralf schulze over 6 years ago

hi there,

just a quick note: i'll be on vacation next week and won't show up here. i will definitely have a look at iotest.cpp and thomas' suggestions afterwards …

thank you & with kind regards,
t.

#69

Updated by Robin Mills over 6 years ago

Have a nice break, Thoralf. We'll talk when you (and I) get back to work.

Robin

#70

Updated by Calvin Browne over 6 years ago

Hi Guys,

got back from my break.

svn checkout svn://dev.exiv2.org/svn/trunk
make config
cd trunk/
make config
./configure

and yay!

for i in `ls`; do ~/exiv2-trunk/trunk/bin/exiv2 -v -M"set Exif.Photo.UserComment charset=Ascii New Exif comment2" $i; done

runs through, several times with no truncations. (Ubuntu 14.04.2 LTS client & server)

did a 'make install' and loaded up shotwell and it no hangs on my 50k library (although to be fair only a portion of them were due to have comments rewritten).

So, from my perspective all looks good.

Any pointers on getting this fix into the Ubuntu project? (I'm not fortunate enough to be involved in these things regularly so know the procedures so pointers would help).

Many thanks!

--Calvin

#71

Updated by Robin Mills over 6 years ago

Calvin

How nice to hear from you. And to hear your good news.

Thoralf and Thomas are continuing to work on this matter. There's no smoke without fire and Thoralf can see smoke on Windows Server 2008. So T & T are going to continue to work on this and that enables me to pursue other matters.

Hope you've enjoyed your trip.

Robin

#72

Updated by Thomas Beutlich over 6 years ago

thoralf schulze: Do you have any news on my above suggestions? Obviously you are the only one left who can reproduce the problem.

#73

Updated by thoralf schulze over 6 years ago

hi thomas, hi all,

… sorry for taking so long, i am drowning in work. i was just about to try thomas' suggestions and cannot reproduce the issue anymore. long story short: about a week ago, we've had to reboot the machine that exhibited the symptoms - it all works fine now, sorry for making demands on your time.

a bit more nitty-gritty: the machine needed a reboot due to a bunch of more or less critical updates accumulating. among them was a kernel update¹ - skimming through the changelogs, i found these notes:
- cifs: fix deadlock in cifs_ioctl_clone()
- cifs: Complete oplock break jobs before closing file handle (regression in 3.15)
… both issues were fixed in 3.16.7-ckt7-1. i suspect that the latter patch fixes our issue, please see http://permalink.gmane.org/gmane.linux.kernel.stable/123919 for details.

i've let my testcase run for a while now, with both my butchered version of exiv2 and the one from the debian repositories … they did run for 1000+ iterations now, without any hiccups.

i guess we are all set now - please feel free to close this issue. "beers in berlin for anyone involved" is still (and will always be) a standing offer.

thank you very much for your time & with kind regards,
t.

¹ - linux-image-3.16.0-0.bpo.4-amd64:amd64 3.16.7-ckt4-3~bpo70+1 to 3.16.7-ckt7-1~bpo70+1

#74

Updated by Robin Mills over 6 years ago

  • Status changed from Assigned to Resolved

Well this is a very happy ending and a fine state of affairs. Well, apart from your drowning in work. Good team-work all round here. And some beer in Berlin's a great offer - you'll find I can be very thirsty! If you're in England, let me know. We're 20 miles from Heathrow in the beautiful county of Surrey.

#75

Updated by Thomas Beutlich over 6 years ago

Good news. Thanks.

Will check again which commits were done here and if they were necessary.

@Robin: i think you can delete the svn branch you offered for testing.

#76

Updated by Robin Mills over 6 years ago

Good thinking, Thomas. Can you review changes and tidy up if necessary. I've removed the memo-investigation branch from SVN and removed the branch monitor from Jenkins.

#77

Updated by Thomas Beutlich over 6 years ago

I revisited the commits bound to this issue, esp. r3630. I do no think that msync is necessary for MemIo. According to http://man7.org/linux/man-pages/man2/msync.2.html msync is only useful for memory mapped files which is not the case in MemIo. I would rather see it in FileIo where there are the memory mapped files. Thus my questions in #1043-28 (28) and #1043-29 (29) are still not answered.

#78

Updated by Thomas Beutlich over 6 years ago

Robin, should I open a new issue for this open questions?

#79

Updated by Robin Mills over 6 years ago

Thomas:

I think #1043 (and his buddy #1042) are resolved. I'm happy if you log a new issue about msync().

As we're very close to v0.25, I'd like to delay any work on this new issue. If you believe this is an essential fix for v0.25, could you associate the new issue with #1071 which is the 'Umbrella' for v0.25.

If this new issue is not essential for v0.25, please target it at v0.26.

#80

Updated by Andreas Huggel over 6 years ago

  • Status changed from Resolved to Closed
#81

Updated by Robin Mills over 5 years ago

  • % Done changed from 0 to 100

Also available in: Atom PDF