mocp crashed when quitting with a folder open on an erroneous NTFS partition

mocp crashed when quitting while it had a directory open whose partition was in an erroneous state at the time. I don't know if this can be reproduced and I don't know what exactly caused it, so I'll just try to describe what happened.

I left the MOC daemon and mocp (in a console window) running over the weekend with a media folder open on an NTFS partition on a USB-connected drive. Earlier today I just closed the console player and did not care about the daemon.

Now I was trying to save something on the partition and found out that something was wrong: ls -l on a symlink leading to a folder on the partition warned about "I/O Error" and showed one of the sub-folders with "?????" instead of the usual details. So I went to re-mount the drive. Obviously the MOC daemon was preventing that, so I opened mocp again and hit S-Q.

That's when mocp crashed.

This happened on Fedora 20. The crash was caught by ABRT (Automatic Bug Reporting Tool) and a stack trace was generated. So what I have is the "problem data", a tarball of a /var/tmp/abrt/ccpp* subdir, where ABRT keeps information related to the crash.

Here is a link to the tarball:

https://www.dropbox.com/s/5pjjxqgi2wpkfsz/mocp-crash.tar.gz

Please let me know when I can delete it.

When I first read your posting, I thought: "There's not much hope of finding that!" However, the circumstances sounded similar to one of the bugs I first identified in testing at the end of last week.

Your ABRT tarball is too large for me to download here, but I have now downloaded it to a remote machine and removed the 'coredump' file, which reduced it to a size I was able to copy across.

The bugs I identified last week are triggered when MOC encounters a symlink which was broken while it was running. There are three:

  • MOC ends up with an unidentified file type when it tries to play the broken symlink, and this results in a read of uninitialised storage.
  • MOC tries to close the offending file twice, and this also results in the double freeing of mutexes.
  • When MOC closes down, there is some storage management problem detected during the closing of the tags cache database.
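
To make the trigger condition concrete, here is a minimal sketch of how a program can tell a broken symlink apart from a missing path or an I/O failure before it tries to open the file. It is not taken from MOC's source: the `path_kind` enum and `classify_path()` are hypothetical names used only for illustration, and only the POSIX calls `lstat()` and `stat()` are assumed.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/stat.h>

/* Hypothetical classification, not MOC's own file-type enum. */
enum path_kind {
    PATH_OK,             /* the path (or the link's target) can be stat'ed */
    PATH_BROKEN_SYMLINK, /* the link exists but its target does not */
    PATH_MISSING,        /* nothing exists at this path */
    PATH_ERROR           /* any other failure, e.g. EIO from a failing device */
};

/* Hypothetical helper: classify a path before trying to open it. */
static enum path_kind classify_path(const char *path)
{
    struct stat link_info, target_info;

    /* lstat() examines the directory entry itself and does not follow links. */
    if (lstat(path, &link_info) == -1)
        return (errno == ENOENT || errno == ENOTDIR) ? PATH_MISSING
                                                     : PATH_ERROR;

    /* stat() follows the link; a missing target shows up as ENOENT here. */
    if (stat(path, &target_info) == -1) {
        if (S_ISLNK(link_info.st_mode) &&
            (errno == ENOENT || errno == ENOTDIR))
            return PATH_BROKEN_SYMLINK;
        return PATH_ERROR;
    }

    return PATH_OK;
}
```

A caller would treat anything other than `PATH_OK` as "do not try to identify or decode this file". Note that the check is inherently racy: a link can still break between the check and the open, so the open itself must also handle failure.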

I have a tested patch for the first bug, and an untested patch for the second bug. I am still investigating the third bug and haven't had time to further that in the last couple of days, but today I do.

The 'core_backtrace' file of your ABRT tarball shows the exact same sequence as I was experiencing in that third bug, so I'm almost positive it is the same one.

(It always amazes me that such bugs can lie undetected for so long then suddenly appear in multiple manifestations.)

Nice work, jcf :)

Reminds me of a schroedinbug :D

schroedinbug, n.: [MIT: from the Schroedinger's Cat thought-experiment in quantum physics] A design or implementation bug in a program that doesn't manifest until someone reading source or using the program in an unusual way notices that it never should have worked, at which point the program promptly stops working for everybody until fixed.

Thanks for the timely reminder of schroedinbugs.

In that "testing at the end of last week" I was able to trigger the third bug consistently. On the "today I do" day I wasn't able to trigger it at all. Subsequently, I have been able to trigger it again, but it may be coming from a different cause.

In my case, it may be caused by a memory corruption coming from the FFmpeg libraries when they encounter one of the MP3 audio files with corrupt ID3 tag data in my bug farm. If so, that doesn't explain why I haven't seen it before, why it first appeared in conjunction with broken symlinks but now also appears in isolation, and why you also see it when you don't (presumably) have MP3 files with corrupt ID3 tags.

A schroedinbug indeed!

Investigating such bugs is made especially difficult because the cause of a memory corruption bug is usually well away from the place where it becomes visible. In this case, it's doubly difficult because it also causes some debugging tools to crash. One of my two rules of thumb for handling such elusive bugs is that it may, in fact, be two bugs which are interacting, and that is a definite possibility here.

Since that "today I do" day I've been developing a patch for another issue which needed to be solved before MOC 2.5.0. That's now complete and today was intended to be another "today I do" day, but the time is quickly being whittled away by non-MOC demands.

I have today committed r2621 and r2622 which respectively resolve bugs 1 and 2 above.
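
For readers unfamiliar with the second class of bug, the general defence against closing a resource twice can be sketched as below. This is not the content of r2622 and does not use MOC's data structures: `struct media_handle`, `media_open()` and `media_close()` are hypothetical names, and the only assumption is a handle that owns a file and a pthread mutex.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical handle, standing in for a decoder's per-file state. */
struct media_handle {
    FILE *fp;              /* NULL once the file has been closed */
    pthread_mutex_t lock;
    bool lock_valid;       /* true between mutex init and destroy */
};

/* Returns true if the file was opened; the handle must be closed either way. */
static bool media_open(struct media_handle *h, const char *path)
{
    h->fp = NULL;
    h->lock_valid = false;

    if (pthread_mutex_init(&h->lock, NULL) != 0)
        return false;
    h->lock_valid = true;

    h->fp = fopen(path, "rb");   /* may fail, e.g. on a broken symlink */
    return h->fp != NULL;
}

/* Safe to call more than once; returns true only if something was released. */
static bool media_close(struct media_handle *h)
{
    bool released = false;

    if (h->fp != NULL) {
        fclose(h->fp);
        h->fp = NULL;            /* a second call will skip the fclose() */
        released = true;
    }
    if (h->lock_valid) {
        pthread_mutex_destroy(&h->lock);
        h->lock_valid = false;   /* a second call will skip the destroy */
        released = true;
    }
    return released;
}
```

The point of the sketch is only that each resource records whether it is still held, so an error path and the normal shutdown path can both call the close routine without freeing anything twice. It does not protect against two threads calling `media_close()` on the same handle concurrently.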

The third bug is a storage initialisation problem within the Berkeley DB library version 4.4 and is triggered by a database page split which only occurs occasionally (and that accounts for it being a "schroedinbug"). It is a problem resolved by version 4.8, and given that "alois.mahdal" was using version 5.3, this looks like a red herring.

It is possible that r2622 also fixes the third bug. "alois.mahdal" has not responded to my e-mail request for more information, so there's little more I can do.

Despite many sessions over the past few months trying to reproduce this problem, MOC has performed flawlessly (modulo red herrings) in the face of everything I could throw at it by way of moved, broken or deleted files and symlinks. The one exception was where I placed the tags cache on a USB device which I pulled before closing down the MOC server. That test resulted in a kernel crash.

Therefore, I'm closing this problem. Of course, if other occurrences which provide additional information are reported then they'll be investigated.