Client Freeze / Corrupt Tags Cache / 'rm -rf ~/.moc/cache'

User tomaszg and I have spent some time investigating the supposed tags cache corruption problem, for which the accepted wisdom has been to delete the MOC tags cache. It turns out to be a problem with stale database locks left over from a previous server crash, and today I committed a patch (r2492) which I believe fixes it.

The Symptoms

  1. The client freezes.
  2. The server continues playing but does not respond to commands.
  3. Neither the server nor the client can be shut down gracefully.

Verification

  1. The client's status message indicates that communication with the server is in progress, usually relating to tags or the playlist.
  2. The command 'db_stat -h ~/.moc/cache' shows the same WRITE lock being HELD and WAITed for on the same audio file.
  3. The db_stat output does not change significantly after the server is terminated.
  4. When the server is terminated the client also terminates with a fatal error about being unable to receive a value from the server.
  5. Setting the 'TagsCacheSize' option very low significantly increases the frequency of the problem appearing.
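
If you want to script the check in item 2, something like the following might do as a first pass. It is only a sketch: the function name looks_like_stale_lock is made up, and it assumes nothing about the db_stat output beyond the words WRITE, HELD and WAIT appearing on the relevant lines. It does not confirm that both lines refer to the same audio file, so you still need to eyeball the output.

```shell
# Coarse filter (hypothetical helper, not part of MOC or Berkeley DB).
# Reads lock status text on stdin and succeeds only if it contains
# at least one WRITE lock line marked HELD and one marked WAIT.
# It does NOT check that both lines name the same object.
looks_like_stale_lock() {
    awk '
        /WRITE/ && /HELD/ { held = 1 }
        /WRITE/ && /WAIT/ { waiting = 1 }
        END { exit !(held && waiting) }
    '
}

# Possible use, feeding it the db_stat output from item 2:
#   db_stat -h ~/.moc/cache | looks_like_stale_lock && echo "possible stale lock"
```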

The Cause

  1. At some point a server process died while holding a database lock.
  2. Eventually a later server process attempts to acquire the same lock and waits forever.
  3. Because it's blocked on the lock it cannot service new requests.
  4. The client sends a request and waits forever for the response.

Recovery Action

  1. kill -9 $(cat ~/.moc/pid)
  2. db_recover -h ~/.moc/cache
  3. rm -f ~/.moc/cache/log.*
  4. mocp

(Note that step 2 will create a 10MB log file. Step 3 deletes it, but you will still need enough free space in your filesystem for it to be created.)
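
For anyone who hits this often, steps 1 to 3 above could be wrapped in a small shell function along these lines. This is a sketch, not something shipped with MOC: the name moc_recover and its optional directory argument are my own invention, and it deliberately leaves step 4 (starting mocp) to you.

```shell
# Hypothetical wrapper around recovery steps 1-3.
# Argument 1: the MOC directory (defaults to ~/.moc).
moc_recover() {
    moc_dir=${1:-"$HOME/.moc"}
    cache_dir=$moc_dir/cache

    # Refuse to do anything if the Berkeley DB utility is missing.
    command -v db_recover >/dev/null 2>&1 || {
        echo "db_recover not found" >&2
        return 1
    }
    [ -d "$cache_dir" ] || {
        echo "no cache directory at $cache_dir" >&2
        return 1
    }

    # Step 1: kill the hung server, if a pid file is present.
    if [ -f "$moc_dir/pid" ]; then
        kill -9 "$(cat "$moc_dir/pid")" 2>/dev/null || true
    fi

    # Step 2: clear the stale locks.
    db_recover -h "$cache_dir" || return 1

    # Step 3: remove the log files that step 2 leaves behind.
    rm -f "$cache_dir"/log.*
}
```

After sourcing it, run moc_recover (or moc_recover /some/other/dir) and then start mocp as usual. It stops before touching the log files if db_recover fails, so a failed recovery does not also lose the logs.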

The Solution

With the patch, MOC no longer uses files (~/.moc/cache/__db.*) to back the memory in which locks are held. So when the MOC server crashes, any locks it is holding disappear along with the process's memory. This works because MOC uses single-process (though multi-threaded) access to the database holding the tags cache, and it appears that losing the file-backed memory does not endanger the database's structural integrity.

Also See

node/825 (this is a textbook example of this problem)
node/599 (here the segfault is the triggering event)

One Year On...

It seems my belief in the patch (r2492) has been validated. In the year since it was committed there has not been a single reported occurrence of the "Corrupt Tags Cache" problem, so I think we can say it's now well and truly behind us.