[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [msmtp-users] Bug in msmtpq might cause mail loss



My first message was screwed up, probably because of the .sh file 
extensions of the attachments. Sorry, I have to give it a second try.

Hi Martin,

thanks for considering the first patch.

I want to add that there is also an ugly race condition in the queue script 
coming from the way the queue is locked. I run into this error reproducibly 
when bouncing several messages simultaneously in mutt, so it is not 
esoteric. It results in errors such as

2011 26 Oct 21:07:27 : preparing to send .mail file [ /home/ph/.msmtpqueue/2011-10-26-21.07.27.mail ] but
2011 26 Oct 21:07:27 :   corresponding .msmtp file [ /home/ph/.msmtpqueue/2011-10-26-21.07.27.msmtp ] was not found in queue
2011 26 Oct 21:07:27 :   skipping this mail ; this is worth looking into
 
I've attached two scripts which illustrate the problem, modeled on [1]. Go to an 
*EMPTY* directory, copy the scripts there and run
$ echo 0 > testcounterfile; for (( i=1; i <= 10; i++ )); do ./testscript2.sh&; done
This launches 10 instances of testscript2.sh each of which has the duty to 
increment the number in testcounterfile by 1. The fact that testcounterfile 
reproducibly contains numbers strictly smaller than 10 after all instances 
of testscript2.sh have terminated shows that the locking doesn't work 
reliably.

The issue is that another instance of the script can squeeze between
    [ -e "$LOK" ]
and
    touch "$LOK"

One solution is to use mkdir and check the exit status because mkdir is 
atomic. The testscript3.sh illustrates this and with this I reliably get 
the value 10 in testcounterfile when executing
$ echo 0 > testcounterfile; for (( i=1; i <= 10; i++ )); do ./testscript3.sh&; done

I've attached a patch which implements the behavior of testscript3.sh 
msmtpq. Any thoughts?

Regards,
Philipp

[1] http://stackoverflow.com/questions/325628/race-condition-in-the-common-lock-on-file/1571711#1571711
#!/bin/bash
trap on_exit EXIT                    # run 'on_exit' on exit
on_exit() {
  [ -n "$LKD" ] && lock_queue -u 2>/dev/null   # unlock the queue on exit
}                                              # if the lock was set here
lock_queue() {        # <-- '-u' to remove lockfile
  local LOK="testlockfile"             # lock file name
  local -i MAX=240 SEC=0             # max seconds to gain a lock ; seconds waiting

  if [ -z "$1" ] ; then              # lock queue
    while [ -e "$LOK" ] && (( SEC < MAX )) ; do      # lock file present
	    sleep 1                                        # wait a second
	    (( ++SEC ))                                    # accumulate seconds
    done                                             # try again while locked for MAX secs
    [ -e "$LOK" ] && exit 1
    touch "$LOK" && \
      LKD='t' || \
      exit 1

  elif [ "$1" = '-u' ] ; then        # unlock queue
    'rm' -f "$LOK"                   # remove the lock
  fi
}

lock_queue
NO=$(cat testcounterfile)
echo "$NO"
(( ++NO ))
echo "$NO" > testcounterfile
#!/bin/bash
trap on_exit EXIT                    # run 'on_exit' on exit
on_exit() {
  [ -n "$LKD" ] && lock_queue -u 2>/dev/null   # unlock the queue on exit
}                                              # if the lock was set here

lock_queue() {        # <-- '-u' to remove lockfile
  local LOK="testlockfile"             # lock file name
  local -i MAX=240 SEC=0             # max seconds to gain a lock ; seconds waiting

  if [ -z "$1" ] ; then              # lock queue
	mkdir $LOK 2>/dev/null &&  LKD='t'
    while [ -z "$LKD" ] && (( SEC < MAX )) ; do      # lock file present
	    sleep 1                                        # wait a second
	    (( ++SEC ))                                    # accumulate seconds
		mkdir $LOK 2>/dev/null &&  LKD='t'
    done                                             # try again while locked for MAX secs
    [ -z "$LKD" ] && exit 1

  elif [ "$1" = '-u' ] ; then        # unlock queue
    'rmdir' "$LOK"                   # remove the lock
  fi
}

lock_queue
NO=$(cat testcounterfile)
echo "$NO"
(( ++NO ))
echo "$NO" > testcounterfile
--- msmtpq	2011-10-19 15:03:01.523338414 +0200
+++ msmtpq-my	2011-10-27 18:53:53.826788271 +0200
@@ -133,22 +133,20 @@
   local -i MAX=240 SEC=0             # max seconds to gain a lock ; seconds waiting
 
   if [ -z "$1" ] ; then              # lock queue
-    while [ -e "$LOK" ] && (( SEC < MAX )) ; do      # lock file present
+    mkdir $LOK 2>/dev/null &&  LKD='t'
+    while [ -z "$LKD" ] && (( SEC < MAX )) ; do      # lock file present
 	    sleep 1                                        # wait a second
 	    (( ++SEC ))                                    # accumulate seconds
+	    mkdir $LOK 2>/dev/null &&  LKD='t'
     done                                             # try again while locked for MAX secs
-    [ -e "$LOK" ] && \
+    [ -z "$LKD" ] && \
 	    err '' "cannot use queue $Q : waited $MAX seconds for"\
 	           "  lockfile [ $LOK ] to vanish ; giving up"\
 	           'if you are certain that no other instance of this script'\
 	           '  is running, then delete the lock file manually' '' # lock file still there, give up
 
-    touch "$LOK" && \
-      LKD='t' || \
-      log -e "$?" "couldn't create queue lock file [ $LOK ]"       # lock queue
-
   elif [ "$1" = '-u' ] ; then        # unlock queue
-    'rm' -f "$LOK"                   # remove the lock
+    'rmdir' "$LOK"                   # remove the lock
   fi
 }