[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [msmtp-users] Bug in msmtpq might cause mail loss
My first message was screwed up, probably because of the .sh file
extensions of the attachments. Sorry, I have to give it a second try.
Hi Martin,
thanks for considering the first patch.
I want to add that there is also an ugly race condition in the queue script
coming from the way the queue is locked. I run into this error reproducibly
when bouncing several messages simultaneously in mutt, so it is not
esoteric. It results in errors such as
2011 26 Oct 21:07:27 : preparing to send .mail file [ /home/ph/.msmtpqueue/2011-10-26-21.07.27.mail ] but
2011 26 Oct 21:07:27 : corresponding .msmtp file [ /home/ph/.msmtpqueue/2011-10-26-21.07.27.msmtp ] was not found in queue
2011 26 Oct 21:07:27 : skipping this mail ; this is worth looking into
I've attached two scripts which illustrate the problem, modeled on [1]. Go to an
*EMPTY* directory, copy the scripts there and run
$ echo 0 > testcounterfile; for (( i=1; i <= 10; i++ )); do ./testscript2.sh&; done
This launches 10 instances of testscript2.sh each of which has the duty to
increment the number in testcounterfile by 1. The fact that testcounterfile
reproducibly contains numbers strictly smaller than 10 after all instances
of testscript2.sh have terminated shows that the locking doesn't work
reliably.
The issue is that another instance of the script can squeeze between
[ -e "$LOK" ]
and
touch "$LOK"
One solution is to use mkdir and check the exit status because mkdir is
atomic. The testscript3.sh illustrates this and with this I reliably get
the value 10 in testcounterfile when executing
$ echo 0 > testcounterfile; for (( i=1; i <= 10; i++ )); do ./testscript3.sh&; done
I've attached a patch which implements the behavior of testscript3.sh
msmtpq. Any thoughts?
Regards,
Philipp
[1] http://stackoverflow.com/questions/325628/race-condition-in-the-common-lock-on-file/1571711#1571711
#!/bin/bash
trap on_exit EXIT # run 'on_exit' on exit
on_exit() {
[ -n "$LKD" ] && lock_queue -u 2>/dev/null # unlock the queue on exit
} # if the lock was set here
lock_queue() { # <-- '-u' to remove lockfile
local LOK="testlockfile" # lock file name
local -i MAX=240 SEC=0 # max seconds to gain a lock ; seconds waiting
if [ -z "$1" ] ; then # lock queue
while [ -e "$LOK" ] && (( SEC < MAX )) ; do # lock file present
sleep 1 # wait a second
(( ++SEC )) # accumulate seconds
done # try again while locked for MAX secs
[ -e "$LOK" ] && exit 1
touch "$LOK" && \
LKD='t' || \
exit 1
elif [ "$1" = '-u' ] ; then # unlock queue
'rm' -f "$LOK" # remove the lock
fi
}
lock_queue
NO=$(cat testcounterfile)
echo "$NO"
(( ++NO ))
echo "$NO" > testcounterfile
#!/bin/bash
trap on_exit EXIT # run 'on_exit' on exit
on_exit() {
[ -n "$LKD" ] && lock_queue -u 2>/dev/null # unlock the queue on exit
} # if the lock was set here
lock_queue() { # <-- '-u' to remove lockfile
local LOK="testlockfile" # lock file name
local -i MAX=240 SEC=0 # max seconds to gain a lock ; seconds waiting
if [ -z "$1" ] ; then # lock queue
mkdir $LOK 2>/dev/null && LKD='t'
while [ -z "$LKD" ] && (( SEC < MAX )) ; do # lock file present
sleep 1 # wait a second
(( ++SEC )) # accumulate seconds
mkdir $LOK 2>/dev/null && LKD='t'
done # try again while locked for MAX secs
[ -z "$LKD" ] && exit 1
elif [ "$1" = '-u' ] ; then # unlock queue
'rmdir' "$LOK" # remove the lock
fi
}
lock_queue
NO=$(cat testcounterfile)
echo "$NO"
(( ++NO ))
echo "$NO" > testcounterfile
--- msmtpq 2011-10-19 15:03:01.523338414 +0200
+++ msmtpq-my 2011-10-27 18:53:53.826788271 +0200
@@ -133,22 +133,20 @@
local -i MAX=240 SEC=0 # max seconds to gain a lock ; seconds waiting
if [ -z "$1" ] ; then # lock queue
- while [ -e "$LOK" ] && (( SEC < MAX )) ; do # lock file present
+ mkdir $LOK 2>/dev/null && LKD='t'
+ while [ -z "$LKD" ] && (( SEC < MAX )) ; do # lock file present
sleep 1 # wait a second
(( ++SEC )) # accumulate seconds
+ mkdir $LOK 2>/dev/null && LKD='t'
done # try again while locked for MAX secs
- [ -e "$LOK" ] && \
+ [ -z "$LKD" ] && \
err '' "cannot use queue $Q : waited $MAX seconds for"\
" lockfile [ $LOK ] to vanish ; giving up"\
'if you are certain that no other instance of this script'\
' is running, then delete the lock file manually' '' # lock file still there, give up
- touch "$LOK" && \
- LKD='t' || \
- log -e "$?" "couldn't create queue lock file [ $LOK ]" # lock queue
-
elif [ "$1" = '-u' ] ; then # unlock queue
- 'rm' -f "$LOK" # remove the lock
+ 'rmdir' "$LOK" # remove the lock
fi
}