How do I get Plesk's version of Spamassassin to auto-learn which messages are spam (and why isn't this functionality included?)
Last things first: I have no idea why this isn't included already. I can't speak to that. I can only speak to how infuriating it is that there's no convenient way to train the spamassassin that ships with Plesk. I had customers (from my geek-friendly hosting company; they're awesome, if I do say so myself) complaining that they were getting a lot of spam, even though they'd turned on the spam filter. Most of the customers thought that was all they had to do, and those who knew about things such as "training" found that getting to the place to do it was labor-prohibitive; it was simpler to just click "delete". Can't really blame 'em, as I did the same thing.
Well, that's just dumb. I'm a geek. Aside from an almost pathological allergy to physical activity, we geeks are known for problem-solving and creative thinking, so that's what I did. After giving the issue a bit of thought, I decided that the ideal solution was an automated script, run hourly, that learns from a Spam folder in each selected mailbox and then empties it. It's opt-in, so only people who are interested will partake, and it replaces the "delete" action with "move to the Spam folder", which takes nearly as little effort.
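Opting a mailbox in takes two steps: the Spam folder has to exist, and the mailbox has to be listed in the user list file (one domain per line, as in `domain.com|user1:user2`). The following is a hypothetical helper for those steps, not part of the script. It assumes the Plesk layout the script uses (`MAILNAMES_PATH/<domain>/<user>/Maildir`), an IMAP server that uses Maildir++ folders, such as Courier (a `.Spam` directory with `cur`, `new` and `tmp` subdirectories plus an empty `maildirfolder` marker file), and GNU `chown` for copying the Maildir's owner onto the new folder. The function names are my own.

```shell
# Hypothetical opt-in helpers (not part of the original script).
# Assumptions: Plesk mailboxes live under MAILNAMES_PATH/<domain>/<user>/Maildir,
# the IMAP server understands Maildir++ folders, and the user list uses the
# "domain.com|user1:user2" format. Source this file from bash.

MAILNAMES_PATH="${MAILNAMES_PATH:-/var/qmail/mailnames}"
SPAM_FOLDER="${SPAM_FOLDER:-.Spam}"

# Create the Spam folder for one mailbox: $1 = domain, $2 = username.
create_spam_folder() {
    local domain=$1 user=$2
    local maildir="${MAILNAMES_PATH}/${domain}/${user}/Maildir"
    local folder="${maildir}/${SPAM_FOLDER}"
    if [ ! -d "$maildir" ]; then
        echo "No Maildir for ${user}@${domain}" >&2
        return 1
    fi
    mkdir -p "${folder}/cur" "${folder}/new" "${folder}/tmp" || return 1
    # Empty marker file that identifies a Maildir++ subfolder.
    touch "${folder}/maildirfolder"
    # Give the folder the same owner and group as the Maildir (GNU chown option).
    chown -R --reference="$maildir" "$folder"
}

# Add a user to the list file: $1 = list file, $2 = domain, $3 = username.
# Appends to an existing domain line or adds a new line for a new domain;
# does nothing if the user is already listed.
add_to_user_list() {
    local list=$1 domain=$2 user=$3
    local tmp
    touch "$list" || return 1
    tmp=$(mktemp) || return 1
    awk -F'|' -v d="$domain" -v u="$user" '
        BEGIN { OFS = "|" }
        $1 == d {
            found = 1
            n = split($2, names, ":")
            for (i = 1; i <= n; i++)
                if (names[i] == u) { print; next }
            print $1, ($2 == "" ? u : $2 ":" u)
            next
        }
        { print }
        END { if (!found) print d, u }
    ' "$list" > "$tmp" && cat "$tmp" > "$list"
    local status=$?
    rm -f "$tmp"
    return "$status"
}
```

For example, `create_spam_folder example.com alice && add_to_user_list /var/scripts/spamUserList example.com alice` would opt in alice@example.com. Depending on the IMAP server, users may still need to subscribe to the new folder in their mail client before it shows up.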
So, without further ado, here's the script, which can always be found at Geek Niche.
#!/bin/bash
### Script
#
# Plesk Spamassassin Auto-learning Script
# Author: paul@geekniche.com
#
###
### General usage
#
# Since you're looking for a solution to handle auto-learning on a Plesk system which utilizes
# spamassassin, I'm going to assume that you're at least a little technically minded. That
# being said, this should be relatively straightforward. Other than this script, you need one
# other file for it to work, and that's the mailbox list. It needs to be specially formatted,
# one domain per line, like so:
#
# domain.com|user1:user2:user3
#
# Please note the pipe (|) separator between the domain and the users, and the colon (:) between
# each user. In order to run this script all automated-like, enter the following in cron:
#
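# 0 * * * * /var/scripts/autoSpamTrain /var/scripts/spamUserList > /dev/null 2>&1
#
# This command will run the script every hour, with the spamUserList file as the first (and
# only) argument, and discard all of its output, errors included. Logging is handled by the
# script itself. (Use "> /dev/null 2>&1" rather than "&>": cron runs the line with /bin/sh,
# which may not understand the bash-only "&>".)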
#
# Questions, comments, improvements, etc. are appreciated. Please feel free to email me at the
# email address provided above.
#
###
# Constants
MAILNAMES_PATH="/var/qmail/mailnames" # Works for both qmail and postfix under Plesk (v9.2).
SPAM_FOLDER=".Spam" # Name this what you like, but make sure to include the period.
LOG="/var/log/autoSpamTrain.log" # Put this wherever you like. /var/log seemed to fit for my needs.
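# learnAndFlush args are the following: $1 = domain, $2 = username
function learnAndFlush {
    echo -e "Learning new Bayesian data from spam for $2@$1" >>"$LOG"
    # Only empty the Spam folder if sa-learn succeeded; otherwise keep the
    # messages so the next run can try again.
    if sa-learn --dbpath "${MAILNAMES_PATH}/$1/$2/.spamassassin" --spam "${MAILNAMES_PATH}/$1/$2/Maildir/${SPAM_FOLDER}/cur/" >>"$LOG" 2>&1; then
        flush "$1" "$2"
    else
        echo "sa-learn failed for $2@$1; leaving spam in place" >>"$LOG"
    fi
}
# flush args are the following: $1 = domain, $2 = username
function flush {
    echo -e "Removing spam from ${MAILNAMES_PATH}/$1/$2" >>"$LOG"
    find "${MAILNAMES_PATH}/$1/$2/Maildir/${SPAM_FOLDER}/cur" -type f -exec rm {} \;
}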
### Main script starts here ###
script_start=$(date +%s)
echo -e "###\n# Script started at $(date)\n###\n" >>"$LOG"
FILE=""
# Make sure we get a file name as the command line argument;
# otherwise stop the script
if [ -z "$1" ]; then
    echo "Must supply proper userlist. Script go boom now." >&2
    exit 1
else
    FILE="$1"
    # Make sure the file exists and is readable
    if [ ! -f "$FILE" ]; then
        echo "$FILE: does not exist" >&2
        exit 1
    elif [ ! -r "$FILE" ]; then
        echo "$FILE: cannot read" >&2
        exit 2
    fi
fi
# Read $FILE one line at a time. Each line looks like domain.com|user1:user2,
# so split it on the pipe into the domain and the colon-separated user list.
# The "|| [ -n ... ]" also processes a last line that has no trailing newline.
while IFS='|' read -r domain users || [ -n "$domain" ]
do
    # Skip blank lines
    [ -z "$domain" ] && continue
    # Split the user list on colons into an array
    IFS=':' read -r -a usersArray <<< "$users"
    for user in "${usersArray[@]}"
    do
        learnAndFlush "$domain" "$user"
    done
done < "$FILE"
script_end=$(date +%s)
total=$((script_end - script_start))
echo -e "\n###\n# Script ended at $(date)\n###\n" >>"$LOG"
echo "$total seconds to run" >>"$LOG"
echo -e "\n+++++++++++++++++++++++++++++++++++++++++++++++++++++++\n" >>"$LOG"
exit 0
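One weakness of learning from the `cur` directory and then emptying it with `find` is timing: anything a user drops into the Spam folder while `sa-learn` is still running gets deleted without ever being learned. A possible variant, sketched here under the same path assumptions as the script (`MAILNAMES_PATH`, `SPAM_FOLDER`, `LOG`), first moves the current messages into a private staging directory next to the Maildir, learns from that directory, and deletes it only if `sa-learn` exits successfully; on failure the messages go back into the Spam folder. The function name `learn_and_flush_staged`, the `.sa-learn-stage.*` directory name and the `SA_LEARN` override are hypothetical additions, not part of the original.

```shell
# Hypothetical variant of learnAndFlush (not part of the original script).
# Uses the same settings as the script; SA_LEARN is an added override so the
# command can be swapped out, e.g. SA_LEARN=echo writes the command to the log.

MAILNAMES_PATH="${MAILNAMES_PATH:-/var/qmail/mailnames}"
SPAM_FOLDER="${SPAM_FOLDER:-.Spam}"
LOG="${LOG:-/var/log/autoSpamTrain.log}"

# $1 = domain, $2 = username
learn_and_flush_staged() {
    local domain=$1 user=$2
    local box="${MAILNAMES_PATH}/${domain}/${user}"
    local cur="${box}/Maildir/${SPAM_FOLDER}/cur"
    local stage f moved=0

    if [ ! -d "$cur" ]; then
        echo "Skipping ${user}@${domain}: ${cur} not found" >>"$LOG"
        return 0
    fi

    # Staging directory next to the Maildir (outside it, so the IMAP server
    # does not see it as a folder); normally on the same filesystem, so mv is a rename.
    stage=$(mktemp -d "${box}/.sa-learn-stage.XXXXXX") || return 1

    # Snapshot: only messages present right now are learned and later deleted.
    for f in "$cur"/*; do
        [ -f "$f" ] || continue
        mv "$f" "$stage"/ && moved=$((moved + 1))
    done
    if [ "$moved" -eq 0 ]; then
        rmdir "$stage"
        return 0
    fi

    echo "Learning ${moved} spam message(s) for ${user}@${domain}" >>"$LOG"
    if "${SA_LEARN:-sa-learn}" --dbpath "${box}/.spamassassin" --spam "$stage" >>"$LOG" 2>&1; then
        rm -rf "$stage"
    else
        # Put the messages back so the next run can retry them.
        mv "$stage"/* "$cur"/ && rmdir "$stage"
        echo "sa-learn failed for ${user}@${domain}; messages restored" >>"$LOG"
        return 1
    fi
}
```

To try it, replace the learnAndFlush call in the main loop with a call to learn_and_flush_staged, passing the same domain and user arguments. The trade-off is a little extra disk activity per run in exchange for never deleting a message that was not part of a successful `sa-learn` pass.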