Fighting spam: Spam-free guest book
MySQL is necessary, anyway
Anyone who operates a guest book surely has to put the data somewhere. In
general files or databases are a good
choice for this.
The problem concerning files is that you have to define the structure of your
files and program every routine for accessing the files yourself. A database is
much more convenient, because all these routines are already present in the
process managing the database and you merely have to tell it what it is to do.
The advantage of a file on the other hand would be that you could set up access
to your guestbook the way you need it, but that requires a certain level of
knowledge of the programming language that you are using. This proves to be a
lot easier with the aid of a database manager since you only need to know how
to tell it what it is supposed to do. The manager will do the rest for you.
You may use any database that you feel comfortable with. The choice mostly
depends on the operating system being used, and in the end the differences are
subtle.
For simplicity's sake MySQL is used for the guest book here. If you use another
database, you may have to adjust the SQL commands to match its requirements.
Pimp the database
You surely have created the tables necessary to hold the entries of your guest
book. That includes such data as time of creation of the post, a name
(alternatively a nick- or a real name), the URL of a home page, the title and
the content. An e-mail address may be stored in addition to each post. These
are data that you definitely have to store in order to implement a viable guest
book.
On top of that you may create another table that contains a single record
holding every option that controls the operation of the guest book. This makes
it possible, for example, to disable the comment function at the push of a
button when you see a dramatic increase in the number of nonsensical entries.
You can then clean up your guest book without new trash showing up in the
meantime, and once the problem has been dealt with, you can easily reenable the
comment function without having to meddle with the script. This saves you time
and avoids errors.
To prepare your guest book for fighting spam it is necessary to create additional tables holding a poster's IP address as well as his e-mail address. This way you can lock a poster out if he attempts to publish spam. You can then decide whether to give the scribbler in question a temporary or even a permanent block.
Additional helpful lists
It is furthermore feasible to create additional tables that record who has
attempted to publish something, how often, and within which time frame, and to
reject additional posts once a limit that you consider reasonable is exceeded.
You can even manage the sending of confirmation mail this way: an entry is
generated for each mail being sent, and the associated message is released only
when the link contained in the mail is clicked.
You can also combine the confirmation mail with your lists of IP and mail
addresses. Someone who has clicked on the link in the e-mail can be unblocked
for a certain time. During that time he doesn't receive any further
confirmation mail, and his posts are accepted without this extra step as long
as they pass the other tests. You may also permit the owner of an e-mail
address to block it against further use when someone attempts to abuse it. In
this case the address in question cannot be used to publish messages any more,
and any attempt to use it is rejected immediately.
To create these new tables you have to log in to your database to begin with. You may use the command-line tool for this, or you can use a GUI instead to gain access. Even office programs like LibreOffice provide a means to access a database.
The table for IP addresses
This table registers the IP addresses of anyone who intends to post something
to your guest book and records their status. This way you can record whether or
not it is permitted to post entries from a particular IP address.
You create this table as follows:
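A minimal sketch of what such a table could look like, assuming SQLite as a stand-in for MySQL so that the example runs on its own. The table name `ip_list`, the column names and the state values are hypothetical; with MySQL you would adjust the column types accordingly.

```python
import sqlite3

# Hypothetical schema; SQLite stands in for MySQL so the sketch runs on its own.
CREATE_IP_LIST = """
CREATE TABLE IF NOT EXISTS ip_list (
    ip          TEXT    PRIMARY KEY,          -- address the post came from
    last_seen   INTEGER NOT NULL,             -- Unix time of the access
    expire_days INTEGER NOT NULL DEFAULT 30,  -- entry expires after this many days
    state       INTEGER NOT NULL DEFAULT 0,   -- e.g. 0 = unknown, 1 = trusted, 2 = blocked
    spam_score  INTEGER NOT NULL DEFAULT 0    -- filled in by other checks
)
"""


def prune_expired(db, now):
    """Delete every entry whose retention period has run out; return the count."""
    cur = db.execute(
        "DELETE FROM ip_list WHERE last_seen + expire_days * 86400 < ?", (now,)
    )
    return cur.rowcount


db = sqlite3.connect(":memory:")
db.execute(CREATE_IP_LIST)
db.execute("INSERT INTO ip_list (ip, last_seen, expire_days) VALUES ('192.0.2.1', 0, 1)")
db.execute("INSERT INTO ip_list (ip, last_seen, expire_days) VALUES ('192.0.2.2', 0, 30)")

# Two days later only the entry with the one-day retention has expired.
removed = prune_expired(db, now=2 * 86400)
remaining = [row[0] for row in db.execute("SELECT ip FROM ip_list")]
```

Storing the retention period per record rather than as a global constant lets you keep individual entries longer or shorter without touching the pruning routine.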
Since it is not productive to retain old data, the table records the point of
time at which access from a particular IP address took place, together with the
number of days after which the entry is considered expired. That allows you to
discard any entry that has expired.
This prevents IP addresses from being blocked for too long. Many of them are
assigned dynamically when Internet users come online and released when they go
offline again. An address that was previously assigned to a spammer, or behind
which a bot-infected machine was sitting, can therefore be assigned to an
uninvolved user the next time. If old entries weren't deleted, the range of IP
addresses with access to the guest book would shrink more and more, up to the
point that it becomes virtually unusable.
The State field in turn can be used to unblock users who have proven to be
legitimate so that they aren't affected by some of your safeguards any more.
For IP addresses flagged accordingly you could skip some of the measures meant
to confirm that the user is not a bot.
The spam score in turn is needed to determine whether or not you are dealing
with spam emanating from the IP address in question. However, that requires
other measures whose result you can easily record in this field to trigger
adequate measures when this score becomes too high.
The table for mail addresses
This table records the mail addresses that are entered when someone posts an
entry. Since any mail address can be provided, it is possible to enter a
nonexistent mail address or one belonging to someone else. The table therefore
lets you record whether or not posting from a particular address is permitted.
This table is created as follows:
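A possible layout, again sketched with SQLite standing in for MySQL. The table name `mail_list`, the column names and the helper `touch_mail` are assumptions made for this illustration.

```python
import sqlite3

# Hypothetical schema; column names are assumptions for this sketch.
CREATE_MAIL_LIST = """
CREATE TABLE IF NOT EXISTS mail_list (
    mail        TEXT    PRIMARY KEY,          -- address entered with the post
    created     INTEGER NOT NULL,             -- Unix time the record was created
    last_seen   INTEGER NOT NULL,             -- Unix time the address was last used
    expire_days INTEGER NOT NULL DEFAULT 30,  -- retention period in days
    state       INTEGER NOT NULL DEFAULT 0,   -- e.g. 0 = unknown, 1 = verified, 2 = blocked
    spam_score  INTEGER NOT NULL DEFAULT 0    -- filled in by other checks
)
"""


def touch_mail(db, mail, now):
    """Create the record on first sight, then refresh its last_seen time."""
    db.execute(
        "INSERT OR IGNORE INTO mail_list (mail, created, last_seen) VALUES (?, ?, ?)",
        (mail, now, now),
    )
    db.execute("UPDATE mail_list SET last_seen = ? WHERE mail = ?", (now, mail))


db = sqlite3.connect(":memory:")
db.execute(CREATE_MAIL_LIST)
touch_mail(db, "alice@example.org", now=1000)
touch_mail(db, "alice@example.org", now=5000)

# The creation time is kept while the time of last use moves forward.
row = db.execute("SELECT created, last_seen FROM mail_list").fetchone()
```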
Two data fields recording two points of time are provided to allow for sensible
measures: The point of time at which the entry was created and the point of
time at which the mail address has last been seen. The second field is used to
determine when a mail address has to be considered expired and should therefore
be dropped. This prevents old records from piling up in the database.
The first field in turn could be used to adjust the time that a particular
record is retained. A record that repeatedly registers activity can have its
retention time extended, e. g. to prevent abuse of mail addresses and to keep
up the block longer, but in case of a positive entry you can grant the user an
extended time frame in which he can post entries without additional checks.
The expiry time is needed here as well to control the pruning of old records.
This prevents old records from remaining, which could yield erroneous results.
This takes into account the phenomenon that formerly unassigned addresses are
registered and previously existing addresses are released.
Every address is additionally flagged with a state that determines whether or
not it may be used in posting to the guest book. This also allows for checking
the validity of a particular mail address.
Here a means of recording a spam score has been provided as well so potential
appearances of spam can be registered and you can react appropriately if
necessary. These checks are performed when other measures are applied, whose
result you may easily record here.
The table for blocked IP addresses
As far as addresses that wind up on this list are concerned, there's only one
thing to be said: you don't want to allow any access from there. That can have
multiple reasons, for example repeated unsolicited access to your guest book,
or actions that you don't want to tolerate for other reasons, such as
brute-force attacks on login forms or repeated obtrusive attempts to post
something to your guest book. Even unsolicited access attempts to other
services like SSH can serve as a criterion for deciding whether or not you want
to permit access from particular IP addresses.
This table is created as follows:
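One way such a table could be laid out, sketched with SQLite in place of MySQL. The table name `ip_ban`, the column names and the flag values are hypothetical; the netmask range of 0 to 32 implies IPv4 addresses.

```python
import sqlite3

# Hypothetical bit flags for the field that controls the behavior of an entry.
FLAG_ACTIVE = 1      # entry is enforced
FLAG_EXPIRES = 2     # entry is pruned after some time
FLAG_READ_ONLY = 4   # guest book is shown, but the form is disabled

CREATE_IP_BAN = """
CREATE TABLE IF NOT EXISTS ip_ban (
    ip       TEXT    NOT NULL,                -- address or network base address
    netmask  INTEGER NOT NULL DEFAULT 32
             CHECK (netmask BETWEEN 0 AND 32),
    flags    INTEGER NOT NULL DEFAULT 1,      -- combination of the FLAG_* values
    last_hit INTEGER NOT NULL,                -- Unix time of the latest hit
    severity INTEGER NOT NULL DEFAULT 1,      -- higher values expire later
    PRIMARY KEY (ip, netmask)
)
"""

INSERT_BAN = (
    "INSERT INTO ip_ban (ip, netmask, flags, last_hit, severity) VALUES (?, ?, ?, ?, ?)"
)

db = sqlite3.connect(":memory:")
db.execute(CREATE_IP_BAN)
db.execute(INSERT_BAN, ("198.51.100.0", 24, FLAG_ACTIVE | FLAG_EXPIRES, 1000, 3))

# A netmask outside 0..32 violates the CHECK constraint and is refused.
try:
    db.execute(INSERT_BAN, ("203.0.113.0", 40, FLAG_ACTIVE, 1000, 1))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

ban_count = db.execute("SELECT COUNT(*) FROM ip_ban").fetchone()[0]
```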
First of all, you need the addresses that you wish to block. Requests from
these addresses should be ignored by your guest book. However, a multitude of
access attempts can take place, so the number of entries in this table could
become very big very quickly.
Therefore you can specify a netmask as an integer between 0 and 32 so that you
can block an entire network segment with a single entry. This reduces the
amount of data, especially since unsolicited access often originates from
particular network segments. In this case you can shut out multiple sources at
once.
The third field enables you to control the behavior of each entry. For example,
you can specify which entries are active and which ones are not, and so control
the access restrictions as needed. You can furthermore set whether or not an
entry expires after a certain amount of time, so that you are able to react
adequately to temporary problems and still lock out persistent offenders
permanently. You can also indicate that displaying the guest book is permitted
while the form for submitting new entries is disabled. This enables you to
grant certain IP addresses restricted access to the guest book.
Access from addresses listed herein is normally unconditionally blocked, but
depending on the severity of the incident the entries are going to be pruned
sooner or later. For this the point of time at which an entry has caused a hit
is recorded together with an estimate of the severity of the problem. The more
problematic the emergence of spam, the later the associated entry expires.
However, the guest book shouldn't be the only criterion. Other sources should
be taken into account as well, such as intrusion attempts via SSH or certain
invalid access attempts to web sites, especially to CGI scripts and PHP files.
This way you can block address segments both automatically and manually. You
can enforce these blocks directly upon access so that no additional checks are
necessary. You merely have to implement checks at various spots that look for
suspicious activity.
However, if the number of access attempts reaches a level that could threaten
your server, it is recommended to have any access from these address segments
blocked by the firewall.
The table used by the flood protection
This table permits detecting whether or not too many access attempts are made during a
certain time frame or whether or not a particular mail address is repeatedly
used in rapid succession. There is normally no reason for attempting to post
something to the guest book from a particular IP address in rapid succession or
to use a particular mail address multiple times in a short time frame. However,
that makes it necessary to record this information in the first place.
This table is created as follows:
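A sketch of a table along these lines, with SQLite standing in for MySQL. It assumes four columns: the IP address, the mail address, a marker that says which of the two an entry refers to, and the creation time. The table name `flood`, the column names and the marker values are hypothetical.

```python
import sqlite3

# Hypothetical markers for what an entry refers to.
KIND_IP = 0
KIND_MAIL = 1

CREATE_FLOOD = """
CREATE TABLE IF NOT EXISTS flood (
    ip      TEXT    NOT NULL,             -- address the access came from
    mail    TEXT    NOT NULL DEFAULT '',  -- mail address that was used, if any
    kind    INTEGER NOT NULL,             -- KIND_IP or KIND_MAIL
    created INTEGER NOT NULL              -- Unix time of this single access
)
"""

db = sqlite3.connect(":memory:")
db.execute(CREATE_FLOOD)

# One row per access, so counting rows yields the number of attempts.
for moment in (100, 110, 120):
    db.execute(
        "INSERT INTO flood (ip, kind, created) VALUES (?, ?, ?)",
        ("192.0.2.1", KIND_IP, moment),
    )
db.execute(
    "INSERT INTO flood (ip, mail, kind, created) VALUES (?, ?, ?, ?)",
    ("192.0.2.1", "alice@example.org", KIND_MAIL, 120),
)

ip_hits = db.execute(
    "SELECT COUNT(*) FROM flood WHERE kind = ? AND ip = ?", (KIND_IP, "192.0.2.1")
).fetchone()[0]
mail_hits = db.execute(
    "SELECT COUNT(*) FROM flood WHERE kind = ? AND mail = ?",
    (KIND_MAIL, "alice@example.org"),
).fetchone()[0]
```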
The first field takes the IP address from which access has been made, and the
second field holds the mail address that has been used. The third field
specifies which of the two the entry refers to, so that you can put both IP and
mail addresses into the same table but still have the option to query both
separately.
The fourth field records when a particular entry has been created. This means
that for a particular IP or mail address there can be multiple entries if
multiple access attempts are made, so you can determine the number of attempts
from the number of entries. This could also be implemented by means of a
counter, but then the point of time of every single access would have to be
recorded somewhere and updated regularly. That would incur additional
management overhead, which is avoided by recording each access separately. You
can furthermore easily prune expired entries this way without first having to
manipulate a list of access times.
Optional: The table for verifying e-mail addresses
If you want to add another layer of protection, you can now check whether or
not a valid address has been entered when publishing a message. Since spammers
usually fake mail addresses, you can reject any attempts that bypass your other
means of protection this way. If the address doesn't exist, nothing is going to
happen, anyway, and the message is discarded after a preset time has elapsed.
The same happens if the mail address is existent, but its owner doesn't react
to it.
You can of course offer an additional option that enables the owner of said
address to protect it against being used in the guest book for a certain period
of time. Choosing this option sets an explicit block, and until this lockout
period expires, any attempt to publish a message with said mail address as
origin is rejected without further notification.
For the message to actually be published, the recipient of the confirmation
mail has to confirm the process explicitly. You need to provide a link inside
the mail that points to a script performing the verification. The code is
either passed to the script as a parameter or entered into a form that shows up
when a wrong code or nothing at all has been passed.
After successful verification the desired action can be executed.
You can create the associated table like this:
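A sketch of such a table together with the two operations that work on it, using SQLite in place of MySQL. The table name `verify`, the column names and the helpers `request_verification` and `confirm` are hypothetical; sending the actual mail is left out.

```python
import secrets
import sqlite3

CREATE_VERIFY = """
CREATE TABLE IF NOT EXISTS verify (
    code       TEXT    PRIMARY KEY,  -- random code placed in the confirmation link
    mail       TEXT    NOT NULL,     -- address the confirmation mail goes to
    ip         TEXT    NOT NULL,     -- address the post was submitted from
    message_id INTEGER NOT NULL,     -- sequential number of the pending message
    expires    INTEGER NOT NULL,     -- Unix time after which the code is void
    target     TEXT    NOT NULL DEFAULT 'guestbook'  -- optional message source
)
"""


def request_verification(db, mail, ip, message_id, now, ttl=86400):
    """Store a pending verification and return the code for the mail link."""
    code = secrets.token_urlsafe(16)
    db.execute(
        "INSERT INTO verify (code, mail, ip, message_id, expires) VALUES (?, ?, ?, ?, ?)",
        (code, mail, ip, message_id, now + ttl),
    )
    return code


def confirm(db, code, now):
    """Return the message id to release, or None if the code is unknown or expired."""
    row = db.execute(
        "SELECT message_id FROM verify WHERE code = ? AND expires >= ?", (code, now)
    ).fetchone()
    if row is None:
        return None
    db.execute("DELETE FROM verify WHERE code = ?", (code,))  # codes are single-use
    return row[0]


db = sqlite3.connect(":memory:")
db.execute(CREATE_VERIFY)
code = request_verification(db, "alice@example.org", "192.0.2.1", message_id=7, now=0)
```

Calling `confirm(db, code, now)` from the script behind the link then yields the number of the message to release, or `None` when the form for entering the code should be shown instead.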
This records both the mail address and a code to facilitate the verification.
Furthermore the IP address from which the entry has been registered is recorded
so that you can reconstruct later what happened, when, and how. Additionally
the sequential number of the message to be published is recorded so that it can
be published when confirmed, or rejected upon an active objection. The date of
expiry states the deadline up to which the verification can be performed. After
this deadline has passed, the message to be verified can be discarded.
There also is an optional field included that specifies the target of the
message when you have more than one source for messages, e. g. a forum.
This check has two advantages: Firstly a message must be explicitly confirmed for it to be displayed in the first place. This neatly prevents publishing messages in the name of others. Secondly the message in question can be rejected immediately if the verification fails, e. g. because the confirmation mail has bounced. However, that requires setting up the mail system in a way that potential error reports are trapped and forwarded to an analysis script which in turn deletes any messages in question.
Advice for working with the new tables
On their own these tables are just additional records. For them to have an effect you have to adjust the guest book script so that it accesses these tables and reacts to the data contained therein. For optimal effect, the following sequence is recommended:
- Check for blocked IP address
- Checks on the IP address
- Plausibility checks on the mail address
- Checks on the other data
- Flood protection
- Send confirmation mail
This sequence follows from how the checks build on each other. Determining whether or not a particular IP address has explicitly been blocked, and so doesn't have to be considered any further, is the simplest check. Since this list usually is static, you may reject any address listed here without having to perform any additional checks. The following piece of code takes care of this:
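A minimal sketch of this check in Python, assuming a simplified `ip_ban` table with hypothetical columns `ip`, `netmask` and `active`, and SQLite in place of MySQL. The helper name `is_banned` is an assumption as well; the netmask comparison uses Python's standard `ipaddress` module.

```python
import ipaddress
import sqlite3


def is_banned(db, client_ip):
    """Return True if client_ip falls into any active blocked segment."""
    addr = ipaddress.ip_address(client_ip)
    rows = db.execute("SELECT ip, netmask FROM ip_ban WHERE active = 1")
    for ip, netmask in rows:
        # strict=False tolerates host bits being set in the stored address.
        if addr in ipaddress.ip_network(f"{ip}/{netmask}", strict=False):
            return True
    return False


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE ip_ban (ip TEXT, netmask INTEGER, active INTEGER)")
db.execute("INSERT INTO ip_ban VALUES ('198.51.100.0', 24, 1)")  # whole segment
db.execute("INSERT INTO ip_ban VALUES ('203.0.113.7', 32, 0)")   # inactive entry

# The flag the rest of the script can react to.
has_ban = 1 if is_banned(db, "198.51.100.42") else 0
```

The sketch walks through every active entry, which is fine for a short list. For a large list you would rather let the database do the comparison or cache the networks.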
If the IP address is found on the list, the variable $has_ban is set to 1 here,
which can make the script react appropriately, e. g. by informing the user in
question that he doesn't have any permission to access the guest book – or
alternatively present the guest book to him and disable the form instead.
If the IP address in question isn't found on the list, a check is needed to see
whether or not any incidents have been recorded for it or if it has on the
contrary been confirmed as legitimate. But no matter what the situation, a hit
is going to reset the expiry time so that the timeout is restarted.
This check is done with the following code snippet:
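As an illustration, the lookup could be sketched as follows. It assumes a simplified `ip_list` table with hypothetical columns `ip`, `last_seen`, `state` and `spam_score`, SQLite in place of MySQL, and a helper named `check_ip` that is not taken from the original script.

```python
import sqlite3


def check_ip(db, ip, now):
    """Return (state, spam_score) for a known address and restart its timeout.

    Returns None if the address has not been seen before.
    """
    row = db.execute(
        "SELECT state, spam_score FROM ip_list WHERE ip = ?", (ip,)
    ).fetchone()
    if row is None:
        return None
    # A hit resets the expiry time, no matter what the state is.
    db.execute("UPDATE ip_list SET last_seen = ? WHERE ip = ?", (now, ip))
    return row


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE ip_list (ip TEXT PRIMARY KEY, last_seen INTEGER, "
    "state INTEGER, spam_score INTEGER)"
)
db.execute("INSERT INTO ip_list VALUES ('192.0.2.1', 100, 1, 0)")

known = check_ip(db, "192.0.2.1", now=500)
unknown = check_ip(db, "192.0.2.99", now=500)
last_seen = db.execute(
    "SELECT last_seen FROM ip_list WHERE ip = '192.0.2.1'"
).fetchone()[0]
```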
You can of course check the mail address as well. Logical inconsistencies are
of particular interest here, for example a nonexistent domain. In this event
you can immediately reject the attempt to post something to your guest book.
You can check the mail address with the following function:
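A sketch of such a function in Python. The pattern is deliberately simple and does not cover everything the mail standards allow, the TLD list is a tiny placeholder for the external validity check, and the names `check_mail`, `tld_is_valid` and `resolves` as well as the return codes are assumptions. The domain lookup uses `socket.getaddrinfo`, which resolves host addresses and does not look up MX records.

```python
import re
import socket

# Hypothetical return codes for the caller.
MAIL_OK, MAIL_BAD_FORMAT, MAIL_BAD_TLD, MAIL_BAD_DOMAIN = range(4)

# recipient @ one or more domain labels . TLD
MAIL_RE = re.compile(r"^([A-Za-z0-9._%+-]+)@((?:[A-Za-z0-9-]+\.)+)([A-Za-z]{2,})$")

KNOWN_TLDS = {"com", "org", "net", "de"}  # placeholder, not a complete list


def tld_is_valid(tld):
    """Stand-in for the external TLD check."""
    return tld.lower() in KNOWN_TLDS


def resolves(domain):
    """Return True if the domain name resolves to at least one address."""
    try:
        socket.getaddrinfo(domain, None)
        return True
    except (socket.gaierror, UnicodeError):
        return False


def check_mail(address, resolver=resolves):
    """Check format, TLD and resolvability; return one of the MAIL_* codes."""
    match = MAIL_RE.match(address)
    if match is None:
        return MAIL_BAD_FORMAT
    recipient, subdomain, tld = match.groups()
    if not tld_is_valid(tld):
        return MAIL_BAD_TLD
    if not resolver(subdomain + tld):  # subdomain already ends with a dot
        return MAIL_BAD_DOMAIN
    return MAIL_OK
```

The `resolver` parameter exists so that the lookup can be replaced, for instance by a cached or mocked version when testing without network access.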
This initially checks whether or not the format of the address is valid in the
first place and extracts the recipient as well as subdomain and TLD in the
process. This TLD in turn is subjected to a validity check (takes place in an
external function), and finally an attempt is made to resolve the complete
domain name.
If any check returns with an error (invalid format of the mail address,
nonexistent TLD, unknown domain), this is announced by means of the return
value so that the caller can take appropriate measures. This check can be done
like this:
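How the caller might react can be sketched as follows. The result codes, the penalty values, the helper name `react_to_mail_check` and the simplified `ip_list` table are all assumptions for this illustration, with SQLite standing in for MySQL.

```python
import sqlite3

# Hypothetical result codes of the mail check and the penalty for each failure.
MAIL_OK, MAIL_BAD_FORMAT, MAIL_BAD_TLD, MAIL_BAD_DOMAIN = range(4)
PENALTY = {MAIL_BAD_FORMAT: 1, MAIL_BAD_TLD: 2, MAIL_BAD_DOMAIN: 2}


def react_to_mail_check(db, ip, result, now):
    """Return True if the post may proceed; otherwise note the anomaly for the IP."""
    if result == MAIL_OK:
        return True
    # Make sure a record exists, then raise its spam score.
    db.execute(
        "INSERT OR IGNORE INTO ip_list (ip, last_seen) VALUES (?, ?)", (ip, now)
    )
    db.execute(
        "UPDATE ip_list SET last_seen = ?, spam_score = spam_score + ? WHERE ip = ?",
        (now, PENALTY[result], ip),
    )
    return False


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE ip_list (ip TEXT PRIMARY KEY, last_seen INTEGER NOT NULL, "
    "spam_score INTEGER NOT NULL DEFAULT 0)"
)

accepted = react_to_mail_check(db, "192.0.2.1", MAIL_OK, now=100)
refused_once = react_to_mail_check(db, "192.0.2.5", MAIL_BAD_TLD, now=100)
refused_twice = react_to_mail_check(db, "192.0.2.5", MAIL_BAD_FORMAT, now=200)
score = db.execute(
    "SELECT spam_score FROM ip_list WHERE ip = '192.0.2.5'"
).fetchone()[0]
```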
This code snippet initially invokes the previously introduced function for
checking the sanity of a mail address, and dependent on the result, appropriate
action is taken.
As you can see, the list of IP addresses is already used here to record any
anomalies. This way you can easily react to problems.
The next table that you should use is responsible for the so-called flood
protection. It prevents anyone from automatically posting a plethora of
messages in a short time.
The following code snippet implements the flood protection:
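A minimal sketch of this idea, assuming a simplified `flood` table with only the hypothetical columns `ip` and `created`, SQLite in place of MySQL, and freely chosen limits of three posts per ten minutes. The helper name `flood_check` is an assumption.

```python
import sqlite3

MAX_POSTS = 3   # assumed limit per time window
WINDOW = 600    # assumed window length in seconds


def flood_check(db, ip, now, max_posts=MAX_POSTS, window=WINDOW):
    """Record this access and return True if the address is within its limit."""
    # Expired entries are simply deleted; no per-address record has to be edited.
    db.execute("DELETE FROM flood WHERE created < ?", (now - window,))
    db.execute("INSERT INTO flood (ip, created) VALUES (?, ?)", (ip, now))
    (count,) = db.execute(
        "SELECT COUNT(*) FROM flood WHERE ip = ?", (ip,)
    ).fetchone()
    return count <= max_posts


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE flood (ip TEXT NOT NULL, created INTEGER NOT NULL)")

# Four attempts within half a minute: the fourth one exceeds the limit.
results = [flood_check(db, "192.0.2.1", now=t) for t in (0, 10, 20, 30)]

# Much later the old entries have expired and posting is possible again.
later = flood_check(db, "192.0.2.1", now=700)
```

Note that this sketch records rejected attempts as well, so a client that keeps hammering the form stays locked out until it pauses for a full window.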
This table is set up so that every access creates one entry. This allows for an
easy check to find out how many access attempts have already been made. You
only have to read and count all entries associated with a certain IP address.
If you find more entries than permitted, the script should reject any further
access until the value drops below the preset threshold.
This way more records are created than with a single record per IP address that
retains the access times, but working with the list becomes much easier: you
don't have to manipulate any records, and expired entries can simply be
deleted. This significantly reduces the computational time required for
managing the list.