JSSpamBlock Modifications

The way JSSpamBlock has evolved since I first released it has reminded me why I love open-source. From day one, I had users pointing out bugs and features they would like added, sometimes even submitting a fix for the bug or adding a new feature in themselves. Here are some modifications I have come across on other blogs:

After Georg Kaindl and I had a discussion on whether a database was really neccesary (he made some excellent points on why this is not the case, though I still maintain that the extra protection is worth the small cost of time), he released a JSSpamBlock modification as a new plugin called simpleAntiSpam. He also came up with a clever way to require that the form be parsed once by the bot for each post (although the bot can make unlimited comments to a post once it has parsed the form). I have considered making this functionality the default in an upcoming version of JSSpamBlock, since it will be more than enough protection for the average user.

More recently, I got a comment from Brandon Checketts, who had modified JSSpamBlock so that the comment field names were different than the defaults. The reason was that even if spam bots adapt to JSSpamBlock, modified field names will throw them off. Although I can’t see anyone modifying their spam bots to specifically get around my plugin, I have always tried to design it as if they eventually would, so this will likely be a feature in future versions as well.

Kevin Pendleton, another user, has ported JSSpamBlock to Perl. His version is a bit simpler; it uses a hard-coded value instead of a randomly generated one. In my experience with bots, this should be enough to block out the vast majority of spam bots.

Posted on May 21st, 2007 in JSSpamBlock

A simple diff algorithm in PHP

A diff algorithm in its most basic form takes two strings, and returns the changes needed to make the old string into the new one. They are useful in comparing different versions of a document or file, to see at a glance what the differences are between the two. Wikipedia, for example, uses diffs to compare the changes between two revisions of the same article.

Solving the problem is not as simple as it seems, and the problem bothered me for about a year before I figured it out. I managed to write my algorithm in PHP, in 18 lines of code. It is not the most efficient way to do a diff, but it is probably the easiest to understand.

It works by finding the longest sequence of words common to both strings, and recursively finding the longest sequences of the remainders of the string until the substrings have no words in common. At this point it adds the remaining new words as an insertion and the remaining old words as a deletion.

You can download the source here: PHP SimpleDiff

Posted on May 15th, 2007 in PHP