By: Lawrence Abrams

For the last year, one of the more annoying developments in mass spamming has been a technique that embeds the spam message in an image. Because the text is stored as pixels rather than in the message body, it is much more difficult for spam filters to determine whether an email with an attached image is spam or legitimate. Over time, filters were improved to counter this technique by performing optical character recognition (OCR) on the attached images to recover the text encoded in them. The spam filter could then analyze that text like any other message content and decide whether the email was spam.
This week, spammers started using a new technique in which the spam image is embedded in an attached PDF document, as shown below.

One way for spam filters to combat this new technique would be to extract the images from the PDF and then run their normal image analysis on the extracted images. Unfortunately, the spammers have altered the PDF documents so that they are damaged but still viewable. As a result, at least three open source PDF converters I tested (Xpdf, ImageMagick, and PDF-111) cannot properly extract the image from the spam, even though they work perfectly with normal PDF files.
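To make the "extract the images first" step concrete, here is a minimal, hypothetical sketch in Python. It is not what any of the tools above do, and it rests on an assumption: that the damage lies in the file's structure (for example, the cross-reference table a normal parser relies on) rather than in the image data itself. Under that assumption, a brute-force scan of the raw bytes for image streams can still find embedded JPEG (`/DCTDecode`) images. The function name `extract_jpeg_streams` is invented for illustration, and only JPEG images are handled.

```python
import re

# Match the "stream" keyword followed by an end-of-line, but not "endstream".
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n")


def extract_jpeg_streams(pdf_bytes: bytes) -> list[bytes]:
    """Hypothetical helper: return raw JPEG payloads found by a linear scan.

    The scan never consults the cross-reference table, so it may still work
    on files whose structure is damaged, as long as the image data is intact.
    """
    images = []
    for m in STREAM_RE.finditer(pdf_bytes):
        # Treat everything from the nearest preceding "obj" keyword up to
        # "stream" as this object's dictionary.
        header_start = pdf_bytes.rfind(b"obj", 0, m.start())
        if header_start == -1:
            continue
        header = pdf_bytes[header_start:m.start()]
        # Keep only image XObjects that are JPEG-encoded (DCTDecode filter).
        if b"/Image" not in header or b"/DCTDecode" not in header:
            continue
        end = pdf_bytes.find(b"endstream", m.end())
        if end == -1:
            continue  # truncated stream: nothing reliable to extract
        # Drop the end-of-line that precedes "endstream"; a JPEG ends in FF D9,
        # so this never removes image bytes.
        data = pdf_bytes[m.end():end].rstrip(b"\r\n")
        # A JPEG file starts with the SOI marker FF D8.
        if data.startswith(b"\xff\xd8"):
            images.append(data)
    return images
```

Each returned byte string can be written out as a `.jpg` file and handed to the filter's existing OCR and image analysis. A real filter would also need to handle other image encodings and compressed object streams, which this sketch ignores.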
Over time, a new method will be found to repair these PDF files so that they can be parsed, but until then, keep an eye out for this type of spam.
