Image spam filtering by optical pattern matching |
---|
Evgeny Smirnov |
Kaspersky Lab |
Spammers use graphic spam frequently because, as the saying goes, a picture is worth a thousand words. At the same time, graphic spam is more difficult to detect than text spam, especially when spammers use a multitude of tricks. Furthermore, filtering graphic spam requires more processor time, particularly when standard OCR methods are used to detect it. There is therefore a pressing need for technologies that detect and block graphic spam and that are effective, secure and resource-saving. I will begin with a brief overview of existing technologies to combat graphic spam. I will then describe optical pattern matching, which includes heuristic analysis of text layout, computation of OCR-like measurements, methods of building inexact signatures and patterns for entire images or image fragments, and methods for using databases of samples to search for expressions and signatures. Naturally, any anti-spam filter must be efficient and must balance a maximal catch rate against a minimal false positive rate. I will describe how this technology works and provide statistical data on its success rates. Points to be discussed during the presentation:
|