abstract |
A method and apparatus are provided for analyzing an electronic communication containing imagery, e.g., to determine whether the electronic communication is spam. In one embodiment, an inventive method includes detecting one or more regions of imagery in a received electronic communication and applying pre-processing techniques to locate regions (e.g., blocks or lines) of text, which may be distorted, in the imagery. The method then analyzes the regions of text to determine whether their content indicates that the electronic communication is spam. In one embodiment, specialized extraction and rectification of embedded text, followed by optical character recognition processing, is applied to the regions of text to extract their content. In another embodiment, keyword recognition or shape-matching processing is applied to detect the presence or absence of spam-indicative words in the regions of text. In another embodiment, other attributes of the extracted text regions, such as size, location, color, and complexity, are used to build evidence for or against the presence of spam. |
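
The last two embodiments (keyword recognition and attribute-based evidence) can be illustrated with a minimal Python sketch. It is not the claimed implementation: it assumes the text regions have already been located and passed through some OCR step, it substitutes approximate string matching from the standard library for shape matching, and every name, keyword, weight, and threshold in it (`TextRegion`, `SPAM_KEYWORDS`, `keyword_hits`, `spam_evidence`, `is_spam`) is hypothetical.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher

# Hypothetical keyword list; the source does not name specific words.
SPAM_KEYWORDS = ("viagra", "mortgage", "lottery")


@dataclass
class TextRegion:
    """One located text region (assumed output of earlier pipeline stages)."""
    text: str           # characters recovered by an assumed OCR step
    height_frac: float  # region height as a fraction of image height


def keyword_hits(text, keywords=SPAM_KEYWORDS, min_ratio=0.8):
    """Count keywords that approximately match some word in the text.

    Approximate matching tolerates OCR errors and deliberate
    obfuscation such as 'v1agra'. The 0.8 cutoff is an assumption.
    """
    words = text.lower().split()
    hits = 0
    for kw in keywords:
        if any(SequenceMatcher(None, w, kw).ratio() >= min_ratio for w in words):
            hits += 1
    return hits


def spam_evidence(regions, keyword_weight=1.0, size_weight=0.5):
    """Accumulate evidence from keyword hits and one region attribute (size).

    The weights and the 0.2 size cutoff are illustrative assumptions.
    """
    score = 0.0
    for region in regions:
        score += keyword_weight * keyword_hits(region.text)
        if region.height_frac > 0.2:  # unusually large text adds evidence
            score += size_weight
    return score


def is_spam(regions, threshold=1.0):
    """Flag the communication when accumulated evidence reaches a threshold."""
    return spam_evidence(regions) >= threshold
```

For example, `is_spam([TextRegion("Buy v1agra now", 0.3)])` flags the communication, because "v1agra" approximately matches a listed keyword and the region is large, while a small region reading "Meeting agenda attached" contributes no evidence. Other attributes named above, such as location, color, and complexity, could be added as further weighted terms in the same sum.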