abstract |
A method (100) and apparatus (700) are disclosed for detecting and tracking human faces across a sequence of video frames. Spatiotemporal segmentation is used to segment (115) the sequence of video frames into 3D segments. 2D segments are then formed from the 3D segments, with each 2D segment being associated with one 3D segment. Features are extracted (140) from the 2D segments and grouped into groups of features. For each group of features, a probability that the group of features includes human facial features is calculated (145) based on the similarity of the geometry of the group of features with the geometry of a human face model. Each group of features is also matched with a group of features in a previous 2D segment, and an accumulated probability that said group of features includes human facial features is calculated (150). Each 2D segment is classified (155) as a face segment or a non-face segment based on the accumulated probability. Human faces are then tracked by finding, in subsequent frames, the 2D segments associated with the 3D segments that are associated with face segments. |