abstract |
A system for identification of video content in a video signal is provided via a sound track audio signal. The audio signal is processed with filtering and non linear transformations to extract voice signals from the sound track channel. The extracted voice signals are coupled to a speech recognition system to provide in text form, the words of the video content, which is later compared with a reference library of words or dialog from known video programs or movies. Other attributes of the video signal or transport stream may be combined with closed caption data or closed caption text for identification purposes. Example attributes include DVS/SAP information, time code information, histograms, and or rendered video or pictures. |