abstract |
A method and apparatus for performing text-to-speech conversion in a client/server environment partitions an otherwise conventional text-to-speech conversion algorithm into two portions: a first “text analysis” portion, which generates an intermediate representation from an original input text; and a second “speech synthesis” portion, which synthesizes speech waveforms from the intermediate representation generated by the first portion. The text analysis portion of the algorithm is executed exclusively on a server, while the speech synthesis portion is executed exclusively on a client which may be associated therewith. The client may comprise a hand-held device such as, for example, a cell phone, and the intermediate representation of the input text advantageously comprises at least a sequence of phonemes representative of the input text. Certain audio segment information to be used by the speech synthesis portion of the text-to-speech process may advantageously be transmitted by the server to the client, and a cache of such audio segments may then be maintained at the client (e.g., in the cell phone) for use by the speech synthesis process to improve the quality of the synthesized speech. |
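
The partition described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the source: the toy lexicon, the per-phoneme granularity of the audio segments, the `SegmentCache` class, and the stand-in `fake_server_segment` function that returns placeholder bytes rather than real audio. The sketch only shows the division of labor: text analysis yields a phoneme sequence on the server, and the client synthesizes from that sequence while caching audio segments it has received.

```python
from typing import Callable, Dict, List

# Hypothetical toy lexicon; a real text analysis front end would be far richer.
TOY_LEXICON: Dict[str, List[str]] = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}


def analyze_text(text: str) -> List[str]:
    """Server side: map input text to a phoneme sequence (the intermediate representation)."""
    phonemes: List[str] = []
    for word in text.lower().split():
        word = word.strip(".,!?;:")
        # Words missing from the toy lexicon are simply skipped in this sketch.
        phonemes.extend(TOY_LEXICON.get(word, []))
    return phonemes


def fake_server_segment(phoneme: str) -> bytes:
    """Stand-in for the server sending an audio segment; returns placeholder bytes."""
    return phoneme.encode("ascii") + b"|"


class SegmentCache:
    """Client side: cache of audio segments received from the server."""

    def __init__(self, fetch_from_server: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_server
        self._segments: Dict[str, bytes] = {}
        self.fetch_count = 0  # number of segments actually requested from the server

    def get(self, phoneme: str) -> bytes:
        # Only contact the server when the segment is not already cached.
        if phoneme not in self._segments:
            self._segments[phoneme] = self._fetch(phoneme)
            self.fetch_count += 1
        return self._segments[phoneme]


def synthesize(phonemes: List[str], cache: SegmentCache) -> bytes:
    """Client side: concatenate cached segments in phoneme order."""
    return b"".join(cache.get(p) for p in phonemes)


if __name__ == "__main__":
    phonemes = analyze_text("Hello, world!")   # would run on the server
    cache = SegmentCache(fake_server_segment)  # would live on the client
    waveform = synthesize(phonemes, cache)
    print(phonemes, len(waveform), cache.fetch_count)
```

In this sketch only the compact phoneme sequence crosses the network for each utterance, and a repeated phoneme (here `L`) is fetched once and then served from the client cache.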