abstract |
Herein is described a data-parallel algorithm for topic modeling in which the memory requirements are streamlined for implementation on a highly-parallel architecture, such as a GPU. Specifically, approximate counters are used in a large mixture model or clustering algorithm (e.g., an uncollapsed Gibbs sampler) to decrease memory usage over what is required when conventional counters are used. The decreased memory usage of the approximate counters allows a highly-parallel architecture with limited memory to process more computations for the large mixture model more efficiently. Embodiments describe binary Morris approximate counters, general Morris approximate counters, and Cs rös approximate counters in the context of an uncollapsed Gibbs sampler, and, more specifically, for a Greedy Gibbs sampler. |