abstract |
In certain aspects, the invention features a system that includes a number of grid-cluster schedulers, wherein each grid-cluster scheduler has software in communication with a number of computing resources, wherein each of the computing resources has an availability, and wherein the grid-cluster scheduler is configured to obtain a quantity of said computing resources as well as the availability of each and to allocate work for a client application to one or more of the computing resources based on the quantity and availability of the computing resources. In such aspects, the system further includes a meta-scheduler in communication with the grid-cluster schedulers, wherein the meta-scheduler is configured to direct work dynamically for one or more client applications to at least one of the grid-cluster schedulers based at least in part on data from each of the grid-cluster schedulers. Further aspects concern systems and methods that include: receiving, for computation by one or more clusters of a distributed computing system, work of a client application; sending a job to each cluster and gathering telemetry data based on a response from each cluster to the job; normalizing the telemetry data from each cluster; determining which of the clusters are able to accept the client application's work; and determining which of the clusters will receive a portion of the work. |
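The method steps recited above (gathering telemetry from a job sent to each cluster, normalizing it, determining which clusters can accept work, and determining which clusters receive a portion of it) might be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the `Telemetry` fields, the availability-times-speed score, the `threshold` cutoff, and the proportional split are all hypothetical choices that the abstract does not specify.

```python
from dataclasses import dataclass


@dataclass
class Telemetry:
    """Hypothetical telemetry gathered from one cluster's response to a probe job."""
    cluster: str
    free_nodes: int       # computing resources currently available
    total_nodes: int      # quantity of computing resources in the cluster
    probe_seconds: float  # time the cluster took to complete the probe job


def normalize(samples):
    """Map each cluster's telemetry to a comparable score in [0, 1].

    Assumed scoring: fraction of free nodes multiplied by probe speed
    relative to the fastest cluster.
    """
    if not samples:
        return {}
    fastest = min(s.probe_seconds for s in samples)
    scores = {}
    for s in samples:
        availability = s.free_nodes / s.total_nodes if s.total_nodes else 0.0
        speed = fastest / s.probe_seconds if s.probe_seconds > 0 else 0.0
        scores[s.cluster] = availability * speed
    return scores


def eligible(scores, threshold=0.1):
    """Keep only clusters whose score meets the (assumed) acceptance threshold."""
    return {c: v for c, v in scores.items() if v >= threshold}


def split_work(units, scores):
    """Divide whole work units among clusters in proportion to their scores.

    Uses the largest-remainder method so the shares always sum to `units`.
    """
    total = sum(scores.values())
    if total == 0:
        return {}
    exact = {c: units * v / total for c, v in scores.items()}
    shares = {c: int(x) for c, x in exact.items()}
    leftover = units - sum(shares.values())
    by_remainder = sorted(exact, key=lambda c: exact[c] - shares[c], reverse=True)
    for c in by_remainder[:leftover]:
        shares[c] += 1
    return shares


if __name__ == "__main__":
    samples = [
        Telemetry("A", free_nodes=12, total_nodes=16, probe_seconds=2.0),
        Telemetry("B", free_nodes=4, total_nodes=16, probe_seconds=1.0),
        Telemetry("C", free_nodes=0, total_nodes=16, probe_seconds=4.0),
    ]
    accepted = eligible(normalize(samples))
    print(split_work(11, accepted))
```

In this sketch a cluster with no free resources scores zero and is excluded before the work is divided, which mirrors the separation in the abstract between determining which clusters are able to accept work and determining which of them receive a portion of it.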