Predicate | Object |
assignee | http://rdf.ncbi.nlm.nih.gov/pubchem/patentassignee/MD5_7fa2a002b2217830b670e2b42f923139 |
classificationCPCAdditional | http://rdf.ncbi.nlm.nih.gov/pubchem/patentcpc/G06N3-045 http://rdf.ncbi.nlm.nih.gov/pubchem/patentcpc/G06F2209-485 |
classificationCPCInventive | http://rdf.ncbi.nlm.nih.gov/pubchem/patentcpc/G06K9-6257 http://rdf.ncbi.nlm.nih.gov/pubchem/patentcpc/G06N3-084 http://rdf.ncbi.nlm.nih.gov/pubchem/patentcpc/G06N20-00 http://rdf.ncbi.nlm.nih.gov/pubchem/patentcpc/G06F9-4881 http://rdf.ncbi.nlm.nih.gov/pubchem/patentcpc/G06F11-1407 http://rdf.ncbi.nlm.nih.gov/pubchem/patentcpc/G06F18-2148 http://rdf.ncbi.nlm.nih.gov/pubchem/patentcpc/G06F9-461 http://rdf.ncbi.nlm.nih.gov/pubchem/patentcpc/G06T1-60 http://rdf.ncbi.nlm.nih.gov/pubchem/patentcpc/G06T1-20 |
classificationIPCInventive | http://rdf.ncbi.nlm.nih.gov/pubchem/patentipc/G06K9-62 http://rdf.ncbi.nlm.nih.gov/pubchem/patentipc/G06N20-00 http://rdf.ncbi.nlm.nih.gov/pubchem/patentipc/G06F11-14 http://rdf.ncbi.nlm.nih.gov/pubchem/patentipc/G06F11-00 http://rdf.ncbi.nlm.nih.gov/pubchem/patentipc/G06F9-48 http://rdf.ncbi.nlm.nih.gov/pubchem/patentipc/G06F9-46 http://rdf.ncbi.nlm.nih.gov/pubchem/patentipc/G06T1-60 http://rdf.ncbi.nlm.nih.gov/pubchem/patentipc/G06T1-20 |
filingDate | 2018-04-18-04:00^^<http://www.w3.org/2001/XMLSchema#date> |
grantDate | 2020-06-30-04:00^^<http://www.w3.org/2001/XMLSchema#date> |
inventor | http://rdf.ncbi.nlm.nih.gov/pubchem/patentinventor/MD5_60635f7dc510d3c531f730571568edd6 http://rdf.ncbi.nlm.nih.gov/pubchem/patentinventor/MD5_d99171783d9186fe3c5246a35fcc0448 |
publicationDate | 2020-06-30-04:00^^<http://www.w3.org/2001/XMLSchema#date> |
publicationNumber | US-10698766-B2 |
titleOfInvention | Optimization of checkpoint operations for deep learning computing |
abstract | Systems and methods are provided to optimize checkpoint operations for deep learning (DL) model training tasks. For example, a distributed DL model training process is executed to train a DL model using multiple accelerator devices residing on one or more server nodes, and a checkpoint operation is performed to generate and store a checkpoint of an intermediate DL model. A checkpoint operation includes compressing a checkpoint of an intermediate DL model stored in memory of a given accelerator device to generate a compressed checkpoint, and scheduling a time to perform a memory copy operation to transfer a copy of the compressed checkpoint from the memory of the given accelerator device to a host system memory. The scheduling is performed based on information regarding bandwidth usage of a communication link to be utilized to transfer the compressed checkpoint to perform the memory copy operation, wherein the memory copy operation is performed at the scheduled time. |
isCitedBy | http://rdf.ncbi.nlm.nih.gov/pubchem/patent/WO-2022087811-A1 http://rdf.ncbi.nlm.nih.gov/pubchem/patent/US-11546417-B2 |
priorityDate | 2018-04-18-04:00^^<http://www.w3.org/2001/XMLSchema#date> |
type | http://data.epo.org/linked-data/def/patent/Publication |
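The abstract above describes a two-step checkpoint path: compress the intermediate model checkpoint held in accelerator memory, then schedule the device-to-host memory copy for a time chosen from the communication link's bandwidth usage. The following is a minimal sketch of that idea under stated assumptions, not the patented method. The abstract does not name a compression algorithm or a scheduling policy, so `zlib` is used purely as a stand-in compressor, and "earliest time slot with enough spare bandwidth, else the least-loaded slot" is one hypothetical policy. The names `compress_checkpoint`, `schedule_copy`, `run_copy`, `link_usage`, `link_capacity` and `slot_seconds` are all illustrative, and the host "memory copy" is simulated with a Python dict.

```python
import zlib
from dataclasses import dataclass


@dataclass
class ScheduledCopy:
    slot: int        # index of the time slot chosen for the device-to-host copy
    payload: bytes   # compressed checkpoint to be transferred


def compress_checkpoint(checkpoint: bytes, level: int = 6) -> bytes:
    """Compress a serialized checkpoint (zlib is an illustrative stand-in)."""
    return zlib.compress(checkpoint, level)


def schedule_copy(compressed: bytes,
                  link_usage: list,
                  link_capacity: float,
                  slot_seconds: float) -> ScheduledCopy:
    """Pick a time slot for the copy from (hypothetical) link bandwidth usage.

    link_usage[i] is the predicted bytes/s already consumed on the
    device-to-host link during slot i (e.g. by other training traffic);
    it is assumed to be non-empty. Returns the earliest slot whose spare
    bandwidth can carry the whole transfer within one slot, falling back
    to the least-loaded slot if none has enough headroom.
    """
    needed = len(compressed) / slot_seconds  # bytes/s to finish within one slot
    for i, used in enumerate(link_usage):
        if link_capacity - used >= needed:
            return ScheduledCopy(slot=i, payload=compressed)
    best = min(range(len(link_usage)), key=lambda i: link_usage[i])
    return ScheduledCopy(slot=best, payload=compressed)


def run_copy(plan: ScheduledCopy, current_slot: int, host_memory: dict) -> bool:
    """Perform the (simulated) memory copy only when the scheduled slot arrives."""
    if current_slot != plan.slot:
        return False
    host_memory["checkpoint"] = bytes(plan.payload)  # stand-in for a device-to-host memcpy
    return True


if __name__ == "__main__":
    checkpoint = bytes(1_000_000)           # hypothetical 1 MB serialized model state
    compressed = compress_checkpoint(checkpoint)
    usage = [10e9, 9.9999e9, 2e9, 1e9]      # hypothetical per-slot link usage, bytes/s
    plan = schedule_copy(compressed, usage, link_capacity=10e9, slot_seconds=1e-3)
    host = {}
    for slot in range(len(usage)):
        if run_copy(plan, slot, host):
            print(f"copied {len(plan.payload)} compressed bytes in slot {slot}")
```

Compressing before scheduling shrinks the amount of data that must fit into a low-traffic window, which is why the sketch computes the required bandwidth from the compressed size rather than the raw checkpoint size.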