http://rdf.ncbi.nlm.nih.gov/pubchem/patent/WO-2020159692-A1
Outgoing Links
Predicate | Object |
---|---|
assignee | http://rdf.ncbi.nlm.nih.gov/pubchem/patentassignee/MD5_0acc7fd312a68983fb8b9d120043467e |
classificationCPCInventive | http://rdf.ncbi.nlm.nih.gov/pubchem/patentcpc/G16H50-20 http://rdf.ncbi.nlm.nih.gov/pubchem/patentcpc/G06F30-27 http://rdf.ncbi.nlm.nih.gov/pubchem/patentcpc/G06N3-126 http://rdf.ncbi.nlm.nih.gov/pubchem/patentcpc/G06N7-01 http://rdf.ncbi.nlm.nih.gov/pubchem/patentcpc/G06F18-217 http://rdf.ncbi.nlm.nih.gov/pubchem/patentcpc/G06N20-00 http://rdf.ncbi.nlm.nih.gov/pubchem/patentcpc/G06N3-006 |
classificationIPCInventive | http://rdf.ncbi.nlm.nih.gov/pubchem/patentipc/G06N7-00 http://rdf.ncbi.nlm.nih.gov/pubchem/patentipc/G06N7-08 http://rdf.ncbi.nlm.nih.gov/pubchem/patentipc/G06N3-00 http://rdf.ncbi.nlm.nih.gov/pubchem/patentipc/G06N20-00 |
filingDate | 2020-01-10-04:00^^<http://www.w3.org/2001/XMLSchema#date> |
inventor | http://rdf.ncbi.nlm.nih.gov/pubchem/patentinventor/MD5_f22222c230ed39d0bda6969e94a366ce http://rdf.ncbi.nlm.nih.gov/pubchem/patentinventor/MD5_e34ce345e4cab14ee69313a1ec31af34 |
publicationDate | 2020-08-06-04:00^^<http://www.w3.org/2001/XMLSchema#date> |
publicationNumber | WO-2020159692-A1 |
titleOfInvention | Estimating latent reward functions from experiences |
abstract | Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for estimating latent reward functions from a set of experiences, each experience specifying a respective sequence of state transitions of an environment being interacted with by an agent that is controlled using a respective latent policy. In one aspect, a method includes: generating a current Markov Decision Process (MDP); initializing a current assignment which assigns the set of experiences into a first number of partitions that are each associated with a respective latent reward function; updating the current assignment, including, for each experience: selecting a partition from a second number of candidate partitions; and assigning the experience to the selected partition; and updating the latent reward functions in accordance with a specified update rule; and updating the current MDP using latent features associated with particular latent reward functions that are determined to have highest posterior probability. |
isCitedBy | http://rdf.ncbi.nlm.nih.gov/pubchem/patent/CN-115470710-B http://rdf.ncbi.nlm.nih.gov/pubchem/patent/CN-115470710-A |
priorityDate | 2019-01-28-04:00^^<http://www.w3.org/2001/XMLSchema#date> |
type | http://data.epo.org/linked-data/def/patent/Publication |
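
The abstract above describes a loop that alternates between assigning experiences to partitions and updating one latent reward function per partition. The sketch below illustrates only that alternating structure. It is not the patented method: the abstract does not state the selection rule, the update rule, or how the MDP is revised, so this example assumes each experience is summarized as a feature vector, selects the candidate partition by nearest squared distance, and updates each reward vector to the mean of its members. The names `assign`, `update_rewards`, and `fit` are hypothetical, and the MDP update step is omitted.

```python
from typing import List, Tuple

Vector = List[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(experiences: List[Vector], rewards: List[Vector]) -> List[int]:
    """For each experience, select one partition from the candidates.

    Assumption: the candidate whose reward vector is closest wins.
    """
    return [
        min(range(len(rewards)), key=lambda k: sq_dist(e, rewards[k]))
        for e in experiences
    ]


def update_rewards(
    experiences: List[Vector], assignment: List[int], rewards: List[Vector]
) -> List[Vector]:
    """Stand-in update rule: each reward vector becomes the mean of its members."""
    updated = []
    for k, old in enumerate(rewards):
        members = [e for e, a in zip(experiences, assignment) if a == k]
        if members:
            updated.append([sum(col) / len(members) for col in zip(*members)])
        else:
            updated.append(old)  # keep an empty partition's reward unchanged
    return updated


def fit(
    experiences: List[Vector], rewards: List[Vector], sweeps: int = 10
) -> Tuple[List[int], List[Vector]]:
    """Alternate assignment and reward updates until the assignment is stable."""
    assignment = [0] * len(experiences)  # initial assignment: one partition
    for _ in range(sweeps):
        new_assignment = assign(experiences, rewards)
        rewards = update_rewards(experiences, new_assignment, rewards)
        if new_assignment == assignment:
            break
        assignment = new_assignment
    return assignment, rewards


# Four toy experiences and two candidate partitions (illustrative data only).
experiences = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
assignment, rewards = fit(experiences, [[1.0, 0.0], [0.0, 1.0]])
```

A probabilistic variant would sample the partition instead of taking the nearest one, and would rank reward functions by posterior probability before revising the MDP, as the abstract indicates.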
Incoming Links
Total number of triples: 28.
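
The filingDate, publicationDate, and priorityDate values in the table are typed literals: a lexical value such as 2020-01-10-04:00 (a calendar date followed by a UTC offset), then ^^ and the datatype IRI for xsd:date. A minimal sketch of splitting such a literal with the Python standard library follows. The function name `parse_xsd_date_literal` is hypothetical, and the sketch assumes a four-digit positive year, which holds for the values shown here but not for every valid xsd:date.

```python
from datetime import date, timedelta, timezone
from typing import Optional, Tuple

XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"


def parse_xsd_date_literal(literal: str) -> Tuple[date, Optional[timezone]]:
    """Split 'YYYY-MM-DD[offset]^^<datatype>' into a date and an optional offset."""
    lexical, _, datatype = literal.strip().partition("^^")
    if datatype.strip("<>") != XSD_DATE:
        raise ValueError(f"not an xsd:date literal: {literal!r}")

    day = date.fromisoformat(lexical[:10])  # the YYYY-MM-DD part
    tz_part = lexical[10:]                  # '', 'Z', '+hh:mm' or '-hh:mm'
    if not tz_part:
        return day, None
    if tz_part == "Z":
        return day, timezone.utc

    sign = -1 if tz_part[0] == "-" else 1
    hours, minutes = tz_part[1:].split(":")
    offset = timedelta(hours=int(hours), minutes=int(minutes)) * sign
    return day, timezone(offset)


filed, filed_tz = parse_xsd_date_literal(
    "2020-01-10-04:00^^<http://www.w3.org/2001/XMLSchema#date>"
)
published, _ = parse_xsd_date_literal(
    "2020-08-06-04:00^^<http://www.w3.org/2001/XMLSchema#date>"
)
days_to_publication = (published - filed).days
```

Keeping the offset separate from the date avoids silently shifting the calendar day when the values are compared or converted.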