http://rdf.ncbi.nlm.nih.gov/pubchem/patent/WO-2020159692-A1

Outgoing Links

Predicate Object
assignee http://rdf.ncbi.nlm.nih.gov/pubchem/patentassignee/MD5_0acc7fd312a68983fb8b9d120043467e
classificationCPCInventive http://rdf.ncbi.nlm.nih.gov/pubchem/patentcpc/G16H50-20
http://rdf.ncbi.nlm.nih.gov/pubchem/patentcpc/G06F30-27
http://rdf.ncbi.nlm.nih.gov/pubchem/patentcpc/G06N3-126
http://rdf.ncbi.nlm.nih.gov/pubchem/patentcpc/G06N7-01
http://rdf.ncbi.nlm.nih.gov/pubchem/patentcpc/G06F18-217
http://rdf.ncbi.nlm.nih.gov/pubchem/patentcpc/G06N20-00
http://rdf.ncbi.nlm.nih.gov/pubchem/patentcpc/G06N3-006
classificationIPCInventive http://rdf.ncbi.nlm.nih.gov/pubchem/patentipc/G06N7-00
http://rdf.ncbi.nlm.nih.gov/pubchem/patentipc/G06N7-08
http://rdf.ncbi.nlm.nih.gov/pubchem/patentipc/G06N3-00
http://rdf.ncbi.nlm.nih.gov/pubchem/patentipc/G06N20-00
filingDate 2020-01-10-04:00^^<http://www.w3.org/2001/XMLSchema#date>
inventor http://rdf.ncbi.nlm.nih.gov/pubchem/patentinventor/MD5_f22222c230ed39d0bda6969e94a366ce
http://rdf.ncbi.nlm.nih.gov/pubchem/patentinventor/MD5_e34ce345e4cab14ee69313a1ec31af34
publicationDate 2020-08-06-04:00^^<http://www.w3.org/2001/XMLSchema#date>
publicationNumber WO-2020159692-A1
titleOfInvention Estimating latent reward functions from experiences
abstract Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for estimating latent reward functions from a set of experiences, each experience specifying a respective sequence of state transitions of an environment being interacted with by an agent that is controlled using a respective latent policy. In one aspect, a method includes: generating a current Markov Decision Process (MDP); initializing a current assignment that assigns the set of experiences into a first number of partitions, each associated with a respective latent reward function; updating the current assignment, including, for each experience, selecting a partition from a second number of candidate partitions and assigning the experience to the selected partition; updating the latent reward functions in accordance with a specified update rule; and updating the current MDP using latent features associated with the particular latent reward functions that are determined to have the highest posterior probability.
isCitedBy http://rdf.ncbi.nlm.nih.gov/pubchem/patent/CN-115470710-B
http://rdf.ncbi.nlm.nih.gov/pubchem/patent/CN-115470710-A
priorityDate 2019-01-28-04:00^^<http://www.w3.org/2001/XMLSchema#date>
type http://data.epo.org/linked-data/def/patent/Publication
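The alternating procedure in the abstract above can be illustrated with a minimal, hypothetical sketch. Every name and modelling choice here is an assumption and does not come from the patent:

- a toy 5-state line environment;
- reward weights indexed by state;
- a one-step Boltzmann (softmax) policy likelihood in place of a full MDP solve;
- a Chinese-restaurant-process prior, where each experience chooses among the occupied partitions plus one auxiliary new partition (in the style of Neal's Algorithm 8 with one auxiliary component), standing in for the "second number of candidate partitions";
- gradient ascent on a Gaussian-prior log posterior as the "specified update rule".

The sketch keeps the sample with the highest log posterior. It does not model the step that updates the MDP from latent features.

```python
import math
import random

# Toy environment (assumption, not from the patent): states 0..N_STATES-1 on a
# line; from state s the agent may stay put or step to an adjacent state.
N_STATES = 5
SIGMA = 1.0  # std-dev of the assumed Gaussian prior on reward weights


def neighbors(s):
    return sorted({max(s - 1, 0), s, min(s + 1, N_STATES - 1)})


def log_lik_and_grad(w, experience):
    """Log-likelihood of an experience (list of (s, s_next) transitions) under a
    one-step Boltzmann policy, P(s_next | s) proportional to exp(w[s_next]),
    together with its gradient with respect to the reward weights w."""
    ll, g = 0.0, [0.0] * len(w)
    for s, s_next in experience:
        nbrs = neighbors(s)
        m = max(w[n] for n in nbrs)
        exps = [math.exp(w[n] - m) for n in nbrs]
        z = sum(exps)
        ll += w[s_next] - (m + math.log(z))  # log-softmax, computed stably
        g[s_next] += 1.0
        for n, e in zip(nbrs, exps):
            g[n] -= e / z
    return ll, g


def log_prior(w):
    """Log density of an isotropic Gaussian prior N(0, SIGMA^2 I) on w."""
    quad = -0.5 * sum(x * x for x in w) / SIGMA**2
    return quad - 0.5 * len(w) * math.log(2 * math.pi * SIGMA**2)


def update_assignment(experiences, assignment, rewards, alpha, rng):
    """One Gibbs sweep with a Chinese-restaurant-process prior: each experience
    chooses among the occupied partitions plus one auxiliary new partition."""
    for i, experience in enumerate(experiences):
        counts = {}
        for j, k in enumerate(assignment):
            if j != i:
                counts[k] = counts.get(k, 0) + 1
        old, new_id = assignment[i], max(rewards) + 1
        # A singleton's own reward serves as the new-partition candidate;
        # otherwise the candidate's reward is drawn from the prior.
        if old in counts:
            new_w = [rng.gauss(0.0, SIGMA) for _ in range(N_STATES)]
        else:
            new_w = rewards[old]
        cands = list(counts) + [new_id]
        scores = [math.log(counts[k]) + log_lik_and_grad(rewards[k], experience)[0]
                  for k in counts]
        scores.append(math.log(alpha) + log_lik_and_grad(new_w, experience)[0])
        top = max(scores)
        choice = rng.choices(cands, weights=[math.exp(sc - top) for sc in scores])[0]
        if choice == new_id:
            rewards[new_id] = list(new_w)
        assignment[i] = choice
        for k in list(rewards):  # forget reward functions of emptied partitions
            if k not in assignment:
                del rewards[k]


def update_rewards(experiences, assignment, rewards, lr=0.5, steps=50):
    """Assumed update rule: gradient ascent on each partition's log posterior,
    with the step size scaled down by the partition's number of transitions."""
    for k in list(rewards):
        members = [e for e, a in zip(experiences, assignment) if a == k]
        step = lr / (1 + sum(len(e) for e in members))
        w = rewards[k]
        for _ in range(steps):
            g = [-x / SIGMA**2 for x in w]
            for e in members:
                g = [gi + di for gi, di in zip(g, log_lik_and_grad(w, e)[1])]
            w = [x + step * gi for x, gi in zip(w, g)]
        rewards[k] = w


def log_joint(experiences, assignment, rewards, alpha):
    """Unnormalized log posterior: CRP prior + reward priors + likelihood."""
    sizes = [assignment.count(k) for k in rewards]
    lp = len(sizes) * math.log(alpha) + sum(math.lgamma(c) for c in sizes)
    lp -= sum(math.log(alpha + i) for i in range(len(assignment)))  # CRP normalizer
    lp += sum(log_prior(w) for w in rewards.values())
    return lp + sum(log_lik_and_grad(rewards[a], e)[0]
                    for e, a in zip(experiences, assignment))


def estimate(experiences, alpha=1.0, sweeps=10, seed=0):
    """Alternate assignment and reward updates; keep the highest-posterior sample."""
    rng = random.Random(seed)
    assignment = [0] * len(experiences)  # start from a single partition (assumed)
    rewards = {0: [0.0] * N_STATES}
    best = (-math.inf, None, None)
    for _ in range(sweeps):
        update_assignment(experiences, assignment, rewards, alpha, rng)
        update_rewards(experiences, assignment, rewards)
        lp = log_joint(experiences, assignment, rewards, alpha)
        if lp > best[0]:
            best = (lp, list(assignment), {k: list(w) for k, w in rewards.items()})
    return best
```

For example, `right = [[(s, min(s + 1, 4)) for s in range(5)]] * 3` and `left = [[(s, max(s - 1, 0)) for s in range(5)]] * 3` define three right-moving and three left-moving trajectories. Calling `estimate(right + left)` then returns the log posterior, the assignment, and the per-partition reward weights of the best sample found.

Because the assignment step samples, the number of partitions recovered is not guaranteed. Sampling with auxiliary candidates rather than choosing greedily lets an experience open a new partition even when a fresh reward function starts out fitting it poorly.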

Incoming Links

Predicate Subject
isDiscussedBy http://rdf.ncbi.nlm.nih.gov/pubchem/protein/ACCP27638
http://rdf.ncbi.nlm.nih.gov/pubchem/substance/SID415856398
http://rdf.ncbi.nlm.nih.gov/pubchem/compound/CID157849
http://rdf.ncbi.nlm.nih.gov/pubchem/taxonomy/TAXID548838
http://rdf.ncbi.nlm.nih.gov/pubchem/anatomy/ANATOMYID548838

Total number of triples: 28.