patent/WO-2020159692-A1

http://rdf.ncbi.nlm.nih.gov/pubchem/patent/WO-2020159692-A1

Outgoing Links

Predicate	Object
assignee	http://rdf.ncbi.nlm.nih.gov/pubchem/patentassignee/MD5_0acc7fd312a68983fb8b9d120043467e
classificationCPCInventive	http://rdf.ncbi.nlm.nih.gov/pubchem/patentcpc/G16H50-20 http://rdf.ncbi.nlm.nih.gov/pubchem/patentcpc/G06F30-27 http://rdf.ncbi.nlm.nih.gov/pubchem/patentcpc/G06N3-126 http://rdf.ncbi.nlm.nih.gov/pubchem/patentcpc/G06N7-01 http://rdf.ncbi.nlm.nih.gov/pubchem/patentcpc/G06F18-217 http://rdf.ncbi.nlm.nih.gov/pubchem/patentcpc/G06N20-00 http://rdf.ncbi.nlm.nih.gov/pubchem/patentcpc/G06N3-006
classificationIPCInventive	http://rdf.ncbi.nlm.nih.gov/pubchem/patentipc/G06N7-00 http://rdf.ncbi.nlm.nih.gov/pubchem/patentipc/G06N7-08 http://rdf.ncbi.nlm.nih.gov/pubchem/patentipc/G06N3-00 http://rdf.ncbi.nlm.nih.gov/pubchem/patentipc/G06N20-00
filingDate	2020-01-10-04:00^^<http://www.w3.org/2001/XMLSchema#date>
inventor	http://rdf.ncbi.nlm.nih.gov/pubchem/patentinventor/MD5_f22222c230ed39d0bda6969e94a366ce http://rdf.ncbi.nlm.nih.gov/pubchem/patentinventor/MD5_e34ce345e4cab14ee69313a1ec31af34
publicationDate	2020-08-06-04:00^^<http://www.w3.org/2001/XMLSchema#date>
publicationNumber	WO-2020159692-A1
titleOfInvention	Estimating latent reward functions from experiences
abstract	Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for estimating latent reward functions from a set of experiences, each experience specifying a respective sequence of state transitions of an environment being interacted with by an agent that is controlled using a respective latent policy. In one aspect, a method includes: generating a current Markov Decision Process (MDP); initializing a current assignment which assigns the set of experiences into a first number of partitions that are each associated with a respective latent reward function; updating the current assignment, including, for each experience: selecting a partition from a second number of candidate partitions; and assigning the experience to the selected partition; and updating the latent reward functions in accordance with a specified update rule; and updating the current MDP using latent features associated with particular latent reward functions that are determined to have highest posterior probability.
isCitedBy	http://rdf.ncbi.nlm.nih.gov/pubchem/patent/CN-115470710-B http://rdf.ncbi.nlm.nih.gov/pubchem/patent/CN-115470710-A
priorityDate	2019-01-28-04:00^^<http://www.w3.org/2001/XMLSchema#date>
type	http://data.epo.org/linked-data/def/patent/Publication

Incoming Links

Predicate	Subject
isDiscussedBy	http://rdf.ncbi.nlm.nih.gov/pubchem/protein/ACCP27638 http://rdf.ncbi.nlm.nih.gov/pubchem/substance/SID415856398 http://rdf.ncbi.nlm.nih.gov/pubchem/compound/CID157849 http://rdf.ncbi.nlm.nih.gov/pubchem/taxonomy/TAXID548838 http://rdf.ncbi.nlm.nih.gov/pubchem/anatomy/ANATOMYID548838

Total number of triples: 28.