
KIPS Transactions on Software and Data Engineering


Korean Title (translated): Performance Analysis of an SR-Based Reinforcement Learning Algorithm in a Highly Uncertain Decision-Making Environment
English Title: Evaluating SR-Based Reinforcement Learning Algorithm Under the Highly Uncertain Decision Task
Author(s): Kim So Hyeon, Lee Jee Hang
Citation: Vol. 11, No. 8, pp. 331-338 (Aug. 2022)
Korean Abstract (translated)

The successor representation (SR)-based reinforcement learning algorithm is an RL model developed on the basis of neuroscientific mechanisms observed in the brain. By exploiting cognitive-map-based information about the structure of the environment formed in the hippocampus, it is a nature-inspired RL method that can learn and make decisions quickly and flexibly even in changing environments, and it is well known for its robust ability to learn and adapt rapidly to uncertain changes in the reward structure. This paper examines whether an SR-based RL algorithm can also respond and learn robustly when latent variables inside the environment structure, such as the state transition probability, induce the changes in reward structure, rather than only when the surface-level reward structure itself changes. To assess this, we simulated an SR-Dyna RL agent, which integrates SR into a goal-directed Dyna RL algorithm, in a two-stage Markov decision task in which uncertainty about state transitions and the resulting changes in reward structure appear simultaneously. In addition, to observe the characteristics of SR more closely, we carried out further experiments in which the latent variables that alter the environment were controlled sequentially, and compared the results against the original environment. The experiments show that SR-Dyna learned the reward changes caused by changes in the state transition probability only to a limited extent; compared with the results in the original environment, it could not rapidly learn the reward structure changes induced by the latent-variable changes. Based on these results, we look forward to the design of SR-based RL agents that can operate robustly even when the environment structure changes rapidly.
English Abstract
Successor representation (SR) is a model of human reinforcement learning (RL) that mimics the mechanism by which hippocampal cells construct cognitive maps, and it uses these learned map-like features to respond adaptively to frequent reward changes. In this paper, we evaluated the performance of SR in a setting where changes in the latent variables of the environment trigger changes in the reward structure. As a benchmark, we adopted SR-Dyna, an integration of SR into a goal-driven Dyna RL algorithm, in a 2-stage Markov Decision Task (MDT) in which the latent variables (state transition uncertainty and the goal condition) can be manipulated intentionally. To investigate the characteristics of SR precisely, we conducted the experiments while controlling each latent variable that affects the changes in reward structure. The evaluation results showed that SR-Dyna could learn to respond to the reward changes related to the changes in latent variables, but could not learn them rapidly. This highlights the need for more robust RL models that can rapidly learn to respond to frequent changes in environments in which latent variables and the reward structure change at the same time.
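As a rough, hypothetical illustration of the benchmark described in the abstract, the sketch below models a two-stage Markov decision task whose two latent variables, a common-transition probability and the currently rewarded goal, can be shifted to induce a change in the reward structure. The class name `TwoStageMDT`, the probabilities, and the reward values are assumptions for illustration and are not taken from the paper's task specification.

```python
import random

class TwoStageMDT:
    """Toy two-stage Markov decision task with two latent variables:
    the common-transition probability and the currently rewarded goal."""

    def __init__(self, p_common=0.7, rewarded_goal=0):
        self.p_common = p_common            # latent state-transition probability (assumed value)
        self.rewarded_goal = rewarded_goal  # which terminal goal currently pays off (assumed)

    def stage1(self, action):
        """Action 0 usually leads to stage-2 state 'A', action 1 to 'B'."""
        common = random.random() < self.p_common
        if action == 0:
            return 'A' if common else 'B'
        return 'B' if common else 'A'

    def stage2(self, state, action):
        """Each stage-2 state offers two terminal goals; only one is rewarded."""
        goal = {'A': (0, 1), 'B': (2, 3)}[state][action]
        reward = 1.0 if goal == self.rewarded_goal else 0.0
        return goal, reward

    def shift_latent_variables(self):
        """A reward-structure change driven purely by latent variables:
        flip the transition probability and move the rewarded goal."""
        self.p_common = 1.0 - self.p_common
        self.rewarded_goal = random.choice([0, 1, 2, 3])
```

An SR-style agent such as the one sketched earlier would be trained on stage-1/stage-2 episodes and then probed after `shift_latent_variables()` to see how quickly its value estimates recover, which is the kind of comparison the paper reports.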
Keywords: SR-Based Reinforcement Learning Algorithm; 2-Stage Markov Decision Task; State Transition Probability; Reward Function