KIPS Transactions on Software and Data Engineering
Korean Title |
Performance Analysis of an SR-Based Reinforcement Learning Algorithm in a Highly Uncertain Decision-Making Environment |
English Title |
Evaluating SR-Based Reinforcement Learning Algorithm Under the Highly Uncertain Decision Task |
Author |
Kim So Hyeon
Lee Jee Hang
|
Citation |
Vol. 11, No. 8, pp. 331-338 (Aug. 2022)
Korean Abstract |
The successor representation (SR)-based reinforcement learning algorithm is a reinforcement learning model developed from neuroscientific mechanisms observed in the brain. As a method that mimics natural intelligence by exploiting cognitive-map-based information about environmental structure formed in the hippocampus, it can learn and make decisions quickly and flexibly even in changing environments, and it is well known to show robust performance in rapidly learning and adapting to uncertain changes in reward structure. In this paper, we examine whether SR-based reinforcement learning algorithms can also respond and learn robustly when latent variables within the environment structure, such as the state transition probability, induce changes in the reward structure, rather than only when the surface reward structure itself changes. To assess performance, we simulated SR-Dyna, a reinforcement learning agent that integrates SR into a goal-directed Dyna algorithm, in a 2-stage Markov decision-making environment in which uncertainty about state transitions and the resulting reward-structure changes occur at the same time. In addition, to observe the characteristics of SR more closely, we ran further experiments in which the latent variables that alter the environment were controlled one at a time and compared the results with those from the original environment. The experiments show that SR-Dyna learned the reward changes caused by changes in the state transition probability only to a limited extent. Moreover, compared with the results in the original environment, SR-Dyna was not able to learn quickly the reward-structure changes caused by changes in latent variables. Based on these results, we anticipate the design of SR-based reinforcement learning agents that can operate robustly even in environments whose structure changes rapidly. |
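For context, the successor representation that underlies this class of algorithms is conventionally defined (following Dayan, 1993) as the expected discounted future occupancy of each state, which lets state values factor into a structure term and a reward term:

M(s, s') = \mathbb{E}\left[ \sum_{t=0}^{\infty} \gamma^{t}\, \mathbb{1}(s_t = s') \,\middle|\, s_0 = s \right], \qquad V(s) = \sum_{s'} M(s, s')\, R(s')

Because reward enters only through R(s'), an SR agent can revalue policies quickly when rewards alone change; when the transition structure itself changes, as with the state transition probability manipulated in this paper, the occupancy term M(s, s') must be relearned from new experience.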
English Abstract |
Successor representation (SR) is a model of human reinforcement learning (RL) that mimics the mechanism by which hippocampal cells construct cognitive maps. SR uses these learned map-like features to respond adaptively to frequent reward changes. In this paper, we evaluated the performance of SR in a context where changes in the latent variables of an environment trigger changes in its reward structure. As a benchmark, we adopted SR-Dyna, an integration of SR into a goal-driven Dyna RL algorithm, on the 2-stage Markov Decision Task (MDT), in which the latent variables, namely state transition uncertainty and goal condition, can be manipulated deliberately. To investigate the characteristics of SR precisely, we conducted the experiments while controlling each latent variable that affects the changes in reward structure. Evaluation results showed that SR-Dyna could learn to respond to reward changes related to changes in latent variables, but could not learn to do so rapidly. This highlights the need for more robust RL models that can rapidly learn to respond to frequent changes in environments where latent variables and reward structure change at the same time. |
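To make the setup concrete, below is a minimal sketch of a tabular SR agent in a hypothetical 2-stage task with a manipulable common-transition probability. It is not the authors' SR-Dyna implementation (SR-Dyna additionally replays stored experience offline to update the successor matrix); the state layout, constants, and names are all illustrative assumptions:

import numpy as np

rng = np.random.default_rng(0)

N_STATES = 3      # assumed layout: 0 = first stage; 1 and 2 = terminal second-stage states
N_ACTIONS = 2
GAMMA = 0.95      # discount factor (assumed)
ALPHA = 0.1       # learning rate (assumed)
EPSILON = 0.1     # exploration rate (assumed)
P_COMMON = 0.7    # latent state-transition probability, the manipulated variable

M = np.zeros((N_ACTIONS, N_STATES))   # successor matrix: expected discounted occupancy per action
w = np.zeros(N_STATES)                # per-state reward estimates

def transition(action: int) -> int:
    """Action 0 commonly reaches state 1, action 1 commonly reaches state 2;
    the rare outcome occurs with probability 1 - P_COMMON."""
    common, rare = action + 1, 2 - action
    return common if rng.random() < P_COMMON else rare

for trial in range(2000):
    # reward revaluation halfway through: the rewarding second-stage state flips,
    # mimicking the reward-structure change probed in the paper
    rewarding_state = 1 if trial < 1000 else 2

    # epsilon-greedy choice using SR-derived action values Q(a) = M[a] . w
    q = M @ w
    a = int(np.argmax(q)) if rng.random() > EPSILON else int(rng.integers(N_ACTIONS))

    s_next = transition(a)
    r = 1.0 if s_next == rewarding_state else 0.0

    # TD update of the successor row: second-stage states are terminal here,
    # so the occupancy target is just the discounted one-hot of s_next
    M[a] += ALPHA * (GAMMA * np.eye(N_STATES)[s_next] - M[a])

    # delta-rule update of the reward weights from the observed reward
    w[s_next] += ALPHA * (r - w[s_next])

In this factorization a reward-only change is absorbed quickly through the weights w, whereas a change in the latent variable P_COMMON invalidates the occupancy estimates in M until enough new transitions are observed, which is consistent with the slow adaptation reported in the abstract.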
Keyword |
SR Based Reinforcement Learning Algorithm
2-Stage Markov Decision Task
State Transition Probability
Reward Function
|