정보과학회논문지 (Journal of KIISE)
Korean Title |
심층 강화학습기반 연속상태공간 제어를 위한 보상 함수 분석 |
English Title |
Analysis of Reward Functions in Deep Reinforcement Learning for Continuous State Space Control |
Author |
강민구
김기응
MinKu Kang
Kee-Eung Kim
|
Citation |
Vol. 47, No. 1, pp. 78-87 (2020. 01) |
Çѱ۳»¿ë (Korean Abstract) |
¿¬¼Ó»óÅ°ø°£¿¡¼ ÁÖ¾îÁø ŽºÅ©ÀÇ Á¦¾î¸¦ À§ÇØ ½ÉÃþ ½Å°æ¸ÁÀ» »ç¿ëÇÏ¿© °¡Ä¡ÇÔ¼ö¿Í Á¤Ã¥ÇÔ¼ö¸¦ ±Ù»çÇÏ´Â ½ÉÃþ °ÈÇнÀ(Deep Reinforcement Learning) ¾Ë°í¸®ÁòÀº ÃÖ±Ù À¯¸ÁÇÑ °á°úµéÀ» º¸¿© ÁÖ¾ú´Ù. ±×·¯³ª ÇÔ¼ö±Ù»ç¸¦ À§ÇØ »ç¿ëµÇ´Â ½ÉÃþ ½Å°æ¸ÁÀÇ ºñ-ÄÁº¤½º Ư¼ºÀÌ ÃÖÀûÈ ¾Ë°í¸®ÁòÀÇ ÀÌ·ÐÀû ºÐ¼®À» Á¾Á¾ ¾î·Æ°Ô ¸¸µé¾î ¿ÔÀ¸¸ç ÀÌ·Î ÀÎÇÏ¿© ½ÉÃþ °ÈÇнÀ ¾Ë°í¸®ÁòÀÇ Á¡±ÙÀû Àü¿ª ÃÖÀûÇØ·ÎÀÇ ¼ö·Å°ú °°Àº ÀÌ·ÐÀû º¸ÀåÀÌ ºÎÁ·ÇÏ´Ù. °ÈÇнÀ¿¡¼ÀÇ º¸»óÇÔ¼ö´Â ÇнÀ ¿¡ÀÌÀüÆ®ÀÇ ÀüüÀûÀΠƯ¼ºÀ» °áÁ¤Áþ´Â Áß¿äÇÑ ¿ä¼Ò Áß Çϳª¶ó´Â »ç½Ç¿¡ ±âÀÎÇÏ¿©, º» ³í¹®¿¡¼´Â ½ÉÃþ °ÈÇнÀ ¾Ë°í¸®ÁòÀÇ ºñ-ÄÁº¤½º ÃÖÀûÈ °úÁ¤ÀÇ ÀÌ·ÐÀû ¼ö·Å°ú °°Àº Ãø¸éº¸´Ù´Â ÀÛÁö¸¸ Áß¿äÇÑ Ãø¸é Áß Çϳª·Î½á, ½ÉÃþ°ÈÇнÀ¿¡¼ ³Î¸® »ç¿ëµÇ´Â º¸»óÇÔ¼öµéÀÇ ±¸Á¶¿Í À̵éÀÌ ÇнÀ ¾Ë°í¸®Áò¿¡ ¹ÌÄ¡´Â ¿µÇâ¿¡ ´ëÇØ ºÐ¼®ÇÑ´Ù. ½ÉÃþ °ÈÇнÀ¿¡¼ º¸»óÇÔ¼ö°¡ ÈçÈ÷ ½ÃÇàÂø¿À¹ý¿¡ ±â¹ÝÇÏ¿© ¼³°èµÇ¾î¿Â °ÍÀ» °í·ÁÇßÀ» ¶§, º» ³í¹®¿¡¼ Á¦¾ÈÇÏ´Â ºÐ¼®ÀÌ ½ÉÃþ°ÈÇнÀÀÇ º¸»ó ÇÔ¼ö ¼³°è¿¡ À¯¿ëÇÑ °¡À̵尡 µÉ ¼ö ÀÖÀ» °ÍÀ¸·Î ±â´ëÇÑ´Ù.
|
English Abstract |
Deep reinforcement learning (DRL), which uses deep neural networks to approximate the value function and the policy, has recently shown promising results in continuous state-space control tasks. However, the non-convexity of deep neural networks used as function approximators has often made the analysis of DRL algorithms intractable, leaving the learning algorithm without theoretical guarantees such as asymptotic global convergence. Because the reward function in reinforcement learning is one of the key entities that determine the overall characteristics of the learning agent, we focus on a smaller but important aspect of the analysis: the structure of widely used reward functions in DRL tasks and their possible effects on the learning algorithm. The proposed analysis may facilitate the identification of appropriate reward functions in DRL tasks, which has often been done by trial and error.
|
Keyword |
강화학습
보상함수
비-모델 강화학습
보상함수 구조
데이터 기반 제어
reinforcement learning
reward function
model-free reinforcement learning
reward structure
data-driven control
|