KIPS Transactions on Software and Data Engineering
Title |
Alleviation of Vanishing Gradient Problem Using Parametric Activation Functions |
Author |
Young Min Ko
Sun Woo Ko
|
Citation |
Vol. 10, No. 10, pp. 407-420 (Oct. 2021) |
Abstract |
Deep neural networks are widely used to solve a variety of problems. However, deep neural networks with many hidden layers frequently suffer from vanishing or exploding gradients, which are a major obstacle to training them. In this paper, we propose parametric activation functions to alleviate the vanishing gradient problem that can be caused by nonlinear activation functions. The proposed parametric activation function is obtained by applying parameters that can adjust the scale and location of the activation function according to the characteristics of the input data, and these parameters are trained through backpropagation to minimize the loss function without constraining the magnitude of the activation function's derivative. Using an XOR problem with 10 hidden layers and an MNIST classification problem with 8 hidden layers, we compared the performance of the original nonlinear activation functions and the parametric activation functions, and confirmed that the proposed parametric activation functions are superior in alleviating the vanishing gradient problem. |
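The abstract describes activation functions whose scale and location are trainable parameters updated by backpropagation. The following is a minimal NumPy sketch of that idea, assuming a sigmoid base function; the parameter names `a` (scale) and `b` (location) and this exact functional form are illustrative assumptions, not the paper's definitions:

```python
import numpy as np

# Hypothetical parametric sigmoid: scale a stretches the output range,
# location b shifts the input. Backpropagation can then update a and b
# to keep the derivative from saturating in deep networks.

def p_sigmoid(x, a, b):
    """Parametric sigmoid f(x) = a * sigmoid(x - b)."""
    return a / (1.0 + np.exp(-(x - b)))

def grads(x, a, b):
    """Partial derivatives needed in the backpropagation pass."""
    s = 1.0 / (1.0 + np.exp(-(x - b)))   # plain sigmoid of the shifted input
    df_dx = a * s * (1.0 - s)            # w.r.t. the input (chain-rule term)
    df_da = s                            # w.r.t. the scale parameter
    df_db = -a * s * (1.0 - s)           # w.r.t. the location parameter
    return df_dx, df_da, df_db

# With a > 1 the maximum slope a/4 exceeds the plain sigmoid's 1/4, which
# illustrates how a trainable scale can counteract gradient shrinkage.
df_dx, _, _ = grads(np.array([0.0]), a=4.0, b=0.0)
print(df_dx[0])  # maximum slope at x = b is a/4 = 1.0
```

Because `a` and `b` enter the loss through ordinary differentiable expressions, any framework's autograd (or the hand-written partials above) can train them jointly with the weights.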
Keyword |
Deep Neural Network
Vanishing Gradient Problem
Parametric Activation Function
Backpropagation
Learning
|