



Journal of KIISE (정보과학회논문지)


Korean Title: 딥러닝 학습에서 동기화 배리어 재배치와 파이프라이닝을 이용한 Double-Averaging 가속
English Title: Double-Averaging Acceleration with Synchronization Barrier Repositioning and Pipelining in Deep Learning
Author: 임동신 (Dong Shin Lim), 양용준 (Yong Jun Yang), 조신 (Shin Cho), 유찬희 (Chanhee Yu), 박경석 (Kyongseok Park)
Citation: Vol. 48, No. 11, pp. 1221–1227 (Nov. 2021)
Korean Abstract:
분산컴퓨팅을 이용한 딥러닝에서 동기화는 학습에 중요한 요소 중 하나이다. Local SGD는 낮은 빈도로 동기화하는 방법으로 빠른 학습이 가능하지만 수렴난이도가 높은 단점이 있다. 이에 수렴난이도를 낮추고자 Double-Averaging과 SlowMo가 제안되었다. Double-Averaging은 momentum buffer 동기화를 추가하여 수렴난이도를 개선하였지만 동기화 데이터의 증가로 인해 학습 시간 또한 증가하는 문제가 있다. 반면 SlowMo는 Local SGD에 Two-layer momentum 구조를 추가하여 동기화 데이터의 증가에 따른 학습 시간의 증가 없이 수렴난이도를 낮췄다. 그러나 이를 위해서는 적절한 SlowMo 하이퍼파라미터들을 찾아야 하는 단점이 있다. 따라서 본 논문에서는 동기화 배리어 재배치와 파이프라이닝을 이용한 Double-Averaging 가속방법을 제안하였으며 실험을 통해 수렴난이도와 가속 성능 측면에서 모두 우수함을 확인하였다.
English Abstract:
In deep learning with distributed computing, synchronization is one of the most important factors in training. Local SGD is a low-frequency synchronization method that enables fast training, but it suffers from poor convergence. Double-Averaging and SlowMo have been proposed to ease the convergence difficulty of Local SGD. Double-Averaging improves convergence by additionally synchronizing the momentum buffers; however, the extra synchronized data also increases training time. SlowMo, on the other hand, adds a two-layer momentum structure to Local SGD, reducing the convergence difficulty without additional synchronization, but it requires finding appropriate SlowMo hyperparameters. In this paper, we therefore propose accelerating Double-Averaging via synchronization barrier repositioning and pipelining. Experiments confirm that the proposed method is superior in terms of both convergence and acceleration.
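The baseline scheme the abstract contrasts can be sketched as a single-process simulation: each worker runs several local SGD-with-momentum steps, then a synchronization barrier averages both the parameters and the momentum buffers across workers (the "double" in Double-Averaging). All names and values below (worker count, step counts, the toy quadratic objective, hyperparameters) are illustrative assumptions, not taken from the paper, and the paper's actual contribution, barrier repositioning and pipelining, is not modeled here.

```python
# Minimal single-process sketch of Local SGD with Double-Averaging.
import numpy as np

def local_sgd_double_averaging(x0, grad, n_workers=4, rounds=10,
                               local_steps=8, lr=0.1, beta=0.9, seed=0):
    rng = np.random.default_rng(seed)
    # Every worker starts from the same parameters and a zero momentum buffer.
    params = [x0.copy() for _ in range(n_workers)]
    moms = [np.zeros_like(x0) for _ in range(n_workers)]
    for _ in range(rounds):
        for w in range(n_workers):
            for _ in range(local_steps):
                # Noise stands in for each worker's minibatch gradient.
                g = grad(params[w]) + 0.01 * rng.standard_normal(x0.shape)
                moms[w] = beta * moms[w] + g           # heavy-ball momentum
                params[w] = params[w] - lr * moms[w]
        # Synchronization barrier: Double-Averaging averages BOTH the
        # parameters and the momentum buffers (plain Local SGD would
        # average only the parameters, hence less data but worse convergence).
        avg_p = sum(params) / n_workers
        avg_m = sum(moms) / n_workers
        params = [avg_p.copy() for _ in range(n_workers)]
        moms = [avg_m.copy() for _ in range(n_workers)]
    return params[0]

# Toy quadratic objective f(x) = ||x||^2 / 2, whose gradient is x.
x_star = local_sgd_double_averaging(np.ones(4), grad=lambda v: v)
print(np.linalg.norm(x_star))  # small: the workers move toward the optimum at 0
```

In a real distributed setting the two averages would be two all-reduce operations, which is exactly why Double-Averaging doubles the synchronized data volume relative to Local SGD, the training-time cost the abstract mentions.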
Keyword: recommendation system, deep learning, recurrent neural networks, embedding, LSTM, distributed training, local SGD, double-averaging