Journal of KIISE
Korean Title (translated) |
Double-Averaging Acceleration Using Synchronization Barrier Repositioning and Pipelining in Deep Learning Training |
English Title |
Double-Averaging Acceleration with Synchronization Barrier Repositioning and Pipelining in Deep Learning |
Author |
Dong Shin Lim
Yong Jun Yang
Shin Cho
Chanhee Yu
Kyongseok Park
|
Citation |
VOL 48 NO. 11 PP. 1221 ~ 1227 (2021. 11) |
Korean Abstract (translated)
In deep learning using distributed computing, synchronization is one of the key factors in training. Local SGD is a low-frequency synchronization method that enables fast training, but it suffers from high convergence difficulty. To reduce this convergence difficulty, Double-Averaging and SlowMo have been proposed. Double-Averaging improves convergence difficulty by additionally synchronizing the momentum buffers, but the increase in synchronized data also increases training time. SlowMo, in contrast, adds a two-layer momentum structure to Local SGD and lowers convergence difficulty without the training-time increase that comes with more synchronized data; however, it requires finding appropriate SlowMo hyperparameters. This paper therefore proposes a Double-Averaging acceleration method using synchronization barrier repositioning and pipelining, and experiments confirm that it is superior in both convergence difficulty and acceleration performance. |
English Abstract
In deep learning using distributed computing, synchronization is one of the most important factors. While Local SGD is a low-frequency synchronization method that enables fast training, it suffers from high convergence difficulty. Double-Averaging and SlowMo have been proposed to reduce the convergence difficulty of Local SGD. Double-Averaging improves convergence by additionally synchronizing the momentum buffers; however, training time also increases because more data must be synchronized. SlowMo, on the other hand, adds a two-layer momentum structure to Local SGD, reducing convergence difficulty without additional synchronization, but it requires finding appropriate SlowMo hyperparameters. In this paper, we therefore propose accelerating Double-Averaging via synchronization barrier repositioning and pipelining. The proposed method significantly reduces convergence difficulty while improving acceleration performance. |
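The core mechanism the abstract describes — workers taking independent local SGD steps, then a synchronization barrier that averages both parameters and momentum buffers (the extra traffic Double-Averaging adds over plain Local SGD) — can be sketched as follows. This is a minimal single-process simulation on a toy quadratic loss, not the paper's implementation; the function name, worker count, and hyperparameter values are illustrative assumptions.

```python
# Single-process sketch of Local SGD with Double-Averaging, assuming
# `num_workers` simulated workers minimizing the toy loss f(w) = 0.5 * w**2.
# All names and defaults here are illustrative, not from the paper.
import numpy as np

def double_averaging_local_sgd(num_workers=4, local_steps=8, rounds=10,
                               lr=0.1, beta=0.9, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.normal(size=num_workers)   # per-worker parameters (scalar model each)
    m = np.zeros(num_workers)          # per-worker momentum buffers
    for _ in range(rounds):
        for _ in range(local_steps):   # independent local steps, no communication
            grad = w + 0.01 * rng.normal(size=num_workers)  # noisy grad of 0.5*w^2
            m = beta * m + grad        # heavy-ball momentum update
            w = w - lr * m
        # Synchronization barrier. Double-Averaging averages BOTH the
        # parameters and the momentum buffers; plain Local SGD would
        # average only w, which is why Double-Averaging moves more data.
        w[:] = w.mean()
        m[:] = m.mean()
    return w

w = double_averaging_local_sgd()
print(abs(w[0]))  # drifts toward the optimum w* = 0 as rounds increase
```

In a real distributed run the two `mean()` calls would each be an all-reduce, and the paper's contribution is repositioning that barrier and pipelining the two reductions so the added momentum traffic overlaps with computation instead of extending the critical path.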
Keyword
recommendation system
deep learning
recurrent neural networks
embedding
LSTM
distributed training
local SGD
double-averaging
|