KIISE Transactions on Computing Practices
Korean Title |
Parallelization and Pipelining Techniques for Improving the Performance of Deep Learning Applications on Heterogeneous Processors |
English Title |
Accelerating a Deep Learning Application by Parallelization and Pipelining on Heterogeneous Processors |
Author |
Samnieng Tan
EunJin Jeong
Jangryul Kim
Jaeseong Lee
Soonhoi Ha
|
Citation |
Vol. 27, No. 10, pp. 497-502 (Oct. 2021) |
Korean Abstract |
As the need for deep learning applications in embedded systems grows, embedded devices now include processing elements other than the CPU to accelerate these applications. A representative example, the NVIDIA Jetson AGX Xavier, carries not only an 8-core CPU but also a GPU and two deep learning accelerators, which are used to raise the performance of deep learning applications in resource-constrained environments. Even though an embedded device provides such heterogeneous processing elements, exploiting these diverse elements together to improve performance takes considerable effort. In this paper, we combine several existing techniques with our proposed network pipelining technique to maximize the throughput of deep learning applications on the Xavier with its heterogeneous processing elements. Experiments with several image classification examples and an object detection example confirm a performance improvement of up to 355% over the conventional method that uses a single GPU. |
English Abstract |
Since the need for deep learning applications in embedded systems is increasing, non-CPU processing elements are added to embedded devices to accelerate those applications. NVIDIA Jetson AGX Xavier (Xavier) is a representative example: besides an octa-core CPU, it has one powerful GPU and two deep learning accelerators to enhance the performance of deep learning inference in resource-constrained environments. Although an embedded device provides heterogeneous processing elements, utilizing these diverse computation units together to increase performance is burdensome. In this paper, we propose a technique that combines multiple existing methods with our network pipelining method, which is designed to utilize the heterogeneous processing elements on the Xavier, to maximize the throughput of deep learning applications. Experiments with image classification and object detection examples showed up to a 355% improvement in frames per second (FPS) over a single-GPU baseline. |
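The abstract describes overlapping the stages of a partitioned network across heterogeneous processing elements so that, while one PE works on frame i, the next PE already consumes frame i-1. A minimal host-side sketch of such network pipelining is shown below; it is illustrative only, not the authors' implementation, and the `run_on_pe` helper, the stage names (`GPU`, `DLA0`, `DLA1`), and the queue depth are assumptions.

```python
# Hedged sketch of network pipelining across heterogeneous PEs.
# Each stage would run one partition of the network on its own PE
# (e.g., a TensorRT engine bound to the GPU or one DLA on Xavier);
# here run_on_pe is a hypothetical placeholder.
import queue
import threading

def run_on_pe(pe_name, frame):
    # Stand-in for dispatching a sub-network to a processing element.
    # A real stage would return the partition's output tensors.
    return frame

def make_stage(pe_name, inbox, outbox):
    def worker():
        while True:
            item = inbox.get()
            if item is None:        # poison pill: propagate and stop
                outbox.put(None)
                return
            outbox.put(run_on_pe(pe_name, item))
    return threading.Thread(target=worker)

def pipelined_inference(frames, pes=("GPU", "DLA0", "DLA1")):
    # One bounded queue between consecutive stages lets all PEs work
    # on different frames concurrently, raising throughput (FPS).
    qs = [queue.Queue(maxsize=4) for _ in range(len(pes) + 1)]
    stages = [make_stage(pe, qs[i], qs[i + 1]) for i, pe in enumerate(pes)]
    for s in stages:
        s.start()
    for f in frames:
        qs[0].put(f)
    qs[0].put(None)                 # signal end of the stream
    results = []
    while (out := qs[-1].get()) is not None:
        results.append(out)
    for s in stages:
        s.join()
    return results
```

With stages of comparable latency, steady-state throughput is set by the slowest stage rather than by the sum of all stages, which is the source of the speedup the paper reports; in the identity sketch above the frames simply emerge in order.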
Keywords |
deep learning
heterogeneous processors
pipelining
parallelization
|