
Journal of KIISE (정보과학회논문지)

Current Result Document:

Korean Title: Intel KNL 클러스터 환경에서 AVX-512 기반 Blocked GEMM 알고리즘을 활용한 ScaLAPACK의 병렬 행렬 곱셈 연산 PDGEMM 성능 향상
English Title: Improvement on Parallel Matrix Multiplication Routines in ScaLAPACK using Blocked Matrix Multiplication Algorithm on Intel KNL Clusters with AVX-512
Author(s): Thi My Tuyen Nguyen, Yoosang Park, Jaeyoung Choi
Citation: Vol. 48, No. 1, pp. 7-12 (Jan. 2021)
Korean Abstract:
General matrix multiplication (DGEMM) is a core computational routine used in linear algebra, machine learning, statistics, and other fields. Processor vendors have released routines hand-optimized in assembly for single multi-core nodes, and many studies have applied various auto-tuning techniques to optimize the computation. To effectively reduce the processing time of matrix multiplication, a method is needed that optimizes the multiplication performed at each node so that it can be processed in a form suited to parallel computing environments. This paper introduces a parallel double-precision floating-point matrix multiplication routine (PDGEMM) for the Intel Knights Landing (KNL) environment and describes how it is applied. The proposed approach comprises a blocked (sub-matrix) multiplication step that optimizes single-node matrix multiplication for the parallel execution environment, and a compilation step that applies the Intel AVX-512 instructions available on KNL. In experiments, the proposed PDGEMM achieved 6% and 68% higher performance than the parallel matrix multiplication routine of the Intel Math Kernel Library (MKL) on KNL clusters of 4 and 16 nodes, respectively.
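The PDGEMM discussed above is ScaLAPACK's standard routine operating on matrices in a 2D block-cyclic distribution. As a minimal sketch of that call path (not the paper's code), the following C program sets up a BLACS process grid and invokes pdgemm_ through ScaLAPACK's conventional Fortran-style interface; the 2x2 grid, global size N = 8192, and block size nb = 64 are illustrative assumptions.

```c
/* Minimal PDGEMM call-path sketch (assumptions: 4 MPI ranks in a 2x2 grid,
 * N = 8192, block size 64; not the paper's code). Local arrays hold
 * ScaLAPACK's column-major, 2D block-cyclic tiles. */
#include <mpi.h>
#include <stdlib.h>

extern void Cblacs_get(int icontxt, int what, int *val);
extern void Cblacs_gridinit(int *icontxt, char *order, int nprow, int npcol);
extern void Cblacs_gridinfo(int icontxt, int *nprow, int *npcol, int *myrow, int *mycol);
extern void Cblacs_gridexit(int icontxt);
extern int  numroc_(int *n, int *nb, int *iproc, int *isrcproc, int *nprocs);
extern void descinit_(int *desc, int *m, int *n, int *mb, int *nb, int *irsrc,
                      int *icsrc, int *ictxt, int *lld, int *info);
extern void pdgemm_(char *transa, char *transb, int *m, int *n, int *k,
                    double *alpha, double *a, int *ia, int *ja, int *desca,
                    double *b, int *ib, int *jb, int *descb, double *beta,
                    double *c, int *ic, int *jc, int *descc);

int main(int argc, char **argv)
{
    int N = 8192, nb = 64;            /* global matrix size and block size (assumptions) */
    int nprow = 2, npcol = 2;         /* 2x2 process grid (assumption: 4 MPI ranks) */
    int ictxt, myrow, mycol, izero = 0, ione = 1, info;
    double alpha = 1.0, beta = 0.0;

    MPI_Init(&argc, &argv);
    Cblacs_get(-1, 0, &ictxt);                    /* default system context */
    Cblacs_gridinit(&ictxt, "Row", nprow, npcol); /* row-major process grid */
    Cblacs_gridinfo(ictxt, &nprow, &npcol, &myrow, &mycol);

    /* local tile extents under the 2D block-cyclic distribution */
    int mloc = numroc_(&N, &nb, &myrow, &izero, &nprow);
    int nloc = numroc_(&N, &nb, &mycol, &izero, &npcol);
    double *A = calloc((size_t)mloc * nloc, sizeof *A);
    double *B = calloc((size_t)mloc * nloc, sizeof *B);
    double *C = calloc((size_t)mloc * nloc, sizeof *C);

    int descA[9], descB[9], descC[9];
    descinit_(descA, &N, &N, &nb, &nb, &izero, &izero, &ictxt, &mloc, &info);
    descinit_(descB, &N, &N, &nb, &nb, &izero, &izero, &ictxt, &mloc, &info);
    descinit_(descC, &N, &N, &nb, &nb, &izero, &izero, &ictxt, &mloc, &info);

    /* ... fill the local parts of A and B here ... */

    /* C := alpha * A * B + beta * C, computed across the whole grid */
    pdgemm_("N", "N", &N, &N, &N, &alpha, A, &ione, &ione, descA,
            B, &ione, &ione, descB, &beta, C, &ione, &ione, descC);

    free(A); free(B); free(C);
    Cblacs_gridexit(ictxt);
    MPI_Finalize();
    return 0;
}
```

A typical build links MPI, BLACS, and ScaLAPACK, e.g. mpicc pdgemm_sketch.c -lscalapack, and runs with mpirun -np 4.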
English Abstract:
General matrix multiplication (GEMM) is a core computational algorithm in linear algebra, machine learning, statistics, and many other domains. Optimizations of such routines, including GEMM, have been conducted by vendors and researchers using auto-tuning techniques. To achieve high performance for parallel matrix multiplication, a processing scheme that optimizes the local matrix multiplication at each node must be applied. In this paper, the application of parallel double-precision general matrix multiplication (PDGEMM) on Intel KNL was examined, with the underlying DGEMM computing the sub-matrix multiplications at each node. Details of the proposed DGEMM are presented, including a blocked matrix multiplication algorithm with AVX-512 instruction sets and several optimization techniques such as data prefetching, loop unrolling, and cache blocking. This study found that the proposed PDGEMM outperformed the ordinary PDGEMM of the Intel Math Kernel Library (MKL) on both 4-node and 16-node KNL clusters, with flop-rate improvements of 6% and 68%, respectively.
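To make the blocked algorithm concrete, here is a minimal sketch (not the paper's implementation) of a cache-blocked DGEMM with an AVX-512 micro-kernel using FMA intrinsics, a 2-way unrolled inner loop, and software prefetching. All block sizes are illustrative assumptions, and edge-case handling is omitted for brevity.

```c
/* Sketch of a cache-blocked DGEMM micro-kernel with AVX-512 intrinsics,
 * illustrating the techniques named in the abstract: cache blocking, loop
 * unrolling, and data prefetching. Block sizes are assumptions, not the
 * paper's; M, N, K are assumed to be multiples of MB, NB, KB for brevity. */
#include <immintrin.h>
#include <stddef.h>

#define MB 64    /* rows of C per cache block (assumption) */
#define NB 64    /* cols of C per cache block (assumption) */
#define KB 256   /* depth of each accumulation panel (assumption) */

/* One block: C[MB x NB] += A[MB x KB] * B[KB x NB], row-major with leading
 * dimensions lda/ldb/ldc. The j-loop is unrolled 2x over 8-wide vectors. */
static void dgemm_block_avx512(const double *A, size_t lda,
                               const double *B, size_t ldb,
                               double *C, size_t ldc)
{
    for (size_t i = 0; i < MB; ++i) {
        for (size_t j = 0; j < NB; j += 16) {
            __m512d c0 = _mm512_loadu_pd(&C[i * ldc + j]);
            __m512d c1 = _mm512_loadu_pd(&C[i * ldc + j + 8]);
            for (size_t p = 0; p < KB; ++p) {
                /* prefetch the next row of B to hide memory latency */
                _mm_prefetch((const char *)&B[(p + 1) * ldb + j], _MM_HINT_T0);
                __m512d a = _mm512_set1_pd(A[i * lda + p]); /* broadcast A(i,p) */
                c0 = _mm512_fmadd_pd(a, _mm512_loadu_pd(&B[p * ldb + j]), c0);
                c1 = _mm512_fmadd_pd(a, _mm512_loadu_pd(&B[p * ldb + j + 8]), c1);
            }
            _mm512_storeu_pd(&C[i * ldc + j], c0);
            _mm512_storeu_pd(&C[i * ldc + j + 8], c1);
        }
    }
}

/* Cache-blocked driver: walk C in MB x NB tiles, accumulating KB-deep panels
 * so the working set of each block stays resident in cache. */
void dgemm_blocked(size_t M, size_t N, size_t K,
                   const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < M; i += MB)
        for (size_t p = 0; p < K; p += KB)
            for (size_t j = 0; j < N; j += NB)
                dgemm_block_avx512(&A[i * K + p], K,
                                   &B[p * N + j], N,
                                   &C[i * N + j], N);
}
```

On KNL, such a kernel would be compiled with AVX-512 enabled, e.g. gcc -O3 -mavx512f or icc -xMIC-AVX512, which corresponds to the compilation step the abstract mentions.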
Keywords: parallel matrix-matrix multiplication, parallel BLAS, Intel Knights Landing (KNL), AVX-512
Attachment: PDF download