KIISE Transactions on Computing Practices
Korean Title |
Analysis of Low GPU Utilization of GEMM Operations in BERT Training |
English Title |
Performance Analysis of GPU Under-utilization when Operating GEMM in BERT Training |
Author |
Sunjung Lee
Jung Ho Ahn
|
Citation |
Vol. 49, No. 4, pp. 232-238 (Apr. 2022) |
Korean Abstract |
GPUs are mainly used for deep neural network (DNN) training because of their efficient parallel computation. However, due to the computational characteristics of the GEMM operations that arise during BERT training, GPUs cannot deliver their peak performance. In this paper, using V100 and A100 GPUs, we analyze why the GPU fails to utilize its compute units efficiently when executing GEMM, the most important operation in BERT training. We identify that, due to limited DRAM capacity and the structural characteristics of BERT, work is not distributed evenly across the GPU. In addition, we analyze the trade-off between raising GPU parallelism by dividing the work into smaller units and the bandwidth demands this places on the memory hierarchy, and we confirm that even with higher parallelism, actual GPU performance drops because of the memory bandwidth bottleneck. Based on these analyses, we confirm the importance of DRAM capacity and of bandwidth throughout the GPU memory hierarchy. |
English Abstract |
Graphics processing units (GPUs) are mainly used for deep neural network training because of their efficient parallel computation. However, due to the computational characteristics of GEMM during BERT training, GPUs do not provide maximum performance. In this paper, we analyze the reasons why GPUs cannot be utilized efficiently when performing the GEMM operation, which is the most important task in BERT training. We identify the problem that work is not allocated evenly to the GPU's parallel computing units due to the limitation of DRAM capacity and the structural characteristics of BERT. In addition, we analyze the trade-off between increasing GPU parallelism by dividing the work into smaller units and the resulting demand on memory bandwidth. We confirm that even if parallelism increases, the actual GPU performance is reduced due to the memory bandwidth bottleneck. Based on our results, we explain the importance of DRAM capacity and of bandwidth in the GPU's memory hierarchy. |
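The trade-off described in the abstract can be made concrete with a back-of-the-envelope model. The sketch below (not from the paper; the SM count, tile sizes, and the BERT-large-like GEMM shape are illustrative assumptions) estimates, for a few GEMM tile sizes, how many thread-block tiles are launched, what fraction of the GPU's SMs stay busy given wave quantization, and how many bytes of operand traffic are incurred per floating-point operation:

```python
# Illustrative sketch: how GEMM tiling interacts with GPU occupancy
# ("wave quantization") and per-tile memory traffic.
# Hardware and workload numbers are assumptions, loosely A100-like.

NUM_SMS = 108  # streaming multiprocessors (assumption: one tile per SM per wave)

def gemm_tile_stats(m, n, k, tile_m, tile_n):
    """Estimate tile count, wave utilization, and operand bytes per FLOP
    for an (m x k) x (k x n) GEMM with fp16 (2-byte) inputs."""
    tiles = -(-m // tile_m) * -(-n // tile_n)   # ceil-div in both output dims
    waves = -(-tiles // NUM_SMS)                # ceil(tiles / SMs)
    # A partial final wave leaves some SMs idle, dragging down utilization.
    utilization = tiles / (waves * NUM_SMS)
    # Each tile reads tile_m*k + k*tile_n operands; smaller tiles re-read
    # the same A/B data more often, raising bytes moved per FLOP.
    bytes_per_flop = 2 * (tile_m * k + k * tile_n) / (2 * tile_m * tile_n * k)
    return tiles, utilization, bytes_per_flop

# BERT-large-like GEMM: batch*seq_len = 4096 rows, hidden size = 1024.
for tm, tn in [(256, 128), (128, 128), (64, 64)]:
    tiles, util, bpf = gemm_tile_stats(4096, 1024, 1024, tm, tn)
    print(f"tile {tm}x{tn}: {tiles} tiles, util {util:.2f}, bytes/flop {bpf:.3f}")
```

Under these assumptions, smaller tiles yield more tiles and hence better wave utilization, but each FLOP then requires more bytes from the memory hierarchy, which is exactly why higher parallelism can still lose to the bandwidth bottleneck.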
Keywords |
BERT training
GPU under-utilization
memory hierarchy
DRAM capacity
|