Source-level Translation of OpenMP Device Constructs to CUDA and Runtime Optimization Methods

Daeyoung Park (박대영); Jaejin Lee (이재진)

KIISE Transactions on Computing Practices (정보과학회 컴퓨팅의 실제 논문지)

Korean Title	OpenMP 디바이스 컨스트럭트의 CUDA 소스 코드로의 변환 및 런타임 최적화 기법
English Title	Source-level Translation of OpenMP Device Constructs to CUDA and Runtime Optimization Methods
Author	Daeyoung Park (박대영), Jaejin Lee (이재진)
Citation	Vol. 27, No. 2, pp. 110-115 (February 2021)
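Korean Abstract (English translation)	This paper proposes a compiler that translates C source code developed with OpenMP 4.5 device constructs into the corresponding CUDA source code, together with a runtime system that supports it. It first reviews OpenMP's execution model, memory model, and synchronization process, and then describes the method of source-level translation. It also introduces runtime system optimization techniques devised to improve performance, such as a buddy allocator and UDTE. The experiments use the SPEC-ACCEL 1.2 benchmark. The results show a performance improvement of more than 6x over gcc7, the baseline, and of more than 2x even when mriq is excluded. The framework presented in this paper is expected to serve as a basis for developing further compiler and runtime optimization techniques.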
English Abstract	This paper presents an OpenMP framework for GPU offloading. The framework consists of a compiler and a runtime system that together convert C programs written with OpenMP 4.5 device constructs into CUDA programs. First, we review the execution model, memory model, and synchronization process of OpenMP, and explain how the translation is performed at the source level. We then apply runtime optimization techniques, such as a buddy allocator and UDTE, to improve execution performance. On the SPEC-ACCEL 1.2 benchmark suite, the framework shows up to 6 times better performance than the gcc7 framework. We expect that additional runtime and compiler optimization techniques can be built on the framework of this paper.
Keywords	OpenMP device offloading, CUDA source code translation, source-level translation, runtime optimization methods
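
To make the term "OpenMP 4.5 device construct" in the abstract concrete, the following is a minimal sketch of the kind of C input such a framework would accept. The function name saxpy_offload and the loop itself are hypothetical illustrations, not taken from the paper, and the comment about how a translator might handle the loop describes the general idea of source-level translation rather than the authors' actual generated code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example (not from the paper): y[i] = a * x[i] + y[i]. */
void saxpy_offload(int n, float a, const float *x, float *y)
{
    assert(n >= 0 && x != NULL && y != NULL);

    /* OpenMP 4.5 device construct:
       - target: run the region on the default device (e.g. a GPU),
       - map(to: ...): copy x to the device before the region,
       - map(tofrom: ...): copy y to the device and back afterwards,
       - teams distribute parallel for: spread iterations over teams
         and the threads inside each team.
       Conceptually, a source-level translator would outline this loop
       into a CUDA __global__ kernel and emit host code for the memory
       transfers and the kernel launch (assumption for illustration). */
    #pragma omp target teams distribute parallel for \
        map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

When compiled without OpenMP support, the pragma is ignored and the loop runs sequentially on the host, so the function yields the same result either way; with an offloading-capable compiler the same source runs on the device.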