통합메모리를 이용한 임베디드 환경에서의 딥러닝 프레임워크 성능 개선과 평가

이민학; 강우철; Minhak Lee; Woochul Kang

연구문헌

국내 논문지

홈 > 연구문헌 > 국내 논문지 > 한국정보과학회 논문지 > 정보과학회 컴퓨팅의 실제 논문지 (KIISE Transactions on Computing Practices)

정보과학회 컴퓨팅의 실제 논문지 (KIISE Transactions on Computing Practices)

Current Result Document : 3 / 5 이전건 다음건

한글제목(Korean Title)	통합메모리를 이용한 임베디드 환경에서의 딥러닝 프레임워크 성능 개선과 평가
영문제목(English Title)	Performance Enhancement and Evaluation of a Deep Learning Framework on Embedded Systems using Unified Memory
저자(Author)	이민학 강우철 Minhak Lee Woochul Kang
원문수록처(Citation)	VOL 23 NO. 07 PP. 0417 ~ 0423 (2017. 07)
한글내용 (Korean Abstract)	최근, 딥러닝을 사용 가능한 임베디드 디바이스가 상용화됨에 따라 임베디드 시스템 영역에서도 딥러닝 활용에 대한 다양한 연구가 진행되고 있다. 그러나 임베디드 시스템을 고성능 PC 환경과 비교하면 상대적으로 저사양의 CPU/GPU 프로세서와 메모리를 탑재하고 있으므로 딥러닝 기술의 적용에 있어서 많은 제약이 있다. 본 논문에서는 다양한 최신 딥러닝 네트워크들을 임베디드 디바이스에 적용했을때의 성능을 시간과 전력이라는 관점에서 실험적으로 평가한다. 또한, 호스트 CPU와 GPU 디바이스간의 메모리를 공유하는 임베디드 시스템들의 아키텍처적인 특성을 이용하여 메모리 복사를 줄임으로써 실시간 성능과 저전력성을 높이는 방법을 제시한다. 제안된 방법은 대표적인 공개 딥러닝 프레임워크인 Caffe를 수정하여 구현되었으며, 임베디드 GPU를 탑재한 NVIDIA Jetson TK1에서 성능평가 되었다. 실험결과, 대부분의 딥러닝 네트워크에서 뚜렷한 성능향상을 관찰할 수 있었다. 특히, 메모리 사용량이 높은 AlexNet 에서 약 33%의 이미지 인식 속도 단축과 50%의 소비 전력량 감소를 관찰할 수 있었다.
영문내용 (English Abstract)	Recently, many embedded devices that have the computing capability required for deep learning have become available; hence, many new applications using these devices are emerging. However, these embedded devices have an architecture different from that of PCs and highperformance servers. In this paper, we propose a method that improves the performance of deep-learning framework by considering the architecture of an embedded device that shares memory between the CPU and the GPU. The proposed method is implemented in Caffe, an open-source deep-learning framework, and is evaluated on an NVIDIA Jetson TK1 embedded device. In the experiment, we investigate the image recognition performance of several state-of-the-art deeplearning networks, including AlexNet, VGGNet, and GoogLeNet. Our results show that the proposed method can achieve significant performance gain. For instance, in AlexNet, we could reduce image recognition latency by about 33% and energy consumption by about 50%.
키워드(Keyword)	임베디드 시스템 GPU 딥러닝 저전력 이미지 식별 embedded system GPU deep learning low power image recognition
파일첨부	PDF 다운로드