KIISE Transactions on Computing Practices
Korean Title |
Spark-Based Distributed Deep-Learning Framework Performance Comparison and Analysis |
English Title |
A Comparative Performance Analysis of Spark-Based Distributed Deep-Learning Frameworks |
Author |
장재희
박재홍
김한주
윤성로
Jaehee Jang
Jaehong Park
Hanjoo Kim
Sungroh Yoon
|
Citation |
Vol. 23, No. 5, pp. 299-303 (May 2017) |
Korean Abstract |
Deep learning has achieved remarkable results on high-level problems such as object/speech recognition and natural language processing by increasing the number of layers in conventional artificial neural networks and, at the same time, proposing effective training methodologies. However, it is limited by the large amounts of time and resources required for training, and research aimed at reducing them is being actively pursued. In this study, we measured and analyzed, in terms of training accuracy and speed, the performance of two tools (DeepSpark, SparkNet) that distribute deep learning on the Apache Spark cluster-computing framework. In experiments with the CIFAR-10/CIFAR-100 datasets, SparkNet showed small fluctuations in accuracy over the course of training, whereas DeepSpark fluctuated widely in early training; as the fluctuations gradually subsided, DeepSpark reached an accuracy approximately 15% higher than SparkNet and, under some conditions, converged faster and to a higher accuracy than a single machine. |
English Abstract |
By piling up hidden layers in artificial neural networks, deep learning delivers outstanding performance on high-level abstraction problems such as object/speech recognition and natural language processing. However, deep-learning users often struggle with the tremendous amounts of time and resources required to train deep neural networks. To alleviate this computational challenge, many approaches have been proposed across a diversity of areas. In this work, two existing Apache Spark-based acceleration frameworks for deep learning, SparkNet and DeepSpark, are compared and analyzed in terms of training accuracy and time demands. In the authors' experiments with the CIFAR-10 and CIFAR-100 benchmark datasets, SparkNet showed more stable convergence behavior than DeepSpark, but in terms of training accuracy, DeepSpark delivered a classification accuracy approximately 15% higher. In some cases, DeepSpark also outperformed the sequential implementation running on a single machine in terms of both accuracy and running time. |
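The distributed training scheme being compared can be sketched in miniature: SparkNet-style training broadcasts a master copy of the model, lets each worker run local SGD on its own data shard, and periodically averages the workers' weights (DeepSpark differs mainly in performing this exchange asynchronously). The toy linear-regression task, function names, and hyperparameters below are illustrative assumptions, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([2.0, -3.0])  # ground-truth weights for the toy task

def make_shard(n):
    """Generate one worker's local data shard (illustrative)."""
    X = rng.normal(size=(n, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=n)
    return X, y

def local_sgd(w, X, y, steps=10, lr=0.05):
    """Run a few SGD steps on one worker's shard, starting from w."""
    for _ in range(steps):
        j = rng.integers(len(y))
        grad = (X[j] @ w - y[j]) * X[j]  # gradient of 0.5 * (x.w - y)^2
        w = w - lr * grad
    return w

def train(num_workers=4, rounds=20):
    """Synchronous parameter averaging across simulated workers."""
    shards = [make_shard(100) for _ in range(num_workers)]
    w = np.zeros(2)  # master copy of the weights
    for _ in range(rounds):
        # broadcast w, run local SGD "in parallel", then average
        local_ws = [local_sgd(w.copy(), X, y) for X, y in shards]
        w = np.mean(local_ws, axis=0)
    return w

w = train()
```

Because the workers communicate only once per round rather than once per gradient step, this scheme trades some statistical efficiency for far lower communication cost, which is the central tension both frameworks navigate.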
Keyword |
deep learning
Apache Spark
Caffe
distributed computing
parallel computing
cluster computing
|