An Efficient Scheduling Scheme for Data Stream Processing in Spark Environments

Hyeonwook Jeon, Minsoo Kim, JinWoo Song, DoJin Choi, Yeonwoo Kim, Jongtae Lim, Kyoungsoo Bok, Jaesoo Yoo

Database Research Journal (SIGDB)

Citation	Vol. 32, No. 2, pp. 76-88 (Aug. 2016)
Abstract	With recent advances in IT, many studies have addressed the real-time processing of streaming big data generated at large scale by media such as social networks, mobile devices, and the Internet of Things. Processing data streams in real time depends heavily on the distributed job scheduling scheme. This paper proposes an efficient, load-aware scheduling scheme for real-time data stream processing in Spark environments. To determine the load of each node, the scheme considers CPU utilization, memory load, and average response time. Jobs are assigned to nodes according to their loads, and when the complexity of an assigned job increases a node's load, the job is replicated to a lightly loaded node and processed there, which prevents processing delays. To demonstrate the superiority of the proposed scheme, we compare it with existing schemes through various performance evaluations.
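The abstract names the inputs to the load estimate (CPU utilization, memory load, average response time) and the two scheduling actions (load-based assignment and replication to a lightly loaded node), but it does not give formulas. The following Python sketch shows one way these pieces could fit together. The weights `W_CPU`/`W_MEM`/`W_RESP`, the `OVERLOAD` threshold, the `job_cost` model, and the helpers `Node`, `load_score`, `place`, and `schedule` are all hypothetical illustrations, not the authors' method and not a Spark API.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical weights: the abstract names the three metrics but not how they are combined.
W_CPU, W_MEM, W_RESP = 0.4, 0.4, 0.2
OVERLOAD = 0.8  # hypothetical load score above which a node counts as overloaded


@dataclass
class Node:
    name: str
    cpu: float      # CPU utilization in [0, 1]
    mem: float      # memory load in [0, 1]
    resp_ms: float  # average response time in milliseconds
    jobs: list[str] = field(default_factory=list)


def load_score(node: Node, max_resp_ms: float) -> float:
    """Weighted load in [0, 1]; response time is normalized by the slowest node."""
    resp = node.resp_ms / max_resp_ms if max_resp_ms > 0 else 0.0
    return W_CPU * node.cpu + W_MEM * node.mem + W_RESP * resp


def place(job: str, node: Node, job_cost: float) -> None:
    """Record the job on the node and add its (hypothetical) CPU cost."""
    node.jobs.append(job)
    node.cpu = min(1.0, node.cpu + job_cost)


def schedule(job: str, nodes: list[Node], job_cost: float) -> list[str]:
    """Assign the job to the least-loaded node; if that node becomes overloaded,
    replicate the job on the next least-loaded node, provided that node is not
    itself overloaded. Returns the names of the nodes that received the job."""
    max_resp = max(n.resp_ms for n in nodes)
    ranked = sorted(nodes, key=lambda n: load_score(n, max_resp))
    primary = ranked[0]
    place(job, primary, job_cost)
    targets = [primary.name]

    if load_score(primary, max_resp) > OVERLOAD and len(ranked) > 1:
        replica = ranked[1]  # least-loaded of the remaining (unchanged) nodes
        if load_score(replica, max_resp) <= OVERLOAD:
            place(job, replica, job_cost)
            targets.append(replica.name)
    return targets


if __name__ == "__main__":
    cluster = [Node("a", 0.5, 0.6, 50), Node("b", 0.6, 0.7, 60)]
    # Node "a" is chosen first; the job pushes it past OVERLOAD, so it is replicated to "b".
    print(schedule("job-1", cluster, job_cost=0.5))  # ['a', 'b']
```

With weights that sum to 1, the score stays in [0, 1], so a single threshold can decide both when a node is overloaded and whether another node is light enough to receive a replica. The paper may weigh, normalize, or combine the metrics differently.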
Keywords	Spark, scheduling, data stream, IoT