Predicting the Cache Performance Benefits for In-memory Data Analytics Frameworks

Minseop Jeong (정민섭); Hwansoo Han (한환수)

정보과학회논문지 (Journal of KIISE)

Korean Title	인-메모리 분석 프레임워크의 캐시 성능 이득 예측
English Title	Predicting the Cache Performance Benefits for In-memory Data Analytics Frameworks
Author	Minseop Jeong (정민섭), Hwansoo Han (한환수)
Citation	Vol. 48, No. 5, pp. 479-485 (May 2021)
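Korean Abstract (English translation)	In-memory data analytics frameworks provide a facility for caching computed intermediate values to improve performance. For an application to cache more effectively, the resulting performance benefit must be taken into account. Because existing frameworks measure execution time only at the level of distributed tasks, they are limited in predicting an application's cache performance benefit. This paper proposes an operator-level time measurement method that incorporates the existing task-level execution time measurement, together with a model that predicts function cost according to input data size. Based on the proposed model and the application's execution flow, it also proposes a method for predicting the performance benefit of cached datasets. The proposed model and prediction method open up opportunities for caching optimization that accounts for the cache performance benefit. The proposed operator cost model showed an average error of 7.3% on 10x input data, and the performance benefit predicted with the model differed from the actual benefit by no more than 24%.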
English Abstract	In-memory data analytics frameworks provide facilities for caching intermediate results to improve performance. For effective caching, the actual performance benefit of the cached data should be taken into consideration. Because existing frameworks measure execution time only at the distributed task level, they are limited in how accurately they can predict cache performance benefits. In this paper, we propose an operator-level time measurement method that builds on the existing task-level execution time measurement, together with a cost model that predicts operator cost from input data size. Based on the proposed model and the execution flow of the application, we also propose a method for predicting the performance benefit of data caching. The proposed model and prediction method provide opportunities for cache optimization guided by predicted performance benefits. Our operator cost model showed an average prediction error of 7.3% when evaluated with 10x input data. The difference between the predicted and actual performance benefits was within 24%.
Keyword	caching; computing cost model; performance benefit prediction; distributed data processing; parallel data anal; Apache Spark; system optimization
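
The abstract describes two pieces: a cost model that predicts operator cost from input data size, and a cache benefit prediction built on that model and the application's execution flow. The record does not give the form of either, so the following is only a minimal sketch under stated assumptions, not the authors' method: it assumes a linear per-operator cost model fitted by least squares, assumes every operator in the cached dataset's lineage sees the same input size, and ignores the cost of writing to and reading from the cache. The names `LinearCost`, `fit_linear`, and `predicted_cache_benefit`, and all measurements, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LinearCost:
    """Hypothetical per-operator model: seconds = intercept + slope * input_size."""
    intercept: float
    slope: float

    def predict(self, input_size: float) -> float:
        return self.intercept + self.slope * input_size


def fit_linear(sizes, times):
    """Ordinary least-squares fit of times against sizes.

    Requires at least two distinct input sizes.
    """
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(times) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, times))
    slope = sxy / sxx
    return LinearCost(intercept=mean_y - slope * mean_x, slope=slope)


def predicted_cache_benefit(lineage_models, input_size, reuse_count):
    """Estimate the time saved by caching a dataset read reuse_count times.

    Assumption: without caching, every operator in the lineage is re-run
    for each read; with caching, the lineage runs once.
    """
    recompute = sum(m.predict(input_size) for m in lineage_models)
    return max(reuse_count - 1, 0) * recompute


# Made-up small-scale measurements: (input size, seconds) per operator.
map_model = fit_linear([1, 2, 4], [0.5, 0.9, 1.7])
filter_model = fit_linear([1, 2, 4], [0.2, 0.3, 0.5])

# Extrapolate to a larger input and a dataset that is read three times.
benefit = predicted_cache_benefit([map_model, filter_model], input_size=10, reuse_count=3)
```

With these made-up numbers the two fitted operators cost about 5.2 seconds per recomputation at input size 10, so caching a dataset that is read three times is estimated to save about 10.4 seconds. A caching optimizer could rank candidate datasets by such an estimate.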