Design and Implementation of Distributed In-Memory DBMS-based Parallel K-Means as In-database Analytics Function

Heymo Kou (구해모); Changmin Nam (남창민); Woohyun Lee (이우현); Yongjae Lee (이용재); HyoungJoo Kim (김형주)

정보과학회 컴퓨팅의 실제 논문지 (KIISE Transactions on Computing Practices)

Korean Title	분산 인 메모리 DBMS 기반 병렬 K-Means의 In-database 분석 함수로의 설계와 구현
English Title	Design and Implementation of Distributed In-Memory DBMS-based Parallel K-Means as In-database Analytics Function
Author	Heymo Kou (구해모), Changmin Nam (남창민), Woohyun Lee (이우현), Yongjae Lee (이용재), HyoungJoo Kim (김형주)
Citation	Vol. 24, No. 3, pp. 105-112 (March 2018)
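Korean Abstract	As the volume of data grows, a single-node database is no longer sufficient to both store and process it. Data is therefore partitioned and stored in distributed databases made up of multiple nodes, and analysis must likewise offer parallelism to stay efficient. The traditional approach moves data from the database to analysis nodes before analyzing it, which incurs network cost and requires the user to be able to work with a separate analytics framework as well. This study proposes the design and implementation of the K-Means clustering algorithm as an in-database analytics function, written in SQL, for a distributed database environment that uses a relational database and a column-based database, together with a performance optimization method for the relational database.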
English Abstract	As data size increases, a single database is no longer enough to serve the current volume of tasks. Because data is partitioned and stored across multiple databases, analysis should also support parallelism in order to increase efficiency. However, traditional analysis requires data to be transferred out of the database to the nodes where the analytic service runs, and the user is required to know both the database and the analytic framework. In this paper, we propose an efficient way to run the K-means clustering algorithm inside a distributed column-based database and a relational database. We also suggest an efficient way to optimize the K-means algorithm within a relational database.
Keyword	in-database analytics; K-means clustering; distributed database