Clustering a Large Number of Images for Multi-Label Queries using Spark

C.M. Wasiq; Heemin Park

Korean Institute of Information Scientists and Engineers (KIISE) Conference Proceedings: Korea Computer Congress 2016

Korean Title	스파크를 사용한 다중 레이블 질의를 위한 대규모 이미지 클러스터링
English Title	Clustering a Large Number of Images for Multi-Label Queries using Spark
Author	C.M. Wasiq; Heemin Park
Citation	Vol. 43, No. 01, pp. 1432-1434 (June 2016)
Korean Abstract
English Abstract	This paper models an image database as a graph in order to sort the images into clusters based on their similarity. Images are treated as nodes, and the edges between nodes represent the similarity between them. The graph must be a multigraph so that it can capture the several attributes considered when comparing photos. The images are modeled as a graph using GraphX, the graph-processing API of the distributed data processing platform Spark. After modeling, we use the strongly connected components algorithm provided by the GraphX API to create clusters of photos that are heavily linked with each other, i.e., most similar. We run the algorithm on Amazon EC2 so that it executes in an efficient, distributed fashion. The motivation for this research is to propose a novel, faster, and more efficient way to reduce the time a person needs to find particular photos, by grouping similar ones together. The code will be written in Scala, a programming language supported by Spark, and deployed on an Amazon EC2 cluster for faster processing.
Keywords	Spark; Hadoop; GraphX; Graph Partitioning; Distributed Computing; Amazon EC2
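
The clustering idea described in the abstract can be illustrated with a small single-machine sketch. This is not the authors' code: the paper uses GraphX's strongly connected components routine from Scala on a Spark cluster, whereas the block below swaps in a plain-Python version of Kosaraju's algorithm so that it runs without Spark. The image names, the attribute tags on the edges, and the assumption that a directed edge means "the source image is similar to the target image under that attribute" are all hypothetical, chosen only to show how parallel edges in a multigraph collapse into clusters.

```python
from collections import defaultdict


def strongly_connected_components(nodes, edges):
    """Group nodes into strongly connected components (Kosaraju's algorithm).

    `edges` holds (src, dst, attribute) triples. Parallel edges that differ
    only in attribute are allowed (a multigraph); they do not change which
    nodes can reach each other, so they collapse into one adjacency entry.
    """
    forward, backward = defaultdict(set), defaultdict(set)
    for src, dst, _attribute in edges:
        forward[src].add(dst)
        backward[dst].add(src)

    # Pass 1: iterative depth-first search on the forward graph,
    # recording each node once all of its descendants are finished.
    visited, finish_order = set(), []
    for start in nodes:
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(sorted(forward[start])))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(sorted(forward[nxt]))))
                    break
            else:
                # No unvisited neighbour left: this node is finished.
                finish_order.append(node)
                stack.pop()

    # Pass 2: walk the reversed graph in decreasing finish order;
    # everything reached from a root belongs to the same component.
    assigned, clusters = set(), []
    for root in reversed(finish_order):
        if root in assigned:
            continue
        assigned.add(root)
        cluster, todo = [], [root]
        while todo:
            node = todo.pop()
            cluster.append(node)
            for prev in backward[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    todo.append(prev)
        clusters.append(sorted(cluster))
    return sorted(clusters)


# Hypothetical image multigraph: one edge per (image pair, attribute).
images = ["img1", "img2", "img3", "img4", "img5"]
edges = [
    ("img1", "img2", "colour"),
    ("img1", "img2", "label:beach"),  # parallel edge, different attribute
    ("img2", "img1", "label:beach"),
    ("img2", "img3", "colour"),
    ("img3", "img1", "texture"),
    ("img3", "img4", "label:sky"),    # one-way link, does not merge clusters
    ("img4", "img5", "colour"),
    ("img5", "img4", "label:city"),
]

clusters = strongly_connected_components(images, edges)
print(clusters)
```

With this toy input the images fall into two clusters, `img1` to `img3` and `img4` to `img5`, because the single one-way edge from `img3` to `img4` is not enough to make the two groups mutually reachable. In a distributed setting the same grouping would presumably come from the GraphX routine the abstract names, applied to a graph built from vertex and edge RDDs.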