k-NN Join Based on LSH in Big Data Environment

Home > Research Literature > English-Language Journals > JICCE (Korea Institute of Information and Communication Engineering)

Korean Title	k-NN Join Based on LSH in Big Data Environment
English Title	k-NN Join Based on LSH in Big Data Environment
Author	Jiaqi Ji, Yeongjee Chung
Citation	Vol. 16, No. 2, pp. 99-105 (June 2018)
Korean Abstract
English Abstract	k-Nearest neighbor join (k-NN Join) is a computationally intensive operation that finds the k nearest neighbors in a dataset S for every object in another dataset R. Most related studies on k-NN Join assume a single computer. As data dimensionality and data volume increase, running the k-NN Join algorithm on a single computer cannot generate results quickly. To solve this scalability problem, we introduce a locality-sensitive hashing (LSH) k-NN Join algorithm implemented in Spark, an approach aimed at high-dimensional big data. LSH maps similar objects to the same bucket, which reduces the search scope. To parallelize the algorithm across multiple computers, the Spark framework is used to accelerate the computation of distances between objects in a cluster. Results show that the proposed approach is fast and accurate on high-dimensional big data.
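The bucketing idea in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' Spark implementation: the abstract does not say which hash family or parameters the paper uses, so the Gaussian random-projection hash and the names `make_hash`, `lsh_knn_join`, `brute_force_knn_join`, `num_projections`, and `bucket_width` are all assumptions made for illustration. The sketch hashes every object of S into a bucket, then, for each object of R, computes distances only to the objects of S in the same bucket.

```python
import heapq
import math
import random
from collections import defaultdict


def make_hash(dim, num_projections, bucket_width, rng):
    """Build one hash function from Gaussian random projections (assumed family)."""
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_projections)]
    offsets = [rng.uniform(0.0, bucket_width) for _ in range(num_projections)]

    def h(v):
        # Project v on each random direction, shift, and quantize into a bucket index.
        return tuple(
            math.floor((sum(a * x for a, x in zip(plane, v)) + b) / bucket_width)
            for plane, b in zip(planes, offsets)
        )

    return h


def brute_force_knn_join(R, S, k):
    """Exact k-NN join: every r in R is compared with every s in S."""
    return {
        i: heapq.nsmallest(k, ((math.dist(r, s), j) for j, s in enumerate(S)))
        for i, r in enumerate(R)
    }


def lsh_knn_join(R, S, k, num_projections=2, bucket_width=4.0, seed=0):
    """Approximate k-NN join: each r is compared only with S objects in its bucket."""
    rng = random.Random(seed)
    h = make_hash(len(S[0]), num_projections, bucket_width, rng)

    # Group the indices of S by bucket key.
    buckets = defaultdict(list)
    for j, s in enumerate(S):
        buckets[h(s)].append(j)

    result = {}
    for i, r in enumerate(R):
        candidates = buckets.get(h(r), [])  # may hold fewer than k objects
        result[i] = heapq.nsmallest(k, ((math.dist(r, S[j]), j) for j in candidates))
    return result
```

Both functions return, for each index in R, a list of `(distance, index_in_S)` pairs sorted by distance. Because the LSH variant searches a subset of S, it can return fewer than k neighbors or miss a true neighbor that fell into another bucket; using several independent hash tables is a common way to reduce such misses. In a distributed setting, the bucket key would be a natural partitioning key, though how the paper partitions the work in Spark is not stated in the abstract.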
Keywords	Big data, High dimension, k-NN join, LSH, Spark