A Technique for Set-based Subsequence Matching

여은지; 이주원; 임효상; Eunji Yeo; Juwon Lee; Hyo-Sang Lim

데이터베이스 연구회지(SIGDB)

Korean Title	집합에 기반한 서브시퀀스 매칭 기법
English Title	A Technique for Set-based Subsequence Matching
Author	여은지, 이주원, 임효상 (Eunji Yeo, Juwon Lee, Hyo-Sang Lim)
Citation	Vol. 32, No. 3, pp. 152-169 (2016. 12)
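Korean Abstract (translated)	In this paper, we propose S-Match (Set-based subsequence Matching), a set-based subsequence matching method for data streams. Subsequence matching is the problem of finding, within a data sequence, the subsequences that are similar to a query sequence, together with their positions. S-Match has two characteristics. First, it uses the notion of sets to represent a user's preferences as a "preferred item set sequence", which takes time into account while resolving the mismatches caused by requiring an exact order. To measure the similarity between item set sequences, we propose the Euclidean set distance, which extends the Euclidean distance to sets. Second, it transforms similar-user matching, a core element of recommendation systems, into a subsequence matching problem over data streams, so that it searches not only other users' recent preferences but also their preferences at every past point in time. We also prove that S-Match incurs no false dismissals, that is, cases judged dissimilar although they are actually similar. The performance evaluation shows that S-Match, performing subsequence matching on real movie rating data, accurately finds similar users without false dismissals.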
English Abstract	In this paper, we propose S-Match, a method for set-based subsequence matching in data streams. Subsequence matching is the problem of finding the subsequences of a data sequence that are similar to a query sequence, together with their locations. First, we propose the preferred item set sequence, which reflects the time dimension of user preference. A preferred item set sequence is an ordered list of sets in which each set collects the items preferred within a specific time interval. We then propose a similarity measure between item set sequences, the Euclidean set distance, which extends the Euclidean distance to sets. Second, in order to find similar users not only at the current time but also at past times, we transform the similar-user matching problem into a subsequence matching problem. We prove that the method does not incur false dismissals, that is, subsequences that are actually similar but are discarded from the results. Through experiments on real movie rating data sets, we show that S-Match accurately finds similar users without false dismissals.
Keyword	Subsequence matching, data stream, recommendation system, item set
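
The abstract does not give the formula for the Euclidean set distance, so the following is only a minimal sketch of the general idea, not the authors' definition. It assumes that the distance between two item sets is the size of their symmetric difference and that these per-interval distances are combined as in the ordinary Euclidean distance. The function names (`set_distance`, `euclidean_set_distance`, `subsequence_match`), the threshold `epsilon`, and the sample data are all hypothetical.

```python
from math import sqrt


def set_distance(a, b):
    # ASSUMPTION: per-interval distance is the number of items that appear
    # in exactly one of the two sets (size of the symmetric difference).
    return len(a ^ b)


def euclidean_set_distance(s, q):
    # ASSUMPTION: per-interval set distances are combined like coordinates
    # in the ordinary Euclidean distance. Both sequences must be equally long.
    if len(s) != len(q):
        raise ValueError("sequences must have the same length")
    return sqrt(sum(set_distance(a, b) ** 2 for a, b in zip(s, q)))


def subsequence_match(data, query, epsilon):
    # Slide a window of len(query) over the data sequence and report every
    # offset whose window lies within epsilon of the query. Because every
    # window is checked exhaustively, this scan cannot miss a true match.
    n = len(query)
    matches = []
    for offset in range(len(data) - n + 1):
        d = euclidean_set_distance(data[offset:offset + n], query)
        if d <= epsilon:
            matches.append((offset, d))
    return matches


# Hypothetical preferred item set sequences: one set of items per time interval.
data = [{"A", "B"}, {"C"}, {"A", "B"}, {"C", "D"}, {"E"}]
query = [{"B", "A"}, {"C"}]  # order inside a set does not matter

result = subsequence_match(data, query, epsilon=1.0)
print(result)
```

Because each interval is a set, the query `{"B", "A"}` matches `{"A", "B"}` regardless of the order in which the items were preferred, which is the exact-order mismatch the abstract says set sequences are meant to avoid. The exhaustive scan is used here only for clarity; the abstract does not say how the paper performs the search.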