Automatic Construction of Reduced Dimensional Cluster-based Keyword Association Networks using LSI

Han-mook Yoo (유한묵); Han-joon Kim (김한준); Jae-young Chang (장재영)

Journal of KIISE (정보과학회논문지)

Korean Title	LSI를 이용한 차원 축소 클러스터 기반 키워드 연관망 자동 구축 기법
English Title	Automatic Construction of Reduced Dimensional Cluster-based Keyword Association Networks using LSI
Author	Han-mook Yoo (유한묵), Han-joon Kim (김한준), Jae-young Chang (장재영)
Citation	Vol. 44, No. 11, pp. 1236-1243 (November 2017)
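Korean Abstract	This paper proposes LSI-based ClusterTextRank, a technique that combines the existing TextRank algorithm with a mutual information measure to extract keywords on a per-cluster basis, together with a technique that builds an association network over the extracted keywords using Latent Semantic Indexing (LSI). The proposed technique represents the document collection as a term-document matrix and uses LSI to reduce it to a low-dimensional concept space. It then partitions the documents into several clusters with the k-means clustering algorithm, represents the words in each cluster as a maximal spanning tree graph, and extracts keywords by taking into account the cluster information content derived from that graph. Next, it computes the similarity between the extracted keywords using the term-concept matrix obtained through LSI, and uses the result as the keyword association network. To evaluate the proposed technique, we used travel-related blog data and show that it improves keyword extraction accuracy by about 14% over the existing TextRank algorithm.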
English Abstract	In this paper, we propose LSI-based ClusterTextRank, a method for producing keyword networks that extracts significant keywords from a set of clusters using a mutual information metric and constructs an association network using latent semantic indexing (LSI). The proposed method reduces the dimensionality of the documents through LSI, partitions the documents into multiple clusters through k-means clustering, and represents the words within each cluster as a maximal spanning tree graph. Significant keywords are identified by evaluating their mutual information within clusters. The method then calculates the similarities between the extracted keywords using the term-concept matrix, and the results are represented as a keyword association network. To evaluate the proposed method, we used travel-related blog data and showed that it outperforms the existing TextRank algorithm by about 14% in keyword extraction accuracy.
Keywords	latent semantic indexing, mutual information, maximal spanning tree, clustering, keyword extraction, text mining
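The pipeline outlined in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the toy corpus, the number of concepts `k`, the similarity `threshold`, and all function names are assumptions made for illustration. The keyword scoring step (maximal spanning tree plus mutual information) is left out because the abstract does not specify it in enough detail, so the keywords are simply hand-picked here. The sketch covers three steps only: building the term-document matrix, reducing it with LSI (a truncated SVD), clustering the documents with a plain k-means, and linking keywords by cosine similarity in the term-concept matrix.

```python
import numpy as np

def term_document_matrix(docs):
    """Return (sorted vocabulary, terms x documents count matrix)."""
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for w in doc:
            A[index[w], j] += 1.0
    return vocab, A

def lsi(A, k):
    """Truncated SVD: map terms and documents into a k-dimensional concept space."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    term_concept = U[:, :k] * s[:k]     # one row per term
    doc_concept = Vt[:k].T * s[:k]      # one row per document
    return term_concept, doc_concept

def kmeans(X, n_clusters, n_iter=50):
    """Plain Lloyd's algorithm; the first n_clusters rows seed the centroids."""
    centroids = X[:n_clusters].copy()
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for c in range(n_clusters):
            if np.any(labels == c):
                centroids[c] = X[labels == c].mean(axis=0)
    return labels

def association_network(keywords, vocab, term_concept, threshold=0.5):
    """Edges (kw1, kw2, cosine) for keyword pairs similar in concept space."""
    rows = np.array([term_concept[vocab.index(w)] for w in keywords])
    unit = rows / np.linalg.norm(rows, axis=1, keepdims=True)
    sim = unit @ unit.T
    return [(keywords[i], keywords[j], float(sim[i, j]))
            for i in range(len(keywords))
            for j in range(i + 1, len(keywords))
            if sim[i, j] >= threshold]

# Hypothetical toy corpus standing in for the travel blog data.
docs = [
    ["beach", "sea", "hotel"],
    ["mountain", "hiking", "trail"],
    ["beach", "sea", "sunset"],
    ["mountain", "trail", "cabin"],
    ["sea", "hotel", "beach"],
    ["hiking", "mountain", "cabin"],
]

vocab, A = term_document_matrix(docs)
term_concept, doc_concept = lsi(A, k=2)          # k=2 is an assumed setting
labels = kmeans(doc_concept, n_clusters=2)       # cluster documents in concept space

# Assumption: keywords are hand-picked; the paper extracts them per cluster.
keywords = ["beach", "sea", "mountain", "trail"]
edges = association_network(keywords, vocab, term_concept)

print(labels.tolist())
print(edges)
```

On this toy corpus the beach-related and mountain-related documents share no terms, so the two retained concepts separate them cleanly and only same-topic keywords end up connected. Real blog data would need tokenization, term weighting, and a tuned `k` and `threshold`, none of which the abstract specifies.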