Journal of KIISE
Korean Title |
A Method for Automatically Constructing Keyword Association Networks Based on Dimension-Reduced Clusters Using LSI |
English Title |
Automatic Construction of Reduced Dimensional Cluster-based Keyword Association Networks using LSI |
Author |
Han-mook Yoo
Han-joon Kim
Jae-young Chang
|
Citation |
Vol. 44, No. 11, pp. 1236-1243 (2017. 11) |
Korean Abstract |
This paper proposes LSI-based ClusterTextRank, a technique that combines the existing TextRank algorithm with a mutual information measure to extract keywords on a per-cluster basis, together with a technique that builds an association network from the extracted keywords using Latent Semantic Indexing (LSI). The proposed technique represents the document collection as a term-document matrix and uses LSI to reduce it to a low-dimensional concept space. It then applies the k-means clustering algorithm to partition the data into several clusters, represents the words in each cluster as a maximal spanning tree graph, and extracts keywords by taking into account the cluster information content derived from that graph. Next, the similarities between the extracted keywords are computed from the term-concept matrix obtained through LSI, and the result is used as a keyword association network. To evaluate the proposed technique, we used travel-related blog data and show that it improves keyword extraction accuracy by about 14% over the existing TextRank algorithm.
|
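The maximal spanning tree step named in the abstract can be illustrated with a minimal sketch. The abstract does not say how edges between words are weighted, so this example assumes document-level pointwise mutual information (PMI) as the edge weight and uses Kruskal's algorithm in descending weight order. The function names (`pmi_edges`, `maximal_spanning_tree`) and the toy documents are hypothetical, not taken from the paper.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_edges(docs):
    """Return {(w1, w2): PMI} for word pairs co-occurring in at least one document.

    Assumption: PMI over document-level co-occurrence is used as the edge
    weight; the paper's actual weighting is not given in the abstract.
    """
    n = len(docs)
    word_df = Counter()   # number of documents containing each word
    pair_df = Counter()   # number of documents containing each word pair
    for doc in docs:
        words = sorted(set(doc))
        word_df.update(words)
        pair_df.update(combinations(words, 2))
    edges = {}
    for (a, b), joint in pair_df.items():
        # PMI = log( p(a, b) / (p(a) * p(b)) )
        edges[(a, b)] = math.log((joint / n) / ((word_df[a] / n) * (word_df[b] / n)))
    return edges


def maximal_spanning_tree(edges):
    """Kruskal's algorithm, taking edges from heaviest to lightest.

    Returns a list of (w1, w2, weight). If the graph is disconnected,
    the result is a maximal spanning forest.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for (a, b), w in sorted(edges.items(), key=lambda kv: -kv[1]):
        ra, rb = find(a), find(b)
        if ra != rb:  # adding this edge does not create a cycle
            parent[ra] = rb
            tree.append((a, b, w))
    return tree


# Hypothetical toy cluster of tokenized travel-blog documents.
cluster_docs = [
    ["jeju", "beach", "hotel"],
    ["jeju", "beach", "flight"],
    ["hotel", "flight", "booking"],
    ["jeju", "hotel", "booking"],
]
tree = maximal_spanning_tree(pmi_edges(cluster_docs))
```

On this toy cluster the tree connects all five words with four edges. How keyword scores are then derived from the tree is not described in the abstract and is left out of the sketch.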
English Abstract |
In this paper, we propose a novel way of producing keyword networks, named LSI-based ClusterTextRank, which extracts significant keywords from a set of clusters using a mutual information metric and constructs an association network using latent semantic indexing (LSI). The proposed method reduces the dimension of documents through LSI, partitions the documents into multiple clusters through k-means clustering, and expresses the words within each cluster as a maximal spanning tree graph. The significant keywords are identified by evaluating their mutual information within clusters. Then, the method calculates the similarities between the extracted keywords using the term-concept matrix, and the results are represented as a keyword association network. To evaluate the performance of the proposed method, we used travel-related blog data and showed that the proposed method outperforms the existing TextRank algorithm by about 14% in terms of accuracy.
|
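The final step, turning keyword similarities in the term-concept space into an association network, can be sketched as follows. This is an illustration under stated assumptions: the abstract does not name the similarity measure or any cutoff, so cosine similarity and a `threshold` parameter are assumed here, and the term-concept rows are made-up values standing in for real LSI output. The function names are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def association_network(term_concept, keywords, threshold=0.5):
    """Link keyword pairs whose concept-space similarity meets the threshold.

    Assumptions: cosine similarity and a fixed threshold; neither is
    specified in the abstract.
    """
    edges = []
    for a, b in combinations(keywords, 2):
        sim = cosine(term_concept[a], term_concept[b])
        if sim >= threshold:
            edges.append((a, b, sim))
    return edges


# Hypothetical term-concept rows (2 latent concepts), standing in for LSI output.
term_concept = {
    "jeju": [0.9, 0.1],
    "beach": [0.8, 0.2],
    "booking": [0.1, 0.9],
}
network = association_network(term_concept, ["jeju", "beach", "booking"])
```

With these made-up vectors, only "jeju" and "beach" end up linked, since they load on the same latent concept while "booking" loads on the other.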
Keyword |
latent semantic indexing
mutual information
maximal spanning tree
clustering
keyword extraction
text mining
|