Journal of KIISE
Korean Title (English translation) |
Semi-automatic Expansion Method for a Chatting Corpus Based on K-means Clustering and Similarity Measurement |
¿µ¹®Á¦¸ñ(English Title) |
Semi-automatic Expansion for a Chatting Corpus Based on a K-means Clustering Method And Similarity Measure |
Authors |
안재현
고영중
Jaehyun An
Youngjoong Ko
|
Citation |
VOL 46 NO. 05 PP. 0440 ~ 0447 (2019. 05) |
Korean Abstract (English translation) |
This paper proposes a method for semi-automatically expanding a chatting corpus using large amounts of utterance data, such as movie subtitles and drama scripts. To expand the chatting corpus, we compute chatting similarity using a pre-built chatting corpus and a similarity technique, and judge a pair to be a correct chatting pair if its similarity exceeds a threshold obtained through experiments. Specifically, we propose generating utterance-level representations using morpheme-level embedding vectors and a convolutional neural network model. In addition, to improve the speed of the semi-automatic construction model, we propose clustering the chatting corpus with the K-means clustering method to reduce the amount of computation. Compared with TF (term frequency), the basic method for generating utterance-level representations, the proposed method improved precision, recall, and F1 by 5.16%p, 6.09%p, and 5.73%p, respectively, reaching 61.28%, 53.19%, and 56.94%. Clustering the utterances also yielded a semi-automatic chatting-corpus construction model that is 103 times faster.
|
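The selection step described in the abstract (score each new utterance against an existing chatting corpus, keep it if the similarity clears a threshold, and use K-means so that each candidate is compared with only one cluster) can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract's CNN over morpheme embeddings is replaced here by L2-normalised term-frequency (TF) vectors, the representation the abstract names as its baseline. Candidates are compared as single utterances rather than as utterance pairs, and the function names, `k`, the toy data, and the 0.5 threshold are hypothetical choices for illustration.

```python
import math
import random
from collections import Counter


def tf_vector(tokens, vocab):
    """L2-normalised term-frequency vector.

    Hypothetical stand-in for the paper's CNN utterance encoder
    (TF is the baseline representation named in the abstract)."""
    counts = Counter(tokens)
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine(u, v):
    """Cosine similarity for unit-length (or all-zero) vectors: just the dot product."""
    return sum(a * b for a, b in zip(u, v))


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nearest(v, centroids):
    """Index of the centroid closest to v (squared Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))


def kmeans(vectors, k, iters=20, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        labels = [nearest(v, centroids) for v in vectors]  # assignment step
        for c in range(k):  # update step
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    labels = [nearest(v, centroids) for v in vectors]  # final assignment
    return centroids, labels


def select_new_utterances(corpus_vecs, candidate_vecs, k, threshold):
    """Keep candidates whose best similarity to the seed corpus exceeds threshold.

    Each candidate is compared only with the corpus members of its nearest
    cluster; also returns how many similarities were computed."""
    centroids, labels = kmeans(corpus_vecs, k)
    clusters = [[v for v, lab in zip(corpus_vecs, labels) if lab == c] for c in range(k)]
    selected, comparisons = [], 0
    for i, x in enumerate(candidate_vecs):
        members = clusters[nearest(x, centroids)]
        comparisons += len(members)
        best = max((cosine(x, y) for y in members), default=0.0)
        if best > threshold:
            selected.append(i)
    return selected, comparisons


# Toy data (hypothetical): tokenised seed-corpus utterances and candidate utterances.
corpus = [["hello", "how", "are", "you"], ["good", "morning"],
          ["what", "time", "is", "it"], ["see", "you", "later"]]
candidates = [["hello", "how", "are", "you", "today"],
              ["the", "stock", "market", "fell"]]
vocab = sorted({w for u in corpus + candidates for w in u})
corpus_vecs = [tf_vector(u, vocab) for u in corpus]
cand_vecs = [tf_vector(u, vocab) for u in candidates]

selected, comparisons = select_new_utterances(corpus_vecs, cand_vecs, k=2, threshold=0.5)
print(selected, comparisons)
```

With k roughly balanced clusters over N corpus utterances, each candidate costs about k centroid distances plus N/k similarity computations instead of N. This is the kind of saving the abstract attributes to clustering. The reported 103-fold speed-up comes from the paper's own experiments and is not reproduced by this toy corpus.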
English Abstract |
In this paper, we propose a semi-automatic method for expanding a chatting corpus using a large amount of utterance data from movie subtitles and drama scripts. To expand the chatting corpus, the proposed system uses a previously constructed chatting corpus and a similarity measure. If the similarity between the previously constructed chatting corpus and an input utterance is greater than a threshold set experimentally, the input is judged to form a correct chatting pair and is selected as a new chatting utterance. We use morpheme-level word embeddings and a convolutional neural network (CNN) to build the utterance embeddings on which the similarity is calculated. To speed up the semi-automatic expansion process, we reduce the amount of computation by clustering the chatting corpus with the K-means clustering algorithm. Experimental results showed that the precision, recall, and F1 score of the proposed system were 61.28%, 53.19%, and 56.94%, respectively, which were 5.16%p, 6.09%p, and 5.73%p higher than those of the baseline system, which represents utterances by term frequency (TF). Clustering the utterances also made the system about 103 times faster.
|
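As a quick consistency check on the reported figures, the baseline scores can be recovered by subtracting the reported gains (in percentage points) from the proposed system's scores. F1 should then be close to the harmonic mean of precision and recall, F1 = 2PR/(P + R). The sketch below does only that arithmetic on the numbers in the abstract; the variable and function names are illustrative.

```python
def f1(p, r):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * p * r / (p + r)


# Reported scores (%) for the proposed system and gains (%p) over the TF baseline.
proposed = {"precision": 61.28, "recall": 53.19, "f1": 56.94}
gain = {"precision": 5.16, "recall": 6.09, "f1": 5.73}

# Baseline scores implied by the reported gains.
baseline = {m: round(proposed[m] - gain[m], 2) for m in proposed}

print(baseline)  # {'precision': 56.12, 'recall': 47.1, 'f1': 51.21}
print(round(f1(proposed["precision"], proposed["recall"]), 2))
print(round(f1(baseline["precision"], baseline["recall"]), 2))
```

The recomputed F1 values (about 56.95 for the proposed system and 51.22 for the TF baseline) differ from the reported 56.94% and the implied 51.21% only in the second decimal place. This is consistent with precision and recall having been rounded before they were reported.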
Keywords
chatting system
semi-automatic expansion
similarity
convolutional neural networks
utterance embedding
|