정보과학회논문지 (Journal of KIISE)

Korean Title: K-means 클러스터링 방법과 유사도 측정 기반의 채팅 말뭉치 반자동 확장 방법
English Title: Semi-automatic Expansion for a Chatting Corpus Based on a K-means Clustering Method and Similarity Measure
Authors: Jaehyun An (안재현), Youngjoong Ko (고영중)
Citation: Vol. 46, No. 5, pp. 440-447 (May 2019)
Korean Abstract
This paper proposes a method for semi-automatically expanding a chatting corpus using large amounts of utterance data such as movie subtitles and drama scripts. To expand the chatting corpus, a chat similarity is computed using a previously constructed chatting corpus and a similarity measure, and if the chat similarity is greater than a threshold obtained through experiments, the pair is judged to be a correct chatting pair. We propose generating utterance-level representations using morpheme-unit embedding vectors and a convolutional neural network model. To improve the speed of the semi-automatic construction model, we also propose applying the K-means clustering method to cluster the chatting corpus and reduce the amount of computation. As a result, compared with term frequency (TF), the baseline method for generating utterance-level representations, precision, recall, and F1 rose by 5.16%p, 6.09%p, and 5.73%p, respectively, reaching 61.28%, 53.19%, and 56.94%. In addition, by clustering the utterances, we were able to build a semi-automatic chatting corpus construction model that is 103 times faster.
English Abstract
In this paper, we propose a semi-automatic method for expanding a chatting corpus using a large amount of utterance data from movie subtitles and drama scripts. To expand the corpus, the proposed system uses a previously constructed chatting corpus and a similarity measure. If the similarity between the previously constructed chatting corpus and an input utterance is greater than a threshold value set experimentally, the input utterance is selected as a new chatting utterance, that is, it is judged to form a correct chatting pair. We use morpheme-unit embeddings and a convolutional neural network to generate the utterance embeddings on which the similarity is calculated. To improve the speed of the semi-automatic expansion process, we propose reducing the amount of computation by clustering the chatting corpus with the K-means clustering algorithm. Experimental results show that the precision, recall, and F1 score of the proposed system were 61.28%, 53.19%, and 56.94%, respectively, which are 5.16%p, 6.09%p, and 5.73%p higher than those of the baseline system that uses term frequency (TF). The speed of our system was also about a hundred times faster.
Keywords: chatting system, semi-automatic expansion, similarity, convolutional neural networks, utterance embedding
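
The pipeline described in the abstract (cluster the existing chatting corpus, then accept an input utterance only if its similarity to the corpus exceeds a threshold) can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration: the function names (`kmeans`, `is_chat_utterance`), the toy 2-dimensional vectors standing in for the CNN-based utterance embeddings, the use of cosine similarity, the threshold of 0.9, and the choice to search only the single nearest cluster. The abstract does not specify these details, so this is not the authors' implementation.

```python
import math
import random


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def kmeans(vectors, k, max_iters=100, seed=0):
    """Plain Lloyd-style K-means; returns (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    assign = None
    for _ in range(max_iters):
        # Assignment step: each vector goes to its nearest centroid.
        new_assign = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors
        ]
        if new_assign == assign:  # converged: assignments stopped changing
            break
        assign = new_assign
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign


def is_chat_utterance(query, corpus, centroids, assign, threshold):
    """Compare the query only with members of its nearest cluster.

    Returns (accepted, best_similarity). Restricting the search to one
    cluster is the assumed way of reducing the amount of computation.
    """
    nearest = min(range(len(centroids)), key=lambda c: sq_dist(query, centroids[c]))
    best = max(
        (cosine(query, v) for v, a in zip(corpus, assign) if a == nearest),
        default=0.0,
    )
    return best > threshold, best


if __name__ == "__main__":
    # Toy stand-ins for utterance embeddings of the existing chatting corpus.
    corpus = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    centroids, assign = kmeans(corpus, k=2)
    for query in ([0.95, 0.05], [0.6, 0.8]):
        accepted, best = is_chat_utterance(query, corpus, centroids, assign, 0.9)
        print(query, accepted, round(best, 3))
```

With N corpus utterances split into k clusters, each query is compared with k centroids plus the members of one cluster instead of all N utterances, which is where a speed-up of this kind would come from. The trade-off is that a similar utterance sitting in a neighboring cluster can be missed.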