데이터베이스 연구회지(SIGDB)
Korean Title |
임베딩 교체에 따른 구어체 텍스트 탐지 모델 성능 비교 |
English Title |
Performance Comparison of the Spoken Language Detection Model with Embedding Replacement |
Author |
김현종 (Hyeonjong Kim)
남궁주홍 (Juhong Namgung)
문양세 (Yang-Sae Moon)
최형진 (Hyung-Jin Choi)
길명선 (Myeong-Seon Gil)
|
Citation |
VOL 36 NO. 02 PP. 0045 ~ 0055 (2020. 08) |
Korean Abstract |
Deep learning-based profanity detection models face significant limits on accuracy improvement because of the typos and spacing errors common in colloquial text. In particular, morphological analysis of colloquial text for generating training data frequently produces morphemes that obscure word meaning, and this is the largest factor lowering detection accuracy. To overcome these problems of colloquial Korean, this paper designs and implements detection models for different embeddings and compares their profanity detection accuracy. Four embedding models are used for detection (Word2Vec, fastText, SKT-KoBERT, and KoELECTRA), and the performance of each embedding-based profanity detection model is compared and evaluated through experiments. In the experiment on the character unit used, both Word2Vec and fastText achieved accuracy of 90% or higher; in the experiment on whether ambiguity is taken into account, SKT-KoBERT performed far better than fastText. Finally, in the experiment on pre-training methods, SKT-KoBERT also outperformed KoELECTRA. These results suggest that more effective embedding techniques can be applied to a variety of deep learning services built on colloquial text. |
English Abstract |
Deep learning-based abuse detection models are limited in accuracy by the frequent typos and spacing errors in Korean spoken-language text. In particular, morphological analysis of spoken language for generating training data frequently extracts morphemes that make it difficult to grasp the meaning of words. This is the biggest cause of degraded accuracy in abuse detection models. In this paper, to overcome this problem of Korean spoken language, we design and implement detection models based on different embeddings and compare their abuse detection accuracy. We use four embedding models for detection: Word2Vec, fastText, SKT-KoBERT, and KoELECTRA, and we compare and evaluate the performance of each embedding-based abuse detection model through experiments. In the experiments on character units, both Word2Vec and fastText showed more than 90% accuracy, and in the experiment on ambiguity determination, SKT-KoBERT showed significantly higher performance than fastText. Finally, in the experiment on pre-training methods, SKT-KoBERT also showed higher performance than KoELECTRA. These experimental results suggest that more effective embedding techniques can be applied to various spoken-language-based deep learning services. |
Keyword |
구어체 텍스트
텍스트 기반 딥러닝 모델
텍스트 임베딩
욕설 탐지
Spoken language
Text-based deep learning model
Text embedding
Abuse detection
|